email-imap-full-fetch

Email IMAP Full Fetch

Core Goal

  • Fetch one target email by stable message reference from IMAP.
  • Enforce lookup order: HEADER Message-Id exact match first, then uid fallback.
  • Download full raw MIME via BODY.PEEK[].
  • Parse and return headers, full text body, HTML body, and attachment metadata.
  • Save .eml and attachment files to disk with filename safety and idempotent indexing.
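
The lookup order and the BODY.PEEK[] download described above can be sketched with Python's standard imaplib. This is a minimal illustration, not the code in scripts/imap_full_fetch.py: the helper names (quote_imap, connect, resolve_uid, fetch_raw_mime) are hypothetical. Note that IMAP's HEADER search is a substring match, so a real implementation that needs an exact match would still have to compare the fetched Message-Id header itself.

```python
import imaplib


def quote_imap(value):
    """Quote a string for use as an IMAP search argument (imaplib does not quote for us)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def connect(host, username, password, mailbox="INBOX"):
    """Open a read-only session on one mailbox (hypothetical helper)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(username, password)
    conn.select(mailbox, readonly=True)
    return conn


def resolve_uid(conn, message_id=None, uid=None):
    """Return the UID to fetch: Message-Id header search first, then the uid fallback."""
    if message_id:
        status, data = conn.uid("SEARCH", None, "HEADER", "Message-ID", quote_imap(message_id))
        if status == "OK" and data and data[0]:
            # The server answers with space-separated UIDs; take the first hit.
            return data[0].split()[0].decode()
    if uid:
        return str(uid)
    return None


def fetch_raw_mime(conn, uid):
    """Download the full raw message; PEEK avoids marking it as read."""
    status, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
    if status != "OK" or not data or not isinstance(data[0], tuple):
        return None
    return data[0][1]
```

Passing both a message-id and a uid to resolve_uid mirrors the documented order: the uid is only used when the header search returns nothing.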

Standard Flow

  1. Input must include message_id_norm from stage-1 routing output (mail_ref.message_id_norm).
  2. Use fetch --message-id "<message_id_norm or raw Message-Id>" as the default path.
  3. Use fetch --uid "<uid>" only when no usable message-id is available.
  4. Keep mailbox selection consistent with stage-1 (--mailbox or IMAP_MAILBOX).
  5. Read JSON output and continue downstream processing with returned mail_ref.
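
As a rough sketch of how a downstream caller might follow this flow, the helper below turns a stage-1 mail_ref into a command line and reads the JSON result. The function names are hypothetical, and two things are assumed rather than confirmed by this document: that the script prints its JSON object on stdout, and that --mailbox is accepted after the fetch subcommand.

```python
import json
import subprocess


def build_fetch_argv(mail_ref):
    """Build the fetch command: message-id by default, uid only as a fallback."""
    argv = ["python3", "scripts/imap_full_fetch.py", "fetch"]
    message_id = mail_ref.get("message_id_norm") or mail_ref.get("message_id_raw")
    if message_id:
        argv += ["--message-id", message_id]
    elif mail_ref.get("uid"):
        argv += ["--uid", str(mail_ref["uid"])]
    else:
        raise ValueError("mail_ref has neither a message-id nor a uid")
    if mail_ref.get("mailbox"):
        # Reuse the stage-1 mailbox so both stages look in the same place.
        argv += ["--mailbox", mail_ref["mailbox"]]
    return argv


def run_fetch(mail_ref):
    """Run the script and parse its output (assumes JSON on stdout)."""
    result = subprocess.run(
        build_fetch_argv(mail_ref), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```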

Commands

Fetch by Message-Id (preferred):

```bash
python3 scripts/imap_full_fetch.py fetch --message-id "<caa123@example.com>"
```

Fetch by UID (fallback only):

```bash
python3 scripts/imap_full_fetch.py fetch --uid "123456"
```

Use both when needed (message-id lookup first, uid fallback second):

```bash
python3 scripts/imap_full_fetch.py fetch --message-id "<caa123@example.com>" --uid "123456"
```

Output Contract

  • Output is a single JSON object.
  • Required top-level fields:
    • mail_ref
    • headers
    • text_plain
    • text_html
    • attachments
    • saved_eml_path
  • mail_ref contains:
    • account, mailbox, uid, message_id_raw, message_id_norm, date
  • attachments[] contains per-file metadata and persistence result:
    • filename, content_type, bytes, disposition, saved_path, skipped_reason
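
To make the field shapes concrete, here is one way raw MIME bytes could be mapped onto headers, text_plain, text_html, and attachments[] with the standard email package. It is a sketch under assumptions: parse_message is a hypothetical name, mail_ref and saved_eml_path are left out, and saved_path and skipped_reason stay None because nothing is written to disk here.

```python
from email import message_from_bytes, policy


def parse_message(raw):
    """Map raw MIME bytes onto the parsed fields of the output contract."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_plain, text_html, attachments = [], [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        disposition = part.get_content_disposition()  # "attachment", "inline", or None
        filename = part.get_filename()
        content_type = part.get_content_type()
        if disposition == "attachment" or filename:
            payload = part.get_payload(decode=True) or b""
            attachments.append({
                "filename": filename,
                "content_type": content_type,
                "bytes": len(payload),
                "disposition": disposition,
                "saved_path": None,      # filled in by the persistence step
                "skipped_reason": None,  # e.g. size or extension limits
            })
        elif content_type == "text/plain":
            text_plain.append(part.get_content())
        elif content_type == "text/html":
            text_html.append(part.get_content())
    return {
        # Repeated headers (such as Received) collapse to the last value in this sketch.
        "headers": {name: str(value) for name, value in msg.items()},
        "text_plain": "\n".join(text_plain),
        "text_html": "\n".join(text_html),
        "attachments": attachments,
    }
```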

Storage And Idempotency

  • saved_eml_path points to local .eml file saved from BODY.PEEK[].
  • Attachments are saved without returning attachment binary content in JSON.
  • Filenames are sanitized to remove path separators and unsafe characters.
  • Duplicate attachment names are deduped with content-hash suffix.
  • Repeated requests are idempotent by message_id_norm index and return existing persisted JSON record directly.
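
The storage rules above could look roughly like the following. Everything here is illustrative: the function names, the ASCII-only character whitelist, the 8-character SHA-256 suffix, and the hashed index filename are assumptions, not the script's actual scheme.

```python
import hashlib
import json
import re
from pathlib import Path


def sanitize_filename(name):
    """Drop any directory part and replace characters outside a conservative set."""
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or "attachment"


def save_attachment(directory, filename, content):
    """Write one attachment; a name clash with different content gets a hash suffix."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / sanitize_filename(filename)
    if target.exists() and target.read_bytes() != content:
        digest = hashlib.sha256(content).hexdigest()[:8]
        target = target.with_name(f"{target.stem}-{digest}{target.suffix}")
    target.write_bytes(content)
    return target


def index_path(index_dir, message_id_norm):
    """One JSON file per message, keyed by a hash of message_id_norm."""
    key = hashlib.sha256(message_id_norm.encode("utf-8")).hexdigest()
    return index_dir / f"{key}.json"


def load_cached(index_dir, message_id_norm):
    """Return the persisted record for a repeated request, or None."""
    path = index_path(index_dir, message_id_norm)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else None


def store_record(index_dir, message_id_norm, record):
    """Persist the JSON record so later requests can return it directly."""
    index_dir.mkdir(parents=True, exist_ok=True)
    index_path(index_dir, message_id_norm).write_text(json.dumps(record), encoding="utf-8")
```

A caller would check load_cached first and only contact the IMAP server when it returns None, which is what makes repeated requests cheap and stable.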

Parameters

  • --message-id: primary lookup key.
  • --uid: fallback lookup key.
  • --mailbox: mailbox to query (default IMAP_MAILBOX or INBOX).
  • --save-eml-dir: target dir for .eml files (env IMAP_FULL_SAVE_EML_DIR).
  • --index-dir: target dir for idempotency index JSON files (env IMAP_FULL_INDEX_DIR, default <save-eml-dir>/.index).
  • --save-attachments-dir: target dir for attachments (env IMAP_FULL_SAVE_ATTACHMENTS_DIR).
  • --max-attachment-bytes: max saved attachment size (env IMAP_FULL_MAX_ATTACHMENT_BYTES).
  • --allow-ext: allowed attachment extensions, comma-separated (env IMAP_FULL_ALLOW_EXT).
  • --connect-timeout: IMAP connect timeout seconds (default from IMAP_CONNECT_TIMEOUT).
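
The precedence these defaults imply (flag, then environment variable, then built-in default) can be sketched with argparse. This covers only a subset of the flags and is not the script's real parser; build_parser and resolve_args are hypothetical names, and the rule that at least one lookup key must be given is an assumption.

```python
import argparse
import os


def build_parser(env=os.environ):
    """Declare a subset of the fetch flags with environment-backed defaults."""
    parser = argparse.ArgumentParser(prog="imap_full_fetch.py")
    sub = parser.add_subparsers(dest="command", required=True)
    fetch = sub.add_parser("fetch")
    fetch.add_argument("--message-id")
    fetch.add_argument("--uid")
    fetch.add_argument("--mailbox", default=env.get("IMAP_MAILBOX") or "INBOX")
    fetch.add_argument("--save-eml-dir", default=env.get("IMAP_FULL_SAVE_EML_DIR"))
    fetch.add_argument("--index-dir", default=env.get("IMAP_FULL_INDEX_DIR"))
    return parser


def resolve_args(argv, env=os.environ):
    """Parse argv and fill the derived default for --index-dir."""
    args = build_parser(env).parse_args(argv)
    if not args.message_id and not args.uid:
        # Assumption: the script needs at least one lookup key.
        raise SystemExit("either --message-id or --uid is required")
    if args.index_dir is None and args.save_eml_dir:
        args.index_dir = os.path.join(args.save_eml_dir, ".index")
    return args
```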

Required Environment

  • IMAP_HOST
  • IMAP_USERNAME
  • IMAP_PASSWORD
Optional account defaults:
  • IMAP_NAME
  • IMAP_PORT
  • IMAP_SSL
  • IMAP_MAILBOX
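
A caller may want to fail fast when a required variable is missing. The check below is a small sketch with a hypothetical load_account helper; optional values are passed through as raw strings because their formats and defaults are not specified here.

```python
import os

REQUIRED = ("IMAP_HOST", "IMAP_USERNAME", "IMAP_PASSWORD")
OPTIONAL = ("IMAP_NAME", "IMAP_PORT", "IMAP_SSL", "IMAP_MAILBOX")


def load_account(env=os.environ):
    """Collect account settings, raising if any required variable is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    account = {name: env[name] for name in REQUIRED}
    # Optional values are kept as raw strings (or None when absent).
    account.update({name: env.get(name) for name in OPTIONAL})
    return account
```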

Scripts

  • scripts/imap_full_fetch.py