ai-tech-rss-fetch

AI Tech RSS Fetch

Core Goal

  • Subscribe to RSS/Atom sources.
  • Persist feed and entry metadata to SQLite.
  • Deduplicate entries with stable keys across runs.
  • Keep only metadata; do not fetch full article bodies and do not summarize.

Triggering Conditions

  • A request to subscribe to RSS feeds from URLs or an OPML file.
  • A request to run a reliable incremental RSS sync.
  • A need for stable metadata persistence for downstream processing.
  • A need for dedupe-safe storage of feed items across repeated runs.

Workflow

  1. Prepare the runtime and database.
    • Ensure the dependency is installed:
        python3 -m pip install feedparser
    • In multi-agent runtimes, pin the DB to an absolute path before running any command:
        export AI_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/ai_rss.db"
    • Initialize the SQLite schema once:
        python3 scripts/rss_subscribe.py init-db --db "$AI_RSS_DB_PATH"
  2. Add feed subscriptions.
    • Add one feed URL:
        python3 scripts/rss_subscribe.py add-feed --db "$AI_RSS_DB_PATH" --url "https://example.com/feed.xml"
    • Import feeds from OPML:
        python3 scripts/rss_subscribe.py import-opml --db "$AI_RSS_DB_PATH" --opml assets/hn-popular-blogs-2025.opml
  3. Run incremental sync.
    • Fetch active feeds and store metadata:
        python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100
    • Optionally, sync a single feed:
        python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --feed-url "https://example.com/feed.xml"
  4. Query persisted metadata.
    • List feeds:
        python3 scripts/rss_subscribe.py list-feeds --db "$AI_RSS_DB_PATH" --limit 50
    • List recent entries:
        python3 scripts/rss_subscribe.py list-entries --db "$AI_RSS_DB_PATH" --limit 100
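When the workflow is driven from another program rather than typed by hand, the same commands can be assembled as argument lists. The sketch below is only an illustration: `build_command` and `run_workflow` are hypothetical helper names, and the absolute-path check is an assumption that mirrors the multi-agent advice in step 1. The subcommands and flags are the ones shown in the workflow.

```python
import os
import subprocess
import sys

SCRIPT = "scripts/rss_subscribe.py"  # script path as used in the workflow commands


def build_command(subcommand, db_path, *extra):
    """Return the argv list for one rss_subscribe.py invocation."""
    # Assumption: refuse relative paths so every agent hits the same database.
    if not os.path.isabs(db_path):
        raise ValueError(f"database path must be absolute: {db_path!r}")
    return [sys.executable, SCRIPT, subcommand, "--db", db_path, *extra]


def run_workflow(db_path, feed_url):
    """Run init-db, add-feed, and sync in order, stopping at the first failure."""
    steps = [
        build_command("init-db", db_path),
        build_command("add-feed", db_path, "--url", feed_url),
        build_command("sync", db_path, "--max-feeds", "20",
                      "--max-items-per-feed", "100"),
    ]
    for argv in steps:
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(argv, check=True)
```

A caller would typically pass `os.environ["AI_RSS_DB_PATH"]` as `db_path`, so the path is pinned once and reused by every step.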

Input Requirements

  • Supported inputs:
    • RSS/Atom XML feed URLs.
    • OPML feed list files.
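For orientation, an OPML file lists feeds as `outline` elements whose `xmlUrl` attribute holds the feed address. The sketch below shows one way to pull those URLs out using only the standard library. It is not the logic of `import-opml` itself, and `feed_urls_from_opml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return the unique xmlUrl values from an OPML document, in document order."""
    root = ET.fromstring(opml_text)
    seen = set()
    urls = []
    # Outlines can nest (folders of feeds), so walk every <outline> element.
    for outline in root.iter("outline"):
        url = (outline.get("xmlUrl") or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Dropping repeated URLs at import time keeps the feed list stable when the same OPML file is imported more than once.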

Output Contract (Metadata Only)

  • Persist feeds metadata to SQLite:
    • feed_url, feed_title, site_url, etag, last_modified, status fields.
  • Persist entries metadata to SQLite:
    • dedupe_key, guid, url, canonical_url, title, author, published_at, updated_at, summary, categories, timestamps.
  • Do not store generated summaries and do not create archive markdown files.
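The contract above can be pictured as two SQLite tables, with `dedupe_key` as the uniqueness constraint on entries. The following is a minimal sketch under several assumptions: the column types, the `last_error` and `first_seen_at` columns, and the `guid:` / `url:` / `hash:` key prefixes are illustrative choices, not the script's actual schema. The key prefers `guid`, then the link, and only hashes other fields when both are missing.

```python
import hashlib
import sqlite3

# Assumed layout; column names follow the contract, types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    feed_url      TEXT PRIMARY KEY,
    feed_title    TEXT,
    site_url      TEXT,
    etag          TEXT,
    last_modified TEXT,
    last_error    TEXT
);
CREATE TABLE IF NOT EXISTS entries (
    dedupe_key    TEXT PRIMARY KEY,
    feed_url      TEXT NOT NULL,
    guid          TEXT,
    url           TEXT,
    canonical_url TEXT,
    title         TEXT,
    author        TEXT,
    published_at  TEXT,
    updated_at    TEXT,
    summary       TEXT,
    categories    TEXT,
    first_seen_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def dedupe_key(feed_url, guid=None, url=None, title=None, published_at=None):
    """Prefer guid, then url; otherwise hash the remaining identifying fields."""
    if guid:
        return f"guid:{guid}"
    if url:
        return f"url:{url}"
    basis = f"{feed_url}|{title or ''}|{published_at or ''}"
    return "hash:" + hashlib.sha256(basis.encode("utf-8")).hexdigest()


def store_entry(conn, feed_url, guid=None, url=None, title=None, published_at=None):
    """Insert one entry; return True if it was new, False if already stored."""
    key = dedupe_key(feed_url, guid, url, title, published_at)
    # INSERT OR IGNORE leaves an existing row untouched, so reruns are safe.
    cur = conn.execute(
        "INSERT OR IGNORE INTO entries "
        "(dedupe_key, feed_url, guid, url, title, published_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (key, feed_url, guid, url, title, published_at),
    )
    return cur.rowcount == 1
```

Because the key depends only on fields that come from the feed, the same item produces the same key on every run, which is what makes repeated syncs idempotent.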

Configurable Parameters

  • db_path
  • AI_RSS_DB_PATH (recommended absolute path in multi-agent runtime)
  • opml_path
  • feed_urls
  • max_feeds_per_run
  • max_items_per_feed
  • user_agent
  • seen_ttl_days
  • enable_conditional_get
  • Example config: assets/config.example.json
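To show how these parameters might fit together, here is a hedged sketch of a config loader. The field names come from the list above, but every default value, the `RssConfig` and `load_config` names, and the rule that `AI_RSS_DB_PATH` overrides `db_path` are assumptions. The contents of assets/config.example.json are not reproduced here.

```python
import json
import os
from dataclasses import dataclass, field, fields


@dataclass
class RssConfig:
    # All defaults below are illustrative, not taken from the real example config.
    db_path: str = "ai_rss.db"
    opml_path: str = ""
    feed_urls: list = field(default_factory=list)
    max_feeds_per_run: int = 20
    max_items_per_feed: int = 100
    user_agent: str = "ai-tech-rss-fetch"
    seen_ttl_days: int = 30
    enable_conditional_get: bool = True


def load_config(json_text, env=None):
    """Build an RssConfig from JSON text, ignoring keys it does not know."""
    env = os.environ if env is None else env
    raw = json.loads(json_text)
    known_names = {f.name for f in fields(RssConfig)}
    cfg = RssConfig(**{k: v for k, v in raw.items() if k in known_names})
    # Assumption: the environment variable takes precedence over the file.
    override = env.get("AI_RSS_DB_PATH")
    if override:
        cfg.db_path = override
    return cfg
```

Letting the environment variable win keeps a shared config file usable across agents while each runtime still pins its own absolute database path.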

Error and Boundary Handling

  • Feed HTTP/network failure: keep syncing other feeds and record last_error.
  • Feed 304 Not Modified: skip entry parsing and keep state.
  • Missing guid and link: use hashed fallback dedupe key.
  • Dependency missing (feedparser): return install guidance.
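The first two rules describe a sync loop that isolates failures per feed and leaves state alone on 304. The sketch below illustrates that control flow with the network call injected as a function, so it runs without feedparser. `sync_feeds` and the `fetch` signature are hypothetical. In practice `fetch` would likely wrap `feedparser.parse`, which accepts `etag` and `modified` arguments for conditional requests.

```python
def sync_feeds(feeds, fetch):
    """Sync each feed independently and return {feed_url: entries} for changed feeds.

    feeds: list of dicts with "feed_url" and optional "etag" / "last_modified".
    fetch: callable(feed_url, etag, last_modified) returning
           (status, etag, last_modified, entries); may raise on network errors.
    """
    changed = {}
    for feed in feeds:
        try:
            status, etag, last_modified, entries = fetch(
                feed["feed_url"], feed.get("etag"), feed.get("last_modified")
            )
        except Exception as exc:
            # Record the failure on this feed and move on to the next one.
            feed["last_error"] = f"{type(exc).__name__}: {exc}"
            continue
        feed["last_error"] = None
        if status == 304:
            # Not modified: skip entry parsing and keep the stored validators.
            continue
        feed["etag"] = etag
        feed["last_modified"] = last_modified
        changed[feed["feed_url"]] = entries
    return changed
```

Catching errors inside the loop, rather than around it, is what lets one unreachable feed fail without aborting the rest of the run.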

Final Output Checklist (Required)

  • core goal
  • trigger conditions
  • input requirements
  • metadata schema
  • dedupe and sync rules
  • command workflow
  • configurable parameters
  • error handling
Use the following simplified checklist verbatim when the user requests it:
核心目标
输入需求
触发条件
元数据模型
去重与同步规则
命令流程
可配置参数
错误处理

References

  • references/input-model.md
  • references/output-rules.md
  • references/time-range-rules.md

Assets

  • assets/hn-popular-blogs-2025.opml (candidate feed pool)
  • assets/config.example.json

Scripts

  • scripts/rss_subscribe.py