ai-tech-rss-fetch
AI Tech RSS Fetch
Core Goal
- Subscribe to RSS/Atom sources.
- Persist feed and entry metadata to SQLite.
- Deduplicate entries with stable keys across runs.
- Keep only metadata; do not fetch full article bodies and do not summarize.
Triggering Conditions
- Receive a request to subscribe to RSS feeds from URLs or OPML.
- Receive a request to run incremental RSS sync reliably.
- Need stable metadata persistence for downstream processing.
- Need dedupe-safe storage of feed items over repeated runs.
Workflow
- Prepare runtime and database.
  - Ensure dependency is installed: `python3 -m pip install feedparser`.
  - In multi-agent runtimes, pin DB to an absolute path before any command:
    ```bash
    export AI_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/ai_rss.db"
    ```
  - Initialize SQLite schema once:
    ```bash
    python3 scripts/rss_subscribe.py init-db --db "$AI_RSS_DB_PATH"
    ```
- Add feed subscriptions.
  - Add one feed URL:
    ```bash
    python3 scripts/rss_subscribe.py add-feed --db "$AI_RSS_DB_PATH" --url "https://example.com/feed.xml"
    ```
  - Import feeds from OPML:
    ```bash
    python3 scripts/rss_subscribe.py import-opml --db "$AI_RSS_DB_PATH" --opml assets/hn-popular-blogs-2025.opml
    ```
- Run incremental sync.
  - Fetch active feeds and store metadata:
    ```bash
    python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100
    ```
  - Optional one-feed sync:
    ```bash
    python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --feed-url "https://example.com/feed.xml"
    ```
- Query persisted metadata.
  - List feeds:
    ```bash
    python3 scripts/rss_subscribe.py list-feeds --db "$AI_RSS_DB_PATH" --limit 50
    ```
  - List recent entries:
    ```bash
    python3 scripts/rss_subscribe.py list-entries --db "$AI_RSS_DB_PATH" --limit 100
    ```
Input Requirements
- Supported inputs:
  - RSS XML feed URLs.
  - OPML feed list files.
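As an illustration of the OPML input, the sketch below pulls feed URLs out of an OPML document using only the standard library. It is not the `import-opml` implementation: the function name `feed_urls_from_opml` and the sample document are hypothetical, and it assumes feed URLs sit in the conventional `xmlUrl` attribute of `<outline>` elements.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text: str) -> list[str]:
    """Return feed URLs from <outline> elements, in document order, without repeats."""
    root = ET.fromstring(opml_text)
    urls: list[str] = []
    seen: set[str] = set()
    # iter() walks nested outlines too, so folder-style groupings are covered.
    for outline in root.iter("outline"):
        # Assumption: the URL lives in xmlUrl; a lowercase variant is tolerated.
        url = (outline.get("xmlUrl") or outline.get("xmlurl") or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sample: one folder with a repeated feed and a second feed.
SAMPLE_OPML = """<opml version="2.0">
  <body>
    <outline text="Blogs">
      <outline type="rss" text="Example" xmlUrl="https://example.com/feed.xml"/>
      <outline type="rss" text="Example again" xmlUrl="https://example.com/feed.xml"/>
      <outline type="rss" text="Other" xmlUrl="https://example.org/atom.xml"/>
    </outline>
  </body>
</opml>"""

print(feed_urls_from_opml(SAMPLE_OPML))
```

Folder outlines without a URL are skipped, and a feed listed twice is returned once.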
Output Contract (Metadata Only)
- Persist metadata to SQLite, table `feeds`: `feed_url`, `feed_title`, `site_url`, `etag`, `last_modified`, status fields.
- Persist metadata to SQLite, table `entries`: `dedupe_key`, `guid`, `url`, `canonical_url`, `title`, `author`, `published_at`, `updated_at`, `summary`, `categories`, timestamps.
- Do not store generated summaries and do not create archive markdown files.
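One way to picture this contract is the SQLite sketch below. The actual schema created by `init-db` is not shown here, so everything beyond the listed column names is an assumption: the `id`, `feed_id`, and `first_seen_at` columns, the `UNIQUE` constraint on `dedupe_key`, the column types, and the helper `store_entry` are all hypothetical.

```python
import sqlite3

# Assumed schema: column names follow the contract above; ids, types,
# constraints, and first_seen_at are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    id            INTEGER PRIMARY KEY,
    feed_url      TEXT NOT NULL UNIQUE,
    feed_title    TEXT,
    site_url      TEXT,
    etag          TEXT,
    last_modified TEXT,
    last_error    TEXT
);
CREATE TABLE IF NOT EXISTS entries (
    id            INTEGER PRIMARY KEY,
    feed_id       INTEGER NOT NULL REFERENCES feeds(id),
    dedupe_key    TEXT NOT NULL UNIQUE,
    guid          TEXT,
    url           TEXT,
    canonical_url TEXT,
    title         TEXT,
    author        TEXT,
    published_at  TEXT,
    updated_at    TEXT,
    summary       TEXT,
    categories    TEXT,
    first_seen_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def store_entry(conn: sqlite3.Connection, feed_id: int, entry: dict) -> bool:
    """Insert entry metadata; return False when the dedupe key already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO entries (feed_id, dedupe_key, guid, url, title) "
        "VALUES (?, ?, ?, ?, ?)",
        (feed_id, entry["dedupe_key"], entry.get("guid"), entry.get("url"), entry.get("title")),
    )
    # rowcount is 1 for a new row and 0 when the UNIQUE constraint ignored it.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
feed_id = conn.execute(
    "INSERT INTO feeds (feed_url) VALUES (?)", ("https://example.com/feed.xml",)
).lastrowid
entry = {
    "dedupe_key": "guid:post-1",
    "guid": "post-1",
    "url": "https://example.com/post-1",
    "title": "Post 1",
}
print(store_entry(conn, feed_id, entry))  # first run inserts the row
print(store_entry(conn, feed_id, entry))  # repeated run is ignored
```

Letting the database enforce uniqueness keeps repeated runs idempotent without a separate lookup before each insert.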
Configurable Parameters
- `db_path`
- `AI_RSS_DB_PATH` (recommended absolute path in multi-agent runtime)
- `opml_path`
- `feed_urls`
- `max_feeds_per_run`
- `max_items_per_feed`
- `user_agent`
- `seen_ttl_days`
- `enable_conditional_get`
- Example config: `assets/config.example.json`
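The sketch below shows one plausible way these parameters could be resolved. The contents of `assets/config.example.json` are not reproduced here, so the loader `load_config`, its `require_absolute` flag, the default values (borrowed from the example `sync` command), and the rule that `AI_RSS_DB_PATH` overrides a `db_path` from the file are all assumptions.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Illustrative defaults only; they mirror the example sync command, not a
# confirmed configuration.
DEFAULTS = {
    "max_feeds_per_run": 20,
    "max_items_per_feed": 100,
    "enable_conditional_get": True,
}


def load_config(
    config_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    require_absolute: bool = False,
) -> dict:
    """Merge defaults, an optional JSON file, and the AI_RSS_DB_PATH variable."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    if config_path is not None:
        config.update(json.loads(Path(config_path).read_text(encoding="utf-8")))
    # Assumed precedence: the environment variable wins over the file.
    if env.get("AI_RSS_DB_PATH"):
        config["db_path"] = env["AI_RSS_DB_PATH"]
    db_path = config.get("db_path")
    if not db_path:
        raise ValueError("db_path is not set (config file or AI_RSS_DB_PATH)")
    # Multi-agent runtimes may opt in to rejecting relative paths.
    if require_absolute and not Path(db_path).is_absolute():
        raise ValueError(f"db_path must be absolute, got {db_path!r}")
    return config


cfg = load_config(env={"AI_RSS_DB_PATH": "/tmp/ai_rss.db"}, require_absolute=True)
print(cfg["db_path"], cfg["max_feeds_per_run"])
```

Passing `env` explicitly keeps the function easy to exercise without touching the process environment.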
Error and Boundary Handling
- Feed HTTP/network failure: keep syncing other feeds and record `last_error`.
- Feed `304 Not Modified`: skip entry parsing and keep state.
- Missing `guid` and `link`: use hashed fallback dedupe key.
- Dependency missing (`feedparser`): return install guidance.
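The rules above can be sketched offline as follows. This is a minimal illustration, not the logic of `scripts/rss_subscribe.py`: the names `dedupe_key`, `sync_feeds`, and `require_feedparser` are hypothetical, the fields hashed in the fallback key are an assumption, and the network fetch is injected as a callable so the example runs without any HTTP access.

```python
import hashlib
from typing import Callable


def dedupe_key(entry: dict) -> str:
    """Prefer guid, then link; otherwise hash a few fields (assumed choice)."""
    guid = (entry.get("guid") or "").strip()
    if guid:
        return "guid:" + guid
    link = (entry.get("link") or "").strip()
    if link:
        return "link:" + link
    basis = "\x1f".join(
        str(entry.get(k) or "") for k in ("feed_url", "title", "published_at")
    )
    return "hash:" + hashlib.sha256(basis.encode("utf-8")).hexdigest()


def require_feedparser():
    """Return the feedparser module, or exit with install guidance."""
    try:
        import feedparser
    except ImportError as exc:
        raise SystemExit(
            "feedparser is not installed; run: python3 -m pip install feedparser"
        ) from exc
    return feedparser


def sync_feeds(feeds: list, fetch: Callable[[dict], tuple]) -> dict:
    """fetch(feed) returns (http_status, entries); one failure never stops the run."""
    stored: dict = {}
    for feed in feeds:
        try:
            status, entries = fetch(feed)
        except Exception as exc:
            feed["last_error"] = str(exc)  # record the failure, move to the next feed
            continue
        feed["last_error"] = None
        if status == 304:
            continue  # Not Modified: skip entry parsing, keep existing state
        for entry in entries:
            key = dedupe_key({**entry, "feed_url": feed["feed_url"]})
            stored.setdefault(key, entry)  # first occurrence wins
    return stored


def fake_fetch(feed: dict) -> tuple:
    # Hypothetical stand-in for the real HTTP fetch.
    if feed["feed_url"].endswith("/down.xml"):
        raise OSError("connection refused")
    if feed["feed_url"].endswith("/unchanged.xml"):
        return 304, []
    return 200, [
        {"guid": "a1", "title": "First"},
        {"guid": "a1", "title": "First (repeat)"},
        {"title": "No ids"},
    ]


feeds = [
    {"feed_url": "https://example.com/down.xml"},
    {"feed_url": "https://example.com/unchanged.xml"},
    {"feed_url": "https://example.com/feed.xml"},
]
stored = sync_feeds(feeds, fake_fetch)
print(len(stored), feeds[0]["last_error"])
```

Including the feed URL in the hashed fallback keeps two feeds with identically titled, undated items from colliding.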
Final Output Checklist (Required)
- core goal
- trigger conditions
- input requirements
- metadata schema
- dedupe and sync rules
- command workflow
- configurable parameters
- error handling
Use the following simplified checklist verbatim when the user requests it:
```text
核心目标
输入需求
触发条件
元数据模型
去重与同步规则
命令流程
可配置参数
错误处理
```
References
- `references/input-model.md`
- `references/output-rules.md`
- `references/time-range-rules.md`
Assets
- `assets/hn-popular-blogs-2025.opml` (candidate feed pool)
- `assets/config.example.json`
Scripts
- `scripts/rss_subscribe.py`