# Daily News Report v3.0
Architecture Upgrade: Main Agent (Orchestrator) Scheduling + SubAgent Execution + Browser Fetching + Intelligent Caching
## Core Architecture
```
┌───────────────────────────────────────────────────────────────────────────┐
│                         Main Agent (Orchestrator)                         │
│  Responsibilities: scheduling, monitoring, evaluation, decisions, summary │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│ ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│ │ 1. Initialize│ → │ 2. Schedule  │ → │ 3. Monitor   │ → │ 4. Evaluate  │ │
│ │ Read config  │   │ Dispatch     │   │ Collect      │   │ Filter, sort │ │
│ └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘ │
│        │                  │                  │                  │         │
│        ▼                  ▼                  ▼                  ▼         │
│ ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│ │ 5. Decide    │ ← │ ≥20 items?   │   │ 6. Generate  │ → │ 7. Update    │ │
│ │ Continue/stop│   │ Y/N          │   │ Report file  │   │ Cache stats  │ │
│ └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘ │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
                 ↓ schedule                       ↑ return results
┌───────────────────────────────────────────────────────────────────────────┐
│                         SubAgent Execution Layer                          │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│    ┌──────────────┐      ┌──────────────┐      ┌──────────────┐           │
│    │ Worker A     │      │ Worker B     │      │ Browser      │           │
│    │ (WebFetch)   │      │ (WebFetch)   │      │ (Headless)   │           │
│    │ Tier1 batch  │      │ Tier2 batch  │      │ JS-rendered  │           │
│    └──────────────┘      └──────────────┘      └──────────────┘           │
│           ↓                     ↓                     ↓                   │
│    ┌────────────────────────────────────────────────────────────┐         │
│    │                Structured results returned                 │         │
│    │  { status, data: [...], errors: [...], metadata: {...} }   │         │
│    └────────────────────────────────────────────────────────────┘         │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
```
## Configuration Files
This Skill uses the following configuration files:

| File | Purpose |
|---|---|
| `sources.json` | Information source configuration, priorities, fetch methods |
| `cache.json` | Cache data, historical statistics, deduplication fingerprints |
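Neither file's schema is spelled out in this Skill. Purely as an illustration, a `sources.json` consistent with the tiers, fetch methods, and disable flag referenced elsewhere might look like this (every field name below is an assumption):

```json
{
  "sources": [
    { "id": "hn",        "url": "https://news.ycombinator.com",  "tier": 1, "method": "webfetch", "extract": "top_10",    "enabled": true },
    { "id": "hf_papers", "url": "https://huggingface.co/papers", "tier": 1, "method": "webfetch", "extract": "top_voted", "enabled": true },
    { "id": "producthunt", "url": "https://www.producthunt.com", "tier": 2, "method": "browser",  "enabled": true }
  ]
}
```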
## Detailed Execution Flow
### Phase 1: Initialization
```yaml
Steps:
  1. Determine the date (user parameter or current date)
  2. Read sources.json for source configurations
  3. Read cache.json for historical data
  4. Create the output directory NewsReport/
  5. Check whether a partial report already exists for today (append mode)
```
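The steps above can be sketched in Python. This is a minimal illustration, not part of the Skill; the cache's default keys are assumptions:

```python
import json
from datetime import date
from pathlib import Path

def initialize(run_date=None, base_dir="."):
    """Phase 1: resolve the date, load configs, prepare the output dir."""
    base = Path(base_dir)
    day = run_date or date.today().isoformat()        # user parameter or today
    sources = json.loads((base / "sources.json").read_text())
    try:
        cache = json.loads((base / "cache.json").read_text())
    except FileNotFoundError:
        # Assumed defaults for a first run
        cache = {"url_cache": [], "content_hashes": [], "source_stats": {}}
    out_dir = base / "NewsReport"
    out_dir.mkdir(exist_ok=True)
    report = out_dir / f"{day}-news-report.md"
    return {
        "date": day,
        "sources": sources,
        "cache": cache,
        "report_path": report,
        "append_mode": report.exists(),               # partial report already there?
    }
```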
### Phase 2: Schedule SubAgents
Strategy: parallel scheduling, batched execution, early stopping.

```yaml
Wave 1 (parallel):
  - Worker A: Tier1 Batch A (HN, HuggingFace Papers)
  - Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)

Wait for results → evaluate quantity

If < 15 high-quality items:
  Wave 2 (parallel):
    - Worker C: Tier2 Batch A (James Clear, FS Blog)
    - Worker D: Tier2 Batch B (HackerNoon, Scott Young)

If still < 20 items:
  Wave 3 (browser):
    - Browser Worker: ProductHunt, Latent Space (require JS rendering)
```
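The wave logic can be sketched as a loop with per-wave stop thresholds. Treating `quality_score >= 4` as "high-quality" is an assumption, and `dispatch` stands in for however a wave's workers actually run (in parallel in the real system):

```python
def run_waves(waves, dispatch, thresholds=(15, 20)):
    """waves: list of batches; dispatch(wave) -> list of item dicts.
    thresholds[i]: high-quality count needed to stop after wave i;
    the last wave always ends the loop."""
    items = []
    for wave, need in zip(waves, thresholds + (None,)):
        items.extend(dispatch(wave))
        hq = sum(1 for it in items if it.get("quality_score", 0) >= 4)
        if need is not None and hq >= need:
            break   # early stop: enough high-quality items collected
    return items
```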
### Phase 3: SubAgent Task Format
The task format each SubAgent receives:

```yaml
task: fetch_and_extract
sources:
  - id: hn
    url: https://news.ycombinator.com
    extract: top_10
  - id: hf_papers
    url: https://huggingface.co/papers
    extract: top_voted
output_schema:
  items:
    - source_id: string      # Source identifier
      title: string          # Title
      summary: string        # 2-4 sentence summary
      key_points: string[]   # Up to 3 key points
      url: string            # Original link
      keywords: string[]     # Keywords
      quality_score: 1-5     # Quality score
constraints:
  filter: "cutting-edge tech / advanced tech / efficiency tech / practical information"
  exclude: "popular science / marketing articles / overly academic / recruitment posts"
  max_items_per_source: 10
  skip_on_error: true
return_format: JSON
```
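A SubAgent reply can be checked against this schema before the Main Agent trusts it. A minimal sketch; the field-to-type mapping is read off the `output_schema` above:

```python
# Expected fields and Python types for one result item
REQUIRED = {
    "source_id": str, "title": str, "summary": str,
    "key_points": list, "url": str, "keywords": list, "quality_score": int,
}

def validate_item(item):
    """Return True iff `item` conforms to the output_schema above."""
    for field, typ in REQUIRED.items():
        if not isinstance(item.get(field), typ):
            return False
    # quality_score is 1-5; at most 3 key points
    return 1 <= item["quality_score"] <= 5 and len(item["key_points"]) <= 3
```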
### Phase 4: Main Agent Monitoring & Feedback
Main Agent responsibilities:

```yaml
Monitoring:
  - Check SubAgent return status (success/partial/failed)
  - Count collected items
  - Record the success rate of each source

Feedback loop:
  - If a SubAgent fails, decide whether to retry or skip
  - If a source fails repeatedly, mark it as disabled
  - Dynamically adjust source selection for subsequent batches

Decision:
  - Item count >= 25 and high-quality >= 20 → stop fetching
  - Item count < 15 → proceed to the next batch
  - All batches completed but < 20 → generate with existing content (quality over quantity)
```
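The decision rules can be condensed into a small function. The source leaves the 15-24 item middle range unspecified, so continuing to the next batch there is an assumption:

```python
def decide(total, high_quality, batches_left):
    """Stopping rule sketched from the Decision block above."""
    if total >= 25 and high_quality >= 20:
        return "stop"
    if batches_left == 0:
        return "generate_with_existing"   # quality over quantity
    return "next_batch"                   # covers total < 15; middle range assumed
```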
### Phase 5: Evaluation & Filtering
```yaml
Deduplication:
  - Exact URL match
  - Title similarity (> 80% counts as duplicate)
  - Check cache.json to avoid duplicates from previous runs

Score calibration:
  - Unify scoring standards across SubAgents
  - Adjust weights by source credibility
  - Boost manually flagged high-quality sources

Sorting:
  - Descending by quality_score
  - Ties broken by source priority
  - Take the top 20
```
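The deduplication and sorting steps can be sketched with the standard library's `difflib`, whose `SequenceMatcher.ratio()` yields a 0-1 similarity score. Tie-breaking by source priority is omitted here for brevity:

```python
from difflib import SequenceMatcher

def dedupe(items, seen_urls=()):
    """Drop exact-URL duplicates, near-duplicate titles (> 80% similar),
    and anything already in the cache (`seen_urls`); keep the top 20."""
    kept, urls, titles = [], set(seen_urls), []
    # Sort first so the higher-scored copy of a duplicate survives
    for it in sorted(items, key=lambda x: x["quality_score"], reverse=True):
        if it["url"] in urls:
            continue
        t = it["title"].lower()
        if any(SequenceMatcher(None, t, k).ratio() > 0.8 for k in titles):
            continue
        urls.add(it["url"])
        titles.append(t)
        kept.append(it)
    return kept[:20]
```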
### Phase 6: Browser Fetching (MCP Chrome DevTools)
For pages that require JS rendering, use a headless browser:

```yaml
Flow:
  1. Call mcp__chrome-devtools__new_page to open the page
  2. Call mcp__chrome-devtools__wait_for to wait for content to load
  3. Call mcp__chrome-devtools__take_snapshot to capture the page structure
  4. Parse the snapshot to extract the required content
  5. Call mcp__chrome-devtools__close_page to close the page

Applicable scenarios:
  - ProductHunt (403 on WebFetch)
  - Latent Space (Substack JS rendering)
  - Other SPA applications
```
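The flow can be sketched against a hypothetical `call_tool(name, args)` MCP dispatcher. Only the tool names come from this Skill; the argument names and return values are illustrative:

```python
def fetch_rendered(call_tool, parse, url, ready_text):
    """Phase 6 sequence: open → wait → snapshot → parse → always close."""
    page_id = call_tool("mcp__chrome-devtools__new_page", {"url": url})
    call_tool("mcp__chrome-devtools__wait_for", {"text": ready_text})
    snapshot = call_tool("mcp__chrome-devtools__take_snapshot", {})
    try:
        return parse(snapshot)        # caller-supplied snapshot parser
    finally:
        # Close the page even if parsing fails
        call_tool("mcp__chrome-devtools__close_page", {"pageId": page_id})
```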
### Phase 7: Generate Daily Report
```yaml
Output:
  - Directory: NewsReport/
  - Filename: YYYY-MM-DD-news-report.md
  - Format: standard Markdown

Content structure:
  - Title + date
  - Statistical summary (source count, included-item count)
  - 20 high-quality items (per template)
  - Generation info (version, timestamp)
```
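Phase 7 can be sketched as a renderer over the collected items. This is a partial illustration of the output template (only some fields shown); per the fallback rules, the warning line appears only in degraded mode:

```python
def render_report(day, items, source_count, elapsed_min, warning=None):
    """Render the Markdown daily report; field names match the SubAgent
    output schema."""
    lines = [f"# Daily News Report ({day})", ""]
    lines += [f"> Selected from {source_count} sources; {len(items)} items included.",
              f"> Generation time: {elapsed_min} min | Version: v3.0"]
    if warning:                       # degraded-mode banner in the blockquote
        lines.append(f"> {warning}")
    for i, it in enumerate(items, 1):
        lines += ["", f"## {i}. {it['title']}",
                  f"- Summary: {it['summary']}",
                  f"- Source: {it['url']}",
                  f"- Score: {'⭐' * it['quality_score']} ({it['quality_score']}/5)"]
    return "\n".join(lines)
```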
### Phase 8: Update Cache
```yaml
Update cache.json:
  - last_run: record this run's information
  - source_stats: update per-source statistics
  - url_cache: add processed URLs
  - content_hashes: add content fingerprints
  - article_history: record included articles
```
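Phase 8 can be sketched as a function over the cache dict. The top-level keys come from the list above; hashing the title with SHA-256 as the content fingerprint is an assumption:

```python
import hashlib
from datetime import datetime, timezone

def update_cache(cache, report_items, run_info):
    """Fold this run's results into the cache.json structure."""
    cache["last_run"] = {"at": datetime.now(timezone.utc).isoformat(), **run_info}
    for it in report_items:
        cache.setdefault("url_cache", []).append(it["url"])
        digest = hashlib.sha256(it["title"].encode()).hexdigest()  # fingerprint
        cache.setdefault("content_hashes", []).append(digest)
        cache.setdefault("article_history", []).append(
            {"title": it["title"], "url": it["url"]})
        stats = cache.setdefault("source_stats", {})
        stats.setdefault(it["source_id"], {"included": 0})["included"] += 1
    return cache
</n```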
## SubAgent Call Examples
### Use the general-purpose Agent
Because custom agents are only discovered after a session restart, you can use `general-purpose` and inject the worker prompt:

Task call:

```yaml
subagent_type: general-purpose
model: haiku
prompt: |
  You are a stateless execution unit. Perform only the assigned task and return structured JSON.

  Task: fetch the following URLs and extract content.
  URLs:
    - https://news.ycombinator.com (extract the Top 10)
    - https://huggingface.co/papers (extract top-voted papers)

  Output format:
  {
    "status": "success" | "partial" | "failed",
    "data": [
      {
        "source_id": "hn",
        "title": "...",
        "summary": "...",
        "key_points": ["...", "...", "..."],
        "url": "...",
        "keywords": ["...", "..."],
        "quality_score": 4
      }
    ],
    "errors": [],
    "metadata": { "processed": 2, "failed": 0 }
  }

  Filter criteria:
    - Keep: cutting-edge tech / advanced tech / efficiency tech / practical information
    - Exclude: popular science / marketing articles / overly academic / recruitment posts

  Return JSON directly; do not explain.
```
### Use the worker Agent (requires session restart)
Task call:

```yaml
subagent_type: worker
prompt: |
  task: fetch_and_extract
  input:
    urls:
      - https://news.ycombinator.com
      - https://huggingface.co/papers
  output_schema:
    - source_id: string
    - title: string
    - summary: string
    - key_points: string[]
    - url: string
    - keywords: string[]
    - quality_score: 1-5
  constraints:
    filter: cutting-edge tech / advanced tech / efficiency tech / practical information
    exclude: popular science / marketing articles / overly academic
```
## Output Template
```markdown
# Daily News Report (YYYY-MM-DD)

> Selected today from N information sources; 20 high-quality items included.
> Generation time: X minutes | Version: v3.0
> (Degraded mode only) Warning: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.

## 1. Title

- Summary: 2-4 line overview
- Key points:
  - Key point 1
  - Key point 2
  - Key point 3
- Source: link
- Keywords: keyword1, keyword2, keyword3
- Score: ⭐⭐⭐⭐⭐ (5/5)

## 2. Title

...

---

Generated by Daily News Report v3.0
Sources: HN, HuggingFace, OneUsefulThing, ...
```
## Constraints & Principles
- Prioritize Quality Over Quantity: Low-quality content will not be included in the report
- Early Stopping: Stop fetching once 20 high-quality items are collected
- Parallel First: SubAgents in the same batch execute in parallel
- Failure Tolerance: Single source failure does not affect the overall process
- Cache Reuse: Avoid re-fetching the same content
- Main Agent Control: All decisions are made by the Main Agent
- Fallback Awareness: Detect sub-agent availability and gracefully degrade when unavailable
## Expected Performance
| Scenario | Expected Time | Description |
|---|---|---|
| Optimal Case | ~2 minutes | Tier1 sources are sufficient, no browser required |
| Normal Case | ~3-4 minutes | Tier2 sources needed for supplement |
| Browser Required | ~5-6 minutes | Includes JS rendered pages |
## Error Handling
| Error Type | Handling Method |
|---|---|
| SubAgent Timeout | Record error, proceed to next one |
| Source 403/404 | Mark as disabled, update sources.json |
| Content Extraction Failure | Return raw content, Main Agent decides |
| Browser Crash | Skip the source, log the error |
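The table's policies can be sketched as a wrapper around a single fetch call. Mapping HTTP 403/404 to `PermissionError` here is purely illustrative; in the real system the error type depends on the fetch tool:

```python
def safe_fetch(fetch, source, log, disable):
    """Apply the error-handling table: record the error and move on."""
    try:
        return fetch(source)
    except TimeoutError:
        log(f"{source['id']}: SubAgent timeout")   # record, continue to next
    except PermissionError:                        # stand-in for HTTP 403/404
        disable(source["id"])                      # mark disabled in sources.json
    except Exception as exc:                       # e.g. browser crash → skip
        log(f"{source['id']}: {exc}")
    return None
```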
## Compatibility & Fallback
To ensure availability across different Agent environments, the following checks must be performed:

- Environment check:
  - During Phase 1 initialization, attempt to detect whether the `worker` sub-agent exists.
  - If it is not found (or the related plugin is not installed), automatically switch to Serial Execution Mode.
- Serial Execution Mode:
  - Do not use a parallel block.
  - The Main Agent executes each source's fetch task sequentially.
  - Slower, but basic functionality is guaranteed.
- User notice:
  - The generated report must open with a clear warning (in the blockquote section) telling the user it is running in degraded mode.
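The checks above can be sketched as a single entry point. `spawn_batch` and `fetch_one` are hypothetical stand-ins for the parallel SubAgent path and a direct serial fetch:

```python
WARNING = ("Warning: Sub-agent 'worker' not detected. Running in generic "
           "mode (Serial Execution). Performance might be degraded.")

def fetch_all(sources, worker_available, spawn_batch, fetch_one):
    """Environment check + graceful degradation per the list above.
    Returns (results, warning_or_None); the warning goes into the
    report's opening blockquote."""
    if worker_available:
        return spawn_batch(sources), None          # parallel SubAgent path
    # Serial Execution Mode: Main Agent fetches each source itself
    return [fetch_one(s) for s in sources], WARNING
```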