Fetch content from a preset URL list, filter for high-quality technical information, and generate a daily Markdown report.

```shell
npx skill4agent add rookie-ricardo/erduo-skills daily-news-report
```

## Architecture Upgrade: Main Agent (Orchestrator) Scheduling + SubAgent Execution + Browser Fetching + Intelligent Caching
┌───────────────────────────────────────────────────────────────────────────────────────────┐
│                                 Main Agent (Orchestrator)                                 │
│    Responsibilities: Scheduling, Monitoring, Evaluation, Decision-Making, Summarization   │
├───────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                           │
│ ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐ │
│ │ 1. Initialize    │ → │ 2. Schedule      │ → │ 3. Monitor       │ → │ 4. Evaluate      │ │
│ │ Read Config      │   │ Distribute Tasks │   │ Collect Results  │   │ Filter & Sort    │ │
│ └──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘ │
│          │                      │                      │                      │           │
│          ▼                      ▼                      ▼                      ▼           │
│ ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐ │
│ │ 5. Decide        │ ← │ ≥20 Items?       │   │ 6. Generate      │ → │ 7. Update        │ │
│ │ Continue/Stop    │   │ Y/N              │   │ Daily Report File│   │ Cache Statistics │ │
│ └──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘ │
│                                                                                           │
└───────────────────────────────────────────────────────────────────────────────────────────┘
                     ↓ Schedule                                  ↑ Return Results
┌───────────────────────────────────────────────────────────────────────────────────────────┐
│                                 SubAgent Execution Layer                                  │
├───────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                           │
│   ┌──────────────────┐          ┌──────────────────┐          ┌──────────────────┐        │
│   │ Worker A         │          │ Worker B         │          │ Browser          │        │
│   │ (WebFetch)       │          │ (WebFetch)       │          │ (Headless)       │        │
│   │ Tier1 Batch      │          │ Tier2 Batch      │          │ JS Rendered Pages│        │
│   └──────────────────┘          └──────────────────┘          └──────────────────┘        │
│            ↓                             ↓                             ↓                  │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│   │                             Return Structured Results                               │ │
│   │              { status, data: [...], errors: [...], metadata: {...} }                │ │
│   └─────────────────────────────────────────────────────────────────────────────────────┘ │
│                                                                                           │
└───────────────────────────────────────────────────────────────────────────────────────────┘

| File | Purpose |
|---|---|
| sources.json | Information source configuration, priority, fetch method |
| cache.json | Cache data, historical statistics, deduplication fingerprints |
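A plausible shape for `sources.json` is sketched below. The field names (`tier`, `fetch_method`, `enabled`, and so on) are illustrative assumptions inferred from this document, not the skill's actual schema:

```json
{
  "sources": [
    {
      "id": "hn",
      "name": "Hacker News",
      "url": "https://news.ycombinator.com",
      "tier": 1,
      "fetch_method": "webfetch",
      "enabled": true
    },
    {
      "id": "producthunt",
      "name": "Product Hunt",
      "url": "https://www.producthunt.com",
      "tier": 3,
      "fetch_method": "browser",
      "enabled": true
    }
  ]
}
```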
Steps:
1. Determine date (user parameter or current date)
2. Read sources.json to get source configurations
3. Read cache.json to get historical data
4. Create output directory NewsReport/
5. Check whether a partial report already exists for today (append mode)

Wave 1 (Parallel):
- Worker A: Tier1 Batch A (HN, HuggingFace Papers)
- Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)
Wait for Results → Evaluate Quantity
If <15 high-quality items:
Wave 2 (Parallel):
- Worker C: Tier2 Batch A (James Clear, FS Blog)
- Worker D: Tier2 Batch B (HackerNoon, Scott Young)
If still <20 items:
Wave 3 (Browser):
  - Browser Worker: ProductHunt, Latent Space (require JS rendering)

```yaml
task: fetch_and_extract
sources:
  - id: hn
    url: https://news.ycombinator.com
    extract: top_10
  - id: hf_papers
    url: https://huggingface.co/papers
    extract: top_voted
output_schema:
  items:
    - source_id: string       # Source identifier
      title: string           # Title
      summary: string         # 2-4 sentence summary
      key_points: string[]    # Up to 3 key points
      url: string             # Original link
      keywords: string[]      # Keywords
      quality_score: 1-5      # Quality score
constraints:
  filter: "Cutting-edge tech / advanced tech / efficiency tech / practical information"
  exclude: "Popular science / marketing articles / overly academic / recruitment posts"
  max_items_per_source: 10
  skip_on_error: true
return_format: JSON
```

Monitoring:
- Check SubAgent return status (success/partial/failed)
- Count collected items
- Record success rate of each source
Feedback Loop:
- If a SubAgent fails, decide whether to retry or skip
- If a source fails continuously, mark as disabled
- Dynamically adjust source selection for subsequent batches
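The monitoring and feedback loop above can be sketched as follows. This is a minimal illustration; `SourceState`, the action names, and the three-failure threshold are assumptions, not the skill's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical per-source state tracked by the Main Agent.
@dataclass
class SourceState:
    failures: int = 0      # consecutive failure count
    disabled: bool = False

def handle_result(state: SourceState, status: str, max_failures: int = 3) -> str:
    """Decide what to do with a SubAgent result: keep it, retry, or disable the source."""
    if status == "success":
        state.failures = 0
        return "keep"
    state.failures += 1
    if state.failures >= max_failures:
        state.disabled = True  # stop scheduling this source in subsequent batches
        return "disable"
    return "retry"
```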
Decision:
- Item count >=25 and high-quality >=20 → Stop fetching
- Item count <15 → Proceed to next batch
- All batches completed but still <20 → Generate the report with existing content (prioritize quality over quantity)

Deduplication:
- Exact URL match
- Title similarity (>80% considered duplicate)
- Check cache.json to avoid duplicates with history
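The three deduplication rules can be sketched with the standard library. `difflib`'s ratio is one plausible similarity measure (the 0.8 threshold comes from the 80% rule above); the skill may use a different one:

```python
from difflib import SequenceMatcher

def is_duplicate(item: dict, seen_urls: set, seen_titles: list) -> bool:
    """Apply the deduplication rules: exact URL match, then title similarity."""
    # Exact URL match (seen_urls would include URLs loaded from cache.json).
    if item["url"] in seen_urls:
        return True
    # Titles more than 80% similar count as duplicates.
    for title in seen_titles:
        ratio = SequenceMatcher(None, item["title"].lower(), title.lower()).ratio()
        if ratio > 0.8:
            return True
    return False
```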
Score Calibration:
- Unify scoring standards across SubAgents
- Adjust weights based on source credibility
- Add points to manually marked high-quality sources
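Calibration and the sorting rules described in this section can be combined into one pass. The per-source weight table and the `priority` field are illustrative assumptions:

```python
def rank_items(items: list, source_weight: dict, top_n: int = 20) -> list:
    """Calibrate scores by source credibility, then sort and keep the Top N."""
    for item in items:
        # Manually marked high-quality sources get a weight > 1.0 (bonus points).
        weight = source_weight.get(item["source_id"], 1.0)
        item["calibrated_score"] = min(5.0, item["quality_score"] * weight)
    # Descending by calibrated score; ties broken by source priority (lower = higher priority).
    items.sort(key=lambda it: (-it["calibrated_score"], it.get("priority", 99)))
    return items[:top_n]
```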
Sorting:
- Descending order by quality_score
- Same score sorted by source priority
- Take the Top 20

Flow:
1. Call mcp__chrome-devtools__new_page to open page
2. Call mcp__chrome-devtools__wait_for to wait for content loading
3. Call mcp__chrome-devtools__take_snapshot to get page structure
4. Parse snapshot to extract required content
5. Call mcp__chrome-devtools__close_page to close page
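The five steps above can be sketched with a hypothetical `mcp_call` wrapper. Only the tool names come from this document; the wrapper, its parameters, and the snapshot-parsing step are illustrative assumptions:

```python
def fetch_with_browser(mcp_call, url: str) -> list:
    """Drive the chrome-devtools MCP tools in the order described above."""
    page = mcp_call("mcp__chrome-devtools__new_page", url=url)
    mcp_call("mcp__chrome-devtools__wait_for", page=page, state="content-loaded")
    snapshot = mcp_call("mcp__chrome-devtools__take_snapshot", page=page)
    try:
        # Hypothetical parsing step: pull heading lines out of the snapshot text.
        return [line for line in snapshot.splitlines() if line.startswith("#")]
    finally:
        # Always release the page, even if parsing fails.
        mcp_call("mcp__chrome-devtools__close_page", page=page)
```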
Applicable Scenarios:
- ProductHunt (403 on WebFetch)
- Latent Space (Substack JS rendering)
- Other SPA applications

Output:
- Directory: NewsReport/
- Filename: YYYY-MM-DD-news-report.md
- Format: Standard Markdown
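The output path convention can be sketched as below; the helper name is an assumption:

```python
from datetime import date
from pathlib import Path

def report_path(day: date, out_dir: str = "NewsReport") -> Path:
    """Build NewsReport/YYYY-MM-DD-news-report.md for a given day."""
    return Path(out_dir) / f"{day.isoformat()}-news-report.md"
```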
Content Structure:
- Title + Date
- Statistical summary (number of sources, number of included items)
- 20 high-quality items (per template)
- Generation info (version, timestamp)

Update cache.json:
- last_run: Record current run information
- source_stats: Update statistics of each source
- url_cache: Add processed URLs
- content_hashes: Add content fingerprints
- article_history: Record included articles

Task Call (general-purpose):

```yaml
subagent_type: general-purpose
model: haiku
prompt: |
  You are a stateless execution unit. Only perform the assigned task and return structured JSON.
  Task: Fetch the following URLs and extract content.
  URLs:
  - https://news.ycombinator.com (extract Top 10)
  - https://huggingface.co/papers (extract top-voted papers)
  Output Format:
  {
    "status": "success" | "partial" | "failed",
    "data": [
      {
        "source_id": "hn",
        "title": "...",
        "summary": "...",
        "key_points": ["...", "...", "..."],
        "url": "...",
        "keywords": ["...", "..."],
        "quality_score": 4
      }
    ],
    "errors": [],
    "metadata": { "processed": 2, "failed": 0 }
  }
  Filter Criteria:
  - Keep: Cutting-edge tech / advanced tech / efficiency tech / practical information
  - Exclude: Popular science / marketing articles / overly academic / recruitment posts
  Return JSON directly, without explanation.
```

Task Call (worker):
```yaml
subagent_type: worker
prompt: |
  task: fetch_and_extract
  input:
    urls:
      - https://news.ycombinator.com
      - https://huggingface.co/papers
  output_schema:
    - source_id: string
    - title: string
    - summary: string
    - key_points: string[]
    - url: string
    - keywords: string[]
    - quality_score: 1-5
  constraints:
    filter: Cutting-edge tech / advanced tech / efficiency tech / practical information
    exclude: Popular science / marketing articles / overly academic
```

# Daily News Report (YYYY-MM-DD)
> Today's content is selected from N information sources, with 20 high-quality items included
> Generation time: X minutes | Version: v3.0
>
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.
---
## 1. Title
- **Summary**: 2-4 line overview
- **Key Points**:
1. Key point 1
2. Key point 2
3. Key point 3
- **Source**: [Link](URL)
- **Keywords**: `keyword1` `keyword2` `keyword3`
- **Score**: ⭐⭐⭐⭐⭐ (5/5)
---
## 2. Title
...
---
*Generated by Daily News Report v3.0*
*Sources: HN, HuggingFace, OneUsefulThing, ...*

| Scenario | Expected Time | Description |
|---|---|---|
| Optimal Case | ~2 minutes | Tier1 sources are sufficient, no browser required |
| Normal Case | ~3-4 minutes | Tier2 sources needed for supplement |
| Browser Required | ~5-6 minutes | Includes JS rendered pages |

| Error Type | Handling Method |
|---|---|
| SubAgent Timeout | Record error, proceed to next one |
| Source 403/404 | Mark as disabled, update sources.json |
| Content Extraction Failure | Return raw content, Main Agent decides |
| Browser Crash | Skip the source, log the error |
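The error-handling table can be sketched as a single dispatch function. The error-type strings and the disable mechanism are illustrative assumptions:

```python
def handle_error(error_type: str, source: dict) -> str:
    """Map the fetch-error categories above to the Main Agent's next action."""
    if error_type == "subagent_timeout":
        return "skip"              # record the error, proceed to the next SubAgent
    if error_type in ("http_403", "http_404"):
        source["enabled"] = False  # mark as disabled; persist to sources.json later
        return "disable"
    if error_type == "extraction_failed":
        return "return_raw"        # hand raw content back; Main Agent decides
    if error_type == "browser_crash":
        return "skip"              # skip this source and log the error
    return "skip"
```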