daily-news-report


Fetches content from a preset URL list, filters for high-quality technical information, and generates a daily Markdown report.


NPX Install

npx skill4agent add rookie-ricardo/erduo-skills daily-news-report

SKILL.md Content

Daily News Report v3.0

Architecture Upgrade: Main Agent (Orchestrator) Scheduling + SubAgent Execution + Browser Fetching + Intelligent Caching

Core Architecture

```
Main Agent (Orchestrator)
Responsibilities: Scheduling, Monitoring, Evaluation, Decision-Making, Summarization

  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
  │ 1. Initialize    │ → │ 2. Schedule      │ → │ 3. Monitor       │ → │ 4. Evaluate      │
  │ Read Config      │   │ Distribute Tasks │   │ Collect Results  │   │ Filter & Sort    │
  └──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘
           │                      │                      │                      │
           ▼                      ▼                      ▼                      ▼
  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
  │ 5. Decide        │ ← │ ≥20 Items?       │   │ 6. Generate      │ → │ 7. Update        │
  │ Continue/Stop    │   │ Y/N              │   │ Daily Report     │   │ Cache Statistics │
  └──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘

          ↓ Schedule                                          ↑ Return Results

SubAgent Execution Layer

  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
  │ Worker A          │   │ Worker B          │   │ Browser           │
  │ (WebFetch)        │   │ (WebFetch)        │   │ (Headless)        │
  │ Tier1 Batch       │   │ Tier2 Batch      │   │ JS Rendered Pages │
  └───────────────────┘   └───────────────────┘   └───────────────────┘
            ↓                       ↓                       ↓
  ┌─────────────────────────────────────────────────────────────┐
  │                  Return Structured Results                  │
  │  { status, data: [...], errors: [...], metadata: {...} }    │
  └─────────────────────────────────────────────────────────────┘
```

Configuration Files

This Skill uses the following configuration files:
| File | Purpose |
| --- | --- |
| `sources.json` | Information source configuration: priorities and fetch methods |
| `cache.json` | Cache data, historical statistics, deduplication fingerprints |
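The schema of `sources.json` is not shown in this document; a plausible minimal shape, inferred from the fields the document references (tiers, priority, fetch method, the 403-disable rule), might look like the following. All field names here are assumptions — only the ids and URLs appear elsewhere in the document:

```json
{
  "sources": [
    {
      "id": "hn",
      "url": "https://news.ycombinator.com",
      "tier": 1,
      "priority": 1,
      "fetch": "webfetch",
      "enabled": true
    },
    {
      "id": "producthunt",
      "url": "https://www.producthunt.com",
      "tier": 2,
      "priority": 5,
      "fetch": "browser",
      "enabled": true
    }
  ]
}
```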

Detailed Execution Flow

Phase 1: Initialization

```yaml
Steps:
  1. Determine date (user parameter or current date)
  2. Read sources.json to get source configurations
  3. Read cache.json to get historical data
  4. Create output directory NewsReport/
  5. Check if partial report exists for today (append mode)
```

Phase 2: Schedule SubAgents

Strategy: Parallel scheduling, batch execution, early stopping mechanism
```yaml
Wave 1 (Parallel):
  - Worker A: Tier1 Batch A (HN, HuggingFace Papers)
  - Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)

Wait for Results → Evaluate Quantity

If <15 high-quality items:
  Wave 2 (Parallel):
    - Worker C: Tier2 Batch A (James Clear, FS Blog)
    - Worker D: Tier2 Batch B (HackerNoon, Scott Young)

If still <20 items:
  Wave 3 (Browser):
    - Browser Worker: ProductHunt, Latent Space (require JS rendering)
```
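The wave flow above can be sketched as a loop with per-wave thresholds. `fetch_batch` is a hypothetical stand-in for dispatching one SubAgent batch and collecting its items; note the real skill runs batches within a wave in parallel:

```python
# Sketch of the wave-based early-stopping scheduler described above.
# Thresholds (skip Wave 2 at >=15 items, Wave 3 at >=20) mirror the flow.

def run_waves(fetch_batch, waves, thresholds=(0, 15, 20)):
    """Run batches wave by wave, stopping once enough items are collected."""
    items = []
    for wave, needed in zip(waves, thresholds):
        if needed and len(items) >= needed:
            break                      # early stop: later waves not needed
        for batch in wave:             # in the real skill these run in parallel
            items.extend(fetch_batch(batch))
    return items
```

A call would look like `run_waves(fetch, [[tier1_a, tier1_b], [tier2_a, tier2_b], [browser_batch]])`.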

Phase 3: SubAgent Task Format

Task format received by each SubAgent:
```yaml
task: fetch_and_extract
sources:
  - id: hn
    url: https://news.ycombinator.com
    extract: top_10
  - id: hf_papers
    url: https://huggingface.co/papers
    extract: top_voted

output_schema:
  items:
    - source_id: string      # Source identifier
      title: string          # Title
      summary: string        # 2-4 sentence summary
      key_points: string[]   # Up to 3 key points
      url: string            # Original link
      keywords: string[]     # Keywords
      quality_score: 1-5     # Quality score

constraints:
  filter: "Cutting-edge tech/advanced tech/efficiency tech/practical information"
  exclude: "Popular science/marketing articles/overly academic/recruitment posts"
  max_items_per_source: 10
  skip_on_error: true

return_format: JSON
```
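Before Phase 4 consumes the results, the Main Agent can cheaply validate each returned item against the output_schema above. A minimal sketch, with field names taken directly from the schema:

```python
# Field names and types from the output_schema above.
REQUIRED_FIELDS = {
    "source_id": str,
    "title": str,
    "summary": str,
    "key_points": list,
    "url": str,
    "keywords": list,
}

def validate_item(item: dict) -> bool:
    """Return True if the item matches the SubAgent output_schema."""
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(item.get(field), ftype):
            return False
    score = item.get("quality_score")
    return isinstance(score, int) and 1 <= score <= 5
```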

Phase 4: Main Agent Monitoring & Feedback

Main Agent Responsibilities:
```yaml
Monitoring:
  - Check SubAgent return status (success/partial/failed)
  - Count collected items
  - Record success rate of each source

Feedback Loop:
  - If a SubAgent fails, decide whether to retry or skip
  - If a source fails repeatedly, mark it as disabled
  - Dynamically adjust source selection for subsequent batches

Decision:
  - Item count >=25 and high-quality >=20 → Stop fetching
  - Item count <15 → Proceed to next batch
  - All batches completed but <20 → Generate report with existing content (prioritize quality over quantity)
```
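The decision rules can be written as a small pure function. The handling of the middle ground (15-24 items with batches left) is an assumption, chosen to be consistent with the "<20 items → Wave 3" rule in Phase 2:

```python
def decide(total: int, high_quality: int, batches_remaining: int) -> str:
    """Main Agent stop/continue decision, per the rules above."""
    if total >= 25 and high_quality >= 20:
        return "stop"                  # enough material collected
    if batches_remaining == 0:
        return "generate"              # quality over quantity
    return "next_batch" if total < 20 else "generate"
```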

Phase 5: Evaluation & Filtering

```yaml
Deduplication:
  - Exact URL match
  - Title similarity (>80% considered duplicate)
  - Check cache.json to avoid duplicates with history

Score Calibration:
  - Unify scoring standards across SubAgents
  - Adjust weights based on source credibility
  - Boost manually marked high-quality sources

Sorting:
  - Descending order by quality_score
  - Same score sorted by source priority
  - Take Top 20
```
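The deduplication and sorting rules can be sketched with the standard library's difflib; the 0.8 ratio mirrors the >80% title-similarity threshold (one plausible similarity measure, not necessarily the one intended, and the source-priority tiebreak is omitted since priority is not part of the item schema):

```python
from difflib import SequenceMatcher

def dedupe_and_rank(items, seen_urls=(), top_n=20):
    """Drop exact-URL and near-duplicate-title items, then take Top N by score."""
    kept, seen = [], set(seen_urls)    # seen_urls: history from cache.json
    for item in sorted(items, key=lambda i: i["quality_score"], reverse=True):
        if item["url"] in seen:
            continue                   # exact URL match (or already in history)
        title = item["title"].lower()
        if any(SequenceMatcher(None, title, k["title"].lower()).ratio() > 0.8
               for k in kept):
            continue                   # >80% title similarity: duplicate
        seen.add(item["url"])
        kept.append(item)
    return kept[:top_n]
```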

Phase 6: Browser Fetching (MCP Chrome DevTools)

For pages requiring JS rendering, use a headless browser:

```yaml
Flow:
  1. Call mcp__chrome-devtools__new_page to open the page
  2. Call mcp__chrome-devtools__wait_for to wait for content to load
  3. Call mcp__chrome-devtools__take_snapshot to get the page structure
  4. Parse the snapshot to extract the required content
  5. Call mcp__chrome-devtools__close_page to close the page

Applicable Scenarios:
  - ProductHunt (403 on WebFetch)
  - Latent Space (Substack JS rendering)
  - Other SPA applications
```

Phase 7: Generate Daily Report

```yaml
Output:
  - Directory: NewsReport/
  - Filename: YYYY-MM-DD-news-report.md
  - Format: Standard Markdown

Content Structure:
  - Title + Date
  - Statistical summary (number of sources, number of included items)
  - 20 high-quality items (per template)
  - Generation info (version, timestamp)
```

Phase 8: Update Cache

```yaml
Update cache.json:
  - last_run: Record current run information
  - source_stats: Update statistics of each source
  - url_cache: Add processed URLs
  - content_hashes: Add content fingerprints
  - article_history: Record included articles
```
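The cache update can be sketched as a read-merge-write over cache.json. `merge_run` is a hypothetical helper covering the fields listed above; the `source_stats` update is elided since its exact shape isn't specified in this document:

```python
import json
from pathlib import Path

def merge_run(cache, run_info, new_urls, new_hashes, articles):
    """Merge one run's results into the cache dict (fields per the list above)."""
    cache["last_run"] = run_info
    cache.setdefault("url_cache", []).extend(new_urls)
    cache.setdefault("content_hashes", []).extend(new_hashes)
    cache.setdefault("article_history", []).extend(articles)
    return cache

def update_cache(path, run_info, new_urls, new_hashes, articles):
    """Read cache.json if present, merge, and write it back."""
    p = Path(path)
    cache = json.loads(p.read_text()) if p.exists() else {}
    merge_run(cache, run_info, new_urls, new_hashes, articles)
    p.write_text(json.dumps(cache, ensure_ascii=False, indent=2))
```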

SubAgent Call Examples

Use general-purpose Agent

Since custom agents require a session restart to be detected, you can use the general-purpose agent type and inject the worker prompt:
```yaml
Task Call:
  subagent_type: general-purpose
  model: haiku
  prompt: |
    You are a stateless execution unit. Only perform assigned tasks and return structured JSON.

    Task: Fetch the following URLs and extract content

    URLs:
    - https://news.ycombinator.com (extract Top 10)
    - https://huggingface.co/papers (extract top-voted papers)

    Output Format:
    {
      "status": "success" | "partial" | "failed",
      "data": [
        {
          "source_id": "hn",
          "title": "...",
          "summary": "...",
          "key_points": ["...", "...", "..."],
          "url": "...",
          "keywords": ["...", "..."],
          "quality_score": 4
        }
      ],
      "errors": [],
      "metadata": { "processed": 2, "failed": 0 }
    }

    Filter Criteria:
    - Keep: Cutting-edge tech/advanced tech/efficiency tech/practical information
    - Exclude: Popular science/marketing articles/overly academic/recruitment posts

    Return JSON directly without explanation.
```

Use worker Agent (requires session restart)

```yaml
Task Call:
  subagent_type: worker
  prompt: |
    task: fetch_and_extract
    input:
      urls:
        - https://news.ycombinator.com
        - https://huggingface.co/papers
    output_schema:
      - source_id: string
      - title: string
      - summary: string
      - key_points: string[]
      - url: string
      - keywords: string[]
      - quality_score: 1-5
    constraints:
      filter: Cutting-edge tech/advanced tech/efficiency tech/practical information
      exclude: Popular science/marketing articles/overly academic
```

Output Template

```markdown
# Daily News Report (YYYY-MM-DD)

> Today's content is selected from N information sources, with 20 high-quality items included
> Generation time: X minutes | Version: v3.0
>
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.

---

## 1. Title

- **Summary**: 2-4 line overview
- **Key Points**:
  1. Key point 1
  2. Key point 2
  3. Key point 3
- **Source**: [Link](URL)
- **Keywords**: `keyword1` `keyword2` `keyword3`
- **Score**: ⭐⭐⭐⭐⭐ (5/5)

---

## 2. Title
...

---

*Generated by Daily News Report v3.0*
*Sources: HN, HuggingFace, OneUsefulThing, ...*
```

Constraints & Principles

  1. Prioritize Quality Over Quantity: Low-quality content will not be included in the report
  2. Early Stopping: Stop fetching once 20 high-quality items are collected
  3. Parallel First: SubAgents in the same batch execute in parallel
  4. Failure Tolerance: Single source failure does not affect the overall process
  5. Cache Reuse: Avoid re-fetching the same content
  6. Main Agent Control: All decisions are made by the Main Agent
  7. Fallback Awareness: Detect sub-agent availability and gracefully degrade when unavailable

Expected Performance

| Scenario | Expected Time | Description |
| --- | --- | --- |
| Optimal Case | ~2 minutes | Tier1 sources are sufficient; no browser required |
| Normal Case | ~3-4 minutes | Tier2 sources needed as a supplement |
| Browser Required | ~5-6 minutes | Includes JS-rendered pages |

Error Handling

| Error Type | Handling Method |
| --- | --- |
| SubAgent timeout | Record the error, proceed to the next task |
| Source 403/404 | Mark as disabled, update sources.json |
| Content extraction failure | Return the raw content; the Main Agent decides |
| Browser crash | Skip the source, log the error |

Compatibility & Fallback

To ensure availability in different Agent environments, the following checks must be performed:

  1. Environment Check:
    - During Phase 1 initialization, attempt to detect whether the `worker` sub-agent exists.
    - If it is not found (or the related plugin is not installed), automatically switch to Serial Execution Mode.
  2. Serial Execution Mode:
    - Do not use the parallel block.
    - The Main Agent executes fetch tasks for each source sequentially.
    - Slower, but basic functionality is guaranteed.
  3. User Prompt:
    - The generated report must begin with a clear warning (in the blockquote section) informing users that it ran in degraded mode.
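The fallback above can be sketched as a single dispatch point. `fetch_one` and `dispatch_parallel` are hypothetical stand-ins for the serial and SubAgent paths; the warning text matches the one in the output template:

```python
def run_with_fallback(sources, has_worker, fetch_one, dispatch_parallel):
    """Use parallel SubAgents when available; otherwise degrade to serial."""
    if has_worker:
        return dispatch_parallel(sources), []
    warnings = ["Sub-agent 'worker' not detected. Running in generic mode "
                "(Serial Execution). Performance might be degraded."]
    items = []
    for src in sources:                # degraded mode: one source at a time
        items.extend(fetch_one(src))
    return items, warnings
```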