News site content extraction. Supports WeChat Official Accounts, Toutiao, NetEase News, Sohu News, and Tencent News. Activated when users need to extract news content, crawl official account articles, scrape news, or obtain news in JSON/Markdown format.
npx skill4agent add nanmicoder/claude-code-skills news-extractor

| Platform | ID | URL Example |
|---|---|---|
| WeChat Official Accounts | wechat | |
| Toutiao | toutiao | |
| NetEase News | netease | |
| Sohu News | sohu | |
| Tencent News | tencent | |
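
Auto-detection presumably maps a URL's host to one of the platform IDs. The following is a minimal sketch of that idea, not the skill's actual implementation: the IDs come from the table above (`wechat` from the sample log output), while the `detect_platform` helper and the hostname suffixes for NetEase, Sohu, and Tencent are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostname suffix -> platform ID.
# The suffixes are illustrative assumptions, not taken from the skill's source.
HOST_SUFFIXES = {
    "mp.weixin.qq.com": "wechat",
    "toutiao.com": "toutiao",
    "163.com": "netease",
    "sohu.com": "sohu",
    "news.qq.com": "tencent",
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform ID whose host suffix matches the URL, else None."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform_id in HOST_SUFFIXES.items():
        # Match the exact host or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return platform_id
    return None
```

Returning `None` for an unknown host lets the caller decide how to report an unsupported site.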
cd ~/.claude/skills/news-extractor
uv sync

Run the scripts through uv run rather than calling python directly, so they use the synced environment.

| Package | Purpose |
|---|---|
| pydantic | Data model validation |
| requests | HTTP requests |
| curl_cffi | Browser simulation crawling |
| tenacity | Retry mechanism |
| parsel | HTML/XPath parsing |
| demjson3 | Non-standard JSON parsing |
# Extract news, auto-detect platform, output JSON + Markdown
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL"
# Specify output directory
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL" --output ./output
# Output only JSON
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL" --format json
# Output only Markdown
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL" --format markdown
# List supported platforms
uv run .claude/skills/news-extractor/scripts/extract_news.py --list-platforms

Output files are written to ./output by default: {news_id}.json and {news_id}.md, in the formats shown below.

{
  "title": "Article Title",
  "news_url": "Original Link",
  "news_id": "Article ID",
  "meta_info": {
    "author_name": "Author/Source",
    "author_url": "",
    "publish_time": "2024-01-01 12:00"
  },
  "contents": [
    {"type": "text", "content": "Paragraph text", "desc": ""},
    {"type": "image", "content": "https://...", "desc": ""},
    {"type": "video", "content": "https://...", "desc": ""}
  ],
  "texts": ["Paragraph 1", "Paragraph 2"],
  "images": ["Image URL1", "Image URL2"],
  "videos": []
}

# Article Title
## Article Information
**Author**: xxx
**Publish Time**: 2024-01-01 12:00
**Original Link**: [Link](URL)
---
## Article Content
Paragraph content...

---
## Media Resources
### Images (N)
1. URL1
2. URL2

uv run .claude/skills/news-extractor/scripts/extract_news.py \
  "https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ"

[INFO] Platform detected: wechat (WeChat Official Accounts)
[INFO] Extracting content...
[INFO] Title: Article Title
[INFO] Author: Official Account Name
[INFO] Text paragraphs: 15
[INFO] Images: 3
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.json
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.md

uv run .claude/skills/news-extractor/scripts/extract_news.py \
  "https://www.toutiao.com/article/7434425099895210546/"

| Error Type | Description | Solution |
|---|---|---|
| | URL does not match any supported platform | Check that the URL is correct |
| | Unsupported site | This Skill only supports the listed news sites |
| | Network error or page structure change | Retry or check URL validity |
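
The JSON and Markdown layouts shown earlier carry the same information, so one can be derived from the other. The sketch below renders a dict in the documented JSON shape into roughly the documented Markdown layout. It is only an illustration under stated assumptions: `to_markdown` is a hypothetical helper, not a function exposed by the skill, and the exact spacing the real script emits is not shown in the source.

```python
def to_markdown(article: dict) -> str:
    """Render an extracted-article dict in the documented Markdown layout."""
    meta = article.get("meta_info", {})
    lines = [
        f"# {article.get('title', '')}",
        "",
        "## Article Information",
        f"**Author**: {meta.get('author_name', '')}",
        f"**Publish Time**: {meta.get('publish_time', '')}",
        f"**Original Link**: [Link]({article.get('news_url', '')})",
        "",
        "---",
        "",
        "## Article Content",
    ]
    # Body: one text paragraph per entry, separated by blank lines.
    for paragraph in article.get("texts", []):
        lines += [paragraph, ""]
    # Media section is emitted only when the article has images.
    images = article.get("images", [])
    if images:
        lines += ["---", "", "## Media Resources", f"### Images ({len(images)})"]
        lines += [f"{i}. {url}" for i, url in enumerate(images, start=1)]
    return "\n".join(lines)


# Hypothetical sample data in the documented JSON shape.
sample = {
    "title": "Article Title",
    "news_url": "https://example.com/a",
    "meta_info": {"author_name": "Author/Source", "publish_time": "2024-01-01 12:00"},
    "texts": ["Paragraph 1", "Paragraph 2"],
    "images": ["https://example.com/1.png", "https://example.com/2.png"],
}
markdown = to_markdown(sample)
```

Reading from the flat `texts` and `images` lists keeps the sketch short; a renderer that must preserve the original interleaving of text and media would iterate over `contents` instead.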