| Platform | ID | URL Example |
|---|---|---|
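| WeChat Official Accounts | wechat | https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ |
| Toutiao | toutiao | https://www.toutiao.com/article/7434425099895210546/ |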
| NetEase News | netease | |
| Sohu News | sohu | |
| Tencent News | tencent | |
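
As a rough illustration of how a URL might be matched to one of the platform IDs in the table, here is a minimal sketch. The function name `detect_platform` and the host table are hypothetical and are not taken from the skill's code. Only the `mp.weixin.qq.com` and `www.toutiao.com` hosts and the `wechat` ID appear elsewhere in this document; the NetEase, Sohu, and Tencent domains are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host-suffix table. Only the first two hosts appear in this
# document; the remaining domains are assumptions for illustration.
PLATFORM_HOSTS = {
    "mp.weixin.qq.com": "wechat",
    "toutiao.com": "toutiao",
    "163.com": "netease",      # assumed domain
    "sohu.com": "sohu",        # assumed domain
    "news.qq.com": "tencent",  # assumed domain
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform ID whose host suffix matches the URL, else None."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform_id in PLATFORM_HOSTS.items():
        # Match the host itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return platform_id
    return None
```

A URL that matches no entry returns `None`, which a caller could treat as the "does not match any supported platform" case.
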

Install the dependencies with uv:

    cd ~/.claude/skills/news-extractor
    uv sync

Run the scripts through `uv run` rather than calling `python` directly.

| Package | Purpose |
|---|---|
| pydantic | Data model validation |
| requests | HTTP requests |
| curl_cffi | Fetching with browser impersonation |
| tenacity | Retry mechanism |
| parsel | HTML/XPath parsing |
| demjson3 | Non-standard JSON parsing |

Output files are written to `./output` as `{news_id}.json` and `{news_id}.md`. The JSON file has this structure:

    {
      "title": "Article Title",
      "news_url": "Original Link",
      "news_id": "Article ID",
      "meta_info": {
        "author_name": "Author/Source",
        "author_url": "",
        "publish_time": "2024-01-01 12:00"
      },
      "contents": [
        {"type": "text", "content": "Paragraph text", "desc": ""},
        {"type": "image", "content": "https://...", "desc": ""},
        {"type": "video", "content": "https://...", "desc": ""}
      ],
      "texts": ["Paragraph 1", "Paragraph 2"],
      "images": ["Image URL1", "Image URL2"],
      "videos": []
    }

Extract a WeChat Official Accounts article:

    uv run .claude/skills/news-extractor/scripts/extract_news.py \
      "https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ"

Output:

    [INFO] Platform detected: wechat (WeChat Official Accounts)
    [INFO] Extracting content...
    [INFO] Title: Article Title
    [INFO] Author: Official Account Name
    [INFO] Text paragraphs: 15
    [INFO] Images: 3
    [SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.json
    [SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.md

Extract a Toutiao article:

    uv run .claude/skills/news-extractor/scripts/extract_news.py \
      "https://www.toutiao.com/article/7434425099895210546/"

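
To make the relationship between the two output files more concrete, the sketch below renders a record with the JSON structure shown above into Markdown. The function name `render_markdown` and the Markdown layout are assumptions for illustration; the skill's actual `.md` format is not documented here, and the image URL is a placeholder.

```python
import json


def render_markdown(record: dict) -> str:
    """Render an extracted-news record (a dict) to a Markdown string."""
    meta = record.get("meta_info", {})
    lines = [f"# {record.get('title', '')}", ""]

    # Byline: author and publish time, skipping whichever is empty.
    byline = ", ".join(
        part for part in (meta.get("author_name"), meta.get("publish_time")) if part
    )
    if byline:
        lines += [byline, ""]

    # Walk the ordered content items and emit one Markdown block per item.
    for item in record.get("contents", []):
        kind, content = item.get("type"), item.get("content", "")
        if kind == "text":
            lines += [content, ""]
        elif kind == "image":
            lines += [f"![{item.get('desc', '')}]({content})", ""]
        elif kind == "video":
            lines += [f"[video]({content})", ""]

    return "\n".join(lines).rstrip() + "\n"


sample = json.loads("""
{
  "title": "Article Title",
  "meta_info": {"author_name": "Author/Source", "author_url": "",
                "publish_time": "2024-01-01 12:00"},
  "contents": [
    {"type": "text", "content": "Paragraph text", "desc": ""},
    {"type": "image", "content": "https://example.com/a.png", "desc": ""}
  ]
}
""")
markdown = render_markdown(sample)
```

Iterating over `contents` rather than the flat `texts` and `images` lists keeps paragraphs and media in their original order.
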
| Error Type | Description | Solution |
|---|---|---|
| | URL does not match any supported platform | Check that the URL is correct |
| | Unsupported site | This Skill only supports the listed news sites |
| | Network error or page structure change | Retry, or check that the URL is valid |
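
The last row of the error table suggests retrying on network errors. The dependency list names tenacity for retries; the sketch below uses a small stdlib-only helper as a stand-in so that it runs without extra packages. The names `retry_call` and `flaky_fetch`, along with the parameters and the choice of `ConnectionError`, are hypothetical and do not reflect the skill's actual code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call func, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_fetch() -> str:
    """Simulated fetch that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network error")
    return "<html>ok</html>"


result = retry_call(flaky_fetch, attempts=3, base_delay=0.01)
```

Retrying only helps with transient network failures. If the page structure has changed, every attempt will fail the same way, so the retry count is best kept small.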