wechat-article-aggregator
WeChat Official Account Article Aggregator
Fetches the latest articles from specified WeChat Official Accounts in bulk via the mptext.top API, then downloads and parses them into Markdown, HTML, or plain text.
Quick Start
Fetch the latest 2 articles from a single official account (2 is the default `--limit`):
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ=="
```
Fetch articles from multiple official accounts (comma-separated fakeids):
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ==,MjM5NDI4MTY3NA==" --limit 3
```
Fetch by official account name:
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心,饼干哥哥AGI"
```
Fetch articles from all pre-configured official accounts:
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids all --limit 2
```
Dependency Installation
```bash
pip install requests beautifulsoup4 html2text
```
Minimum dependencies: only `requests` is required. `beautifulsoup4` and `html2text` improve the Markdown conversion; if they are not installed, the built-in HTML parser is used instead.
User Parameter Description
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--api-key` | string | Yes | - | mptext.top API key |
| `--fakeids` | string | Yes | - | Comma-separated list of official account fakeids or names, or `all` |
| `--limit` | int | No | 2 | Number of articles to fetch per official account |
| | string | No | | Output directory |
| `--format` | string | No | | Output format (Markdown, HTML, Text, or JSON) |
| `--accounts-file` | string | No | Auto-detected | Path to a custom official account list JSON file |
| `--interval` | float | No | 1.0 | Interval between requests, in seconds |
| `--list-accounts` | flag | No | - | List all pre-configured official accounts |
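The `--fakeids` value accepts three forms: raw fakeids, account names, or the keyword `all`. The sketch below shows one way such a value could be resolved. It is not the script's own `resolve_fakeids`; the function name `resolve_fakeid_spec` and the account record layout (a list of dicts with `name` and `fakeid` keys) are assumptions made for illustration.

```python
def resolve_fakeid_spec(spec, accounts):
    """Resolve a --fakeids style string into a list of (name, fakeid) pairs.

    `accounts` is assumed to be a list of {"name": ..., "fakeid": ...} dicts;
    the real account file layout may differ.
    """
    if spec.strip().lower() == "all":
        return [(a["name"], a["fakeid"]) for a in accounts]

    by_name = {a["name"]: a["fakeid"] for a in accounts}
    by_fakeid = {a["fakeid"]: a["name"] for a in accounts}

    resolved = []
    for token in (part.strip() for part in spec.split(",")):
        if not token:
            continue  # tolerate stray commas
        if token in by_name:
            resolved.append((token, by_name[token]))
        else:
            # Anything that is not a known name is treated as a raw fakeid;
            # fall back to the fakeid itself when no name is known.
            resolved.append((by_fakeid.get(token, token), token))
    return resolved


demo_accounts = [
    {"name": "赛博禅心", "fakeid": "MzkzNDQxOTU2MQ=="},
    {"name": "Demo Account", "fakeid": "PLACEHOLDER_FAKEID"},  # placeholder value
]
print(resolve_fakeid_spec("赛博禅心,PLACEHOLDER_FAKEID", demo_accounts))
```

Treating unknown tokens as raw fakeids keeps the function usable for accounts that are not in the pre-configured list.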
How to Get an API Key
The API key is issued by the mptext.top platform and authenticates article fetch requests. It is passed in the `X-Auth-Key` request header.
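A minimal sketch of keeping the key out of source code while still sending it in the `X-Auth-Key` header. Only the header name comes from this document; the environment variable name `MPTEXT_API_KEY` and the helper `build_auth_headers` are hypothetical.

```python
import os


def build_auth_headers(env_var="MPTEXT_API_KEY"):
    """Return request headers carrying the API key read from the environment.

    The variable name is an assumption for illustration; the script itself
    is only documented to accept the key via --api-key.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-Auth-Key": api_key}
```

The returned dict can be passed as the `headers` argument of whichever HTTP client is in use.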
fakeid Explanation
A `fakeid` can be obtained in any of these ways:
- View it in the WeChat Official Account Platform backend
- Extract it from the `__biz` parameter in an official account article URL
- Use the pre-configured account list of this skill
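Extracting the `__biz` parameter can be done with the standard library. The sketch below assumes the long-form article URL that carries `__biz` in its query string; short links of the form `https://mp.weixin.qq.com/s/...` do not contain it. The function name and the `mid`/`idx` values in the example are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def fakeid_from_article_url(article_url: str) -> Optional[str]:
    """Return the __biz query parameter of a WeChat article URL, or None."""
    query = urlparse(article_url).query
    # parse_qs also decodes percent-escapes, so "%3D%3D" comes back as "=="
    values = parse_qs(query).get("__biz")
    return values[0] if values else None


example = "https://mp.weixin.qq.com/s?__biz=MzkzNDQxOTU2MQ==&mid=1&idx=1"
print(fakeid_from_article_url(example))
```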
Feature Description
1. Fetch Article List
Call the mptext.top API to fetch the latest article list of the specified official account:
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ==" --limit 5
```
2. Download and Parse Article Content
After the article HTML is downloaded, the script automatically extracts the `#js_content` body area and converts it to Markdown:
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心" --format markdown
```
3. Batch Fetch Multiple Official Accounts
Fetch the latest articles from several official accounts in one run:
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心,饼干哥哥AGI,老金开源" --limit 3
```
4. Fetch All Pre-configured Official Accounts
Use the `all` keyword to fetch articles from all pre-configured official accounts:
```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids all
```
5. View Pre-configured Official Account List
List the pre-configured accounts. Because `--api-key` and `--fakeids` are required parameters, placeholder values are passed:
```bash
python scripts/fetch_articles.py --api-key dummy --fakeids dummy --list-accounts
```
6. Call as a Python Library
```python
import sys
sys.path.insert(0, 'scripts')
from fetch_articles import (
    get_article_list, download_article_html, extract_markdown_from_html,
    load_accounts, resolve_fakeids, fetch_all,
)

api_key = "YOUR_API_KEY"

# Get the article list of a single official account
articles = get_article_list(api_key, "MzkzNDQxOTU2MQ==", limit=2)
for art in articles:
    print(art['title'], art['url'])

# Download and parse article content
html = download_article_html(api_key, articles[0]['url'])
markdown = extract_markdown_from_html(html, title=articles[0]['title'])
print(markdown[:500])

# Batch fetch multiple official accounts
accounts = load_accounts()
fakeids = resolve_fakeids("赛博禅心,饼干哥哥AGI", accounts)
summary = fetch_all(api_key, fakeids, limit=2, output_dir="./output")
print(f"Success: {summary['success']}, Fail: {summary['fail']}")
```
Output Structure
```
output/
├── 赛博禅心/
│   ├── 文章标题1.md
│   └── 文章标题2.md
├── 饼干哥哥AGI/
│   ├── 文章标题1.md
│   └── 文章标题2.md
├── 老金开源/
│   └── ...
└── summary.json    # Metadata summary of all articles
```
summary.json Format
```json
{
  "fetch_time": "2026-02-23T17:30:00",
  "total_accounts": 3,
  "total_articles": 6,
  "success": 5,
  "fail": 1,
  "accounts": [
    {
      "fakeid": "MzkzNDQxOTU2MQ==",
      "name": "赛博禅心",
      "articles": [
        {
          "title": "文章标题",
          "url": "https://mp.weixin.qq.com/s/...",
          "create_time": "1708689600",
          "saved_path": "output/赛博禅心/文章标题.md",
          "status": "success"
        }
      ]
    }
  ]
}
```
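A short sketch of how the summary might be consumed after a run, for example to list the articles that need a retry. It relies only on the fields shown in the sample. Treating every status other than `"success"` as a failure is an assumption, since the failure value is not shown, and both helper names are hypothetical.

```python
from datetime import datetime, timezone


def failed_articles(summary):
    """Return (account name, title, url) for each article not marked success."""
    failures = []
    for account in summary.get("accounts", []):
        for article in account.get("articles", []):
            if article.get("status") != "success":
                failures.append(
                    (account.get("name"), article.get("title"), article.get("url"))
                )
    return failures


def created_at(article):
    """Convert create_time (Unix seconds, as a string or an int) to UTC."""
    return datetime.fromtimestamp(int(article["create_time"]), tz=timezone.utc)


# Inline sample; a real run would load output/summary.json with json.load().
sample = {
    "accounts": [
        {
            "name": "赛博禅心",
            "articles": [
                {"title": "A", "url": "https://mp.weixin.qq.com/s/...",
                 "create_time": "1708689600", "status": "success"},
            ],
        }
    ]
}
print(failed_articles(sample))
print(created_at(sample["accounts"][0]["articles"][0]).isoformat())
```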
API Interface Description
Article List Interface
```
GET https://down.mptext.top/api/public/v1/article?fakeid={URL_ENCODED_FAKEID}&limit={N}
```
| Parameter | Description |
|---|---|
| `fakeid` | fakeid of the official account; must be URL-encoded |
| `limit` | Number of articles to return |

Request header:
```
X-Auth-Key: {YOUR_API_KEY}
```
Response example:
```json
[
  {
    "title": "文章标题",
    "url": "https://mp.weixin.qq.com/s/xxxxx",
    "create_time": 1708689600
  }
]
```
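The request described above can be sketched with the standard library alone. The endpoint, query parameters, and header name are taken from this section; the function names, the 30-second timeout, and the use of `urllib` instead of `requests` are illustrative choices, not the script's implementation.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

ARTICLE_ENDPOINT = "https://down.mptext.top/api/public/v1/article"


def build_article_list_request(api_key, fakeid, limit=2):
    """Build the GET request for the article list endpoint."""
    # safe="" makes quote() escape every reserved character, so the base64
    # padding "=" in a fakeid becomes "%3D".
    url = f"{ARTICLE_ENDPOINT}?fakeid={quote(fakeid, safe='')}&limit={int(limit)}"
    return Request(url, headers={"X-Auth-Key": api_key})


def fetch_article_list(api_key, fakeid, limit=2, timeout=30):
    """Send the request and decode the JSON array of article records."""
    request = build_article_list_request(api_key, fakeid, limit)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network, so it is safe to inspect.
print(build_article_list_request("YOUR_KEY", "MzkzNDQxOTU2MQ==", 5).full_url)
```

Separating request construction from sending makes the URL encoding easy to check without an API key.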
Article Download Interface
```
GET https://down.mptext.top/api/public/v1/download?url={URL_ENCODED_ARTICLE_URL}&type=html
```
| Parameter | Description |
|---|---|
| `url` | WeChat article URL; must be URL-encoded |
| `type` | Fixed as `html` |

Request header:
```
X-Auth-Key: {YOUR_API_KEY}
```
Response: the complete WeChat article HTML page.
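For the download endpoint, the whole article URL is the query value, so its own `:` and `/` characters have to be escaped as well. A minimal standard-library sketch follows; the function names, the timeout, and the UTF-8 fallback are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

DOWNLOAD_ENDPOINT = "https://down.mptext.top/api/public/v1/download"


def build_download_request(api_key, article_url):
    """Build the GET request that asks for an article as HTML."""
    encoded_url = quote(article_url, safe="")  # escape ":" and "/" too
    return Request(
        f"{DOWNLOAD_ENDPOINT}?url={encoded_url}&type=html",
        headers={"X-Auth-Key": api_key},
    )


def download_html(api_key, article_url, timeout=30):
    """Send the request and return the page as text."""
    with urlopen(build_download_request(api_key, article_url), timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


print(build_download_request("YOUR_KEY", "https://mp.weixin.qq.com/s/xxxxx").full_url)
```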
HTML Body Parsing Rules
Extract the body content from the downloaded HTML:
- Locate the div element with `id="js_content"`
- Remove the contents of `script`, `style`, and `noscript` tags
- Extract the text content, preserving paragraph line breaks
- Convert to Markdown with `html2text` (if installed)
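The first three rules can be sketched with only the standard-library `HTMLParser`, in the spirit of the zero-dependency fallback. This is an illustration under simplifying assumptions, not the script's own parser: it tracks nesting by counting tags, so it expects reasonably well-formed markup, and the class name, function name, and tag sets are hypothetical.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "noscript"}
BLOCK_TAGS = {"p", "div", "section", "li", "blockquote", "h1", "h2", "h3", "h4"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "source", "wbr"}


class JsContentExtractor(HTMLParser):
    """Collect text inside <div id="js_content">, skipping script-like tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0       # 0 = outside js_content, otherwise nesting level
        self.skip_depth = 0  # > 0 while inside script/style/noscript
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if tag == "div" and dict(attrs).get("id") == "js_content":
                self.depth = 1
            return
        if tag in VOID_TAGS:  # void elements never get an end tag
            if tag == "br" and not self.skip_depth:
                self.parts.append("\n")
            return
        self.depth += 1
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")
        self.depth -= 1  # reaches 0 when the js_content div itself closes

    def handle_data(self, data):
        if self.depth and not self.skip_depth:
            self.parts.append(data)


def extract_js_content_text(html):
    """Return the body text with one non-empty line per paragraph or break."""
    parser = JsContentExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


page = '<div id="js_content"><p>Hello <b>world</b></p><script>x=1</script></div><p>footer</p>'
print(extract_js_content_text(page))
```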
Pre-configured Official Account List
| No. | Official Account Name | Category | FakeID |
|---|---|---|---|
| 1 | 饼干哥哥AGI | AI Programming | |
| 2 | 赛博禅心 | AI Frontier | |
| 3 | 可怜的小互 | AI Technology | |
| 4 | 宝玉的工程技术分享 | Technical Translation | |
| 5 | 苍何 | AI Practice | |
| 6 | 老金开源 | Claude Code | |
| 7 | 玩转AI工具 | AI Tools | |
| 8 | 袋鼠帝AI客栈 | AI Practice | |

To extend the list, pass a custom official account list JSON file via the `--accounts-file` parameter.
Notes
- API key security: Do not hardcode the API key in code; pass it through an environment variable or a command-line parameter.
- Request frequency: The default interval is 1 second. If you encounter a 429 error, increase the `--interval` value.
- HTML parsing: The download interface returns a complete HTML page; the script automatically extracts the body from the `#js_content` area.
- Dependency fallback: When `beautifulsoup4` and `html2text` are not installed, the built-in HTMLParser is used to extract plain text.
- File naming: Output files are named after the article title; special characters are removed automatically and the name is truncated to 80 characters.
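The file naming rule can be illustrated as follows. The exact set of characters the script removes is not documented here, so the character class below (characters that are invalid in file names on common platforms, plus control characters) is an assumption, as are the function name and the `untitled` fallback.

```python
import re


def safe_filename(title, max_length=80, fallback="untitled"):
    """Turn an article title into a file name stem of at most max_length chars."""
    # Drop characters that are unsafe in file names, then collapse whitespace.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Truncate, and avoid ending on a space or dot after the cut.
    return cleaned[:max_length].rstrip(" .") or fallback


print(safe_filename('What is "AI"? A/B test: part 1') + ".md")
```

Truncation counts characters rather than bytes, which matters for Chinese titles on file systems that limit names by byte length.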
Trigger Keywords
- "获取公众号文章" (Fetch official account articles)
- "抓取微信文章" (Crawl WeChat articles)
- "公众号文章聚合" (Official account article aggregation)
- "批量获取公众号" (Batch fetch official accounts)
- "下载公众号文章" (Download official account articles)
- "微信文章采集" (WeChat article collection)
- "获取最新公众号文章" (Fetch latest official account articles)
Changelog
v1.0.0 (2026-02-23)
- Initial version
- Fetch official account article lists via the mptext.top API
- Download article HTML and parse it into Markdown/HTML/Text/JSON
- Built-in HTMLParser extracts the `#js_content` body (zero-dependency fallback)
- Pre-configured fakeids for 8 popular AI tech official accounts
- Fetch by official account name or fakeid
- The `all` keyword fetches all pre-configured official accounts
- Automatically generates summary.json with aggregated metadata