wechat-article-aggregator

WeChat Official Account Article Aggregator

Batch-fetches the latest articles from specified WeChat Official Accounts via the mptext.top API, then downloads and parses them into Markdown, HTML, or plain text output.

Quick Start

Fetch the latest 2 articles from a single official account:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ=="
```

Fetch articles from multiple official accounts (comma-separated fakeids):

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ==,MjM5NDI4MTY3NA==" --limit 3
```

Fetch by official account name:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心,饼干哥哥AGI"
```

Fetch articles from all pre-configured official accounts:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids all --limit 2
```

Dependency Installation

```bash
pip install requests beautifulsoup4 html2text
```

Minimum dependencies: only `requests` is required. `beautifulsoup4` and `html2text` improve the Markdown conversion; if they are not installed, the script falls back to its built-in HTML parser.

User Parameter Description

Parameter	Type	Required	Default Value	Description
`--api-key` / `-k`	string	Yes	-	mptext.top API Key
`--fakeids` / `-f`	string	Yes	-	Comma-separated list of official account fakeids or names, or `all`
`--limit` / `-l`	int	No	2	Number of articles fetched per official account
`--output-dir` / `-o`	string	No	`./output`	Output directory
`--format` / `-F`	string	No	`markdown`	Output format: `markdown` / `html` / `text` / `json`
`--accounts-file` / `-a`	string	No	Auto-detected	Path to a custom official account list JSON file
`--interval` / `-i`	float	No	1.0	Request interval in seconds
`--list-accounts`	flag	No	-	List all pre-configured official account information
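
The table above maps naturally onto an `argparse` parser. The following is a minimal sketch of such a parser, not the actual source of `scripts/fetch_articles.py`: the function name `build_parser` and the help strings are assumptions, while the flags, types, and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(description="Fetch WeChat Official Account articles")
    p.add_argument("--api-key", "-k", required=True, help="mptext.top API Key")
    p.add_argument("--fakeids", "-f", required=True,
                   help="comma-separated fakeids or account names, or 'all'")
    p.add_argument("--limit", "-l", type=int, default=2)
    p.add_argument("--output-dir", "-o", default="./output")
    p.add_argument("--format", "-F", default="markdown",
                   choices=["markdown", "html", "text", "json"])
    p.add_argument("--accounts-file", "-a", default=None)
    p.add_argument("--interval", "-i", type=float, default=1.0)
    p.add_argument("--list-accounts", action="store_true")
    return p


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-k", "YOUR_KEY", "-f", "all", "--limit", "3"])
print(args.fakeids, args.limit, args.format, args.interval)
```

Note that `-f` (fakeids) and `-F` (format) differ only in case; `argparse` treats option strings as case-sensitive, so the two do not collide.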

How to Get an API Key

The API Key is issued by the mptext.top platform and authenticates article fetch requests. It is sent in the `X-Auth-Key` request header.
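
Since this document's notes advise against hardcoding the key, one possible way to build the header is to read the key from the environment. This is a sketch only: the helper `auth_headers` and the variable name `MPTEXT_API_KEY` are assumptions made for illustration and are not defined by the script.

```python
import os


def auth_headers(env_var: str = "MPTEXT_API_KEY") -> dict:
    """Build the request headers from an API key stored in the environment.

    The variable name MPTEXT_API_KEY is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-Auth-Key": api_key}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client call, for example `requests.get(url, headers=auth_headers())`.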

fakeid Explanation

`fakeid` is the unique identifier of a WeChat Official Account (the Base64-encoded biz parameter). It can be obtained in the following ways:

- View it in the WeChat Official Account Platform backend
- Extract it from the `__biz` parameter in an official account article URL
- Use the pre-configured account list of this skill
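
Extracting `__biz` from an article URL can be sketched with the standard library as follows. The helper name `extract_biz` and the sample URL are illustrative assumptions; short links of the form `https://mp.weixin.qq.com/s/...` carry no query string, so the function returns `None` for them.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_biz(article_url: str) -> Optional[str]:
    """Return the __biz query parameter of an article URL, or None if absent."""
    query = urlsplit(article_url).query
    # Protect a literal "+" (valid in Base64) from being decoded as a space.
    values = parse_qs(query.replace("+", "%2B")).get("__biz")
    return values[0] if values else None


# Made-up long-form URL used only to exercise the function.
url = "https://mp.weixin.qq.com/s?__biz=MzkzNDQxOTU2MQ==&mid=1&idx=1"
print(extract_biz(url))
```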

Feature Description

1. Fetch Article List

Call the mptext.top API to fetch the latest article list of the specified official account:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ==" --limit 5
```

2. Download and Parse Article Content

After fetching the article HTML, the script automatically extracts the `#js_content` body area and converts it to Markdown:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心" --format markdown
```

3. Batch Fetch Multiple Official Accounts

Fetch the latest articles from multiple official accounts in one run:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心,饼干哥哥AGI,老金开源" --limit 3
```

4. Fetch All Pre-configured Official Accounts

Use the `all` keyword to fetch articles from all pre-configured official accounts:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids all
```

5. View Pre-configured Official Account List

`--api-key` and `--fakeids` are required arguments, so placeholder values are passed:

```bash
python scripts/fetch_articles.py --api-key dummy --fakeids dummy --list-accounts
```

6. Call as a Python Library

```python
import sys
sys.path.insert(0, 'scripts')
from fetch_articles import (
    get_article_list, download_article_html, extract_markdown_from_html,
    load_accounts, resolve_fakeids, fetch_all,
)

api_key = "YOUR_API_KEY"

# Get the article list of a single official account
articles = get_article_list(api_key, "MzkzNDQxOTU2MQ==", limit=2)
for art in articles:
    print(art['title'], art['url'])

# Download and parse article content
html = download_article_html(api_key, articles[0]['url'])
markdown = extract_markdown_from_html(html, title=articles[0]['title'])
print(markdown[:500])

# Batch fetch multiple official accounts
accounts = load_accounts()
fakeids = resolve_fakeids("赛博禅心,饼干哥哥AGI", accounts)
summary = fetch_all(api_key, fakeids, limit=2, output_dir="./output")
print(f"Success: {summary['success']}, Fail: {summary['fail']}")
```

Output Structure

```
output/
├── 赛博禅心/
│   ├── Article Title 1.md
│   └── Article Title 2.md
├── 饼干哥哥AGI/
│   ├── Article Title 1.md
│   └── Article Title 2.md
├── 老金开源/
│   └── ...
└── summary.json          # Metadata summary of all articles
```

summary.json Format

```json
{
  "fetch_time": "2026-02-23T17:30:00",
  "total_accounts": 3,
  "total_articles": 6,
  "success": 5,
  "fail": 1,
  "accounts": [
    {
      "fakeid": "MzkzNDQxOTU2MQ==",
      "name": "赛博禅心",
      "articles": [
        {
          "title": "Article Title",
          "url": "https://mp.weixin.qq.com/s/...",
          "create_time": "1708689600",
          "saved_path": "output/赛博禅心/Article Title.md",
          "status": "success"
        }
      ]
    }
  ]
}
```
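
A summary in this shape is straightforward to post-process. The sketch below lists articles that did not succeed; it assumes that failed articles carry a `status` value other than `"success"`, which the example above does not show, and the helper name `failed_articles` is hypothetical.

```python
import json
from pathlib import Path


def failed_articles(summary: dict) -> list:
    """Collect (account name, title, url) for every article whose status is not "success"."""
    failures = []
    for account in summary.get("accounts", []):
        for art in account.get("articles", []):
            if art.get("status") != "success":
                failures.append((account.get("name"), art.get("title"), art.get("url")))
    return failures


# Read the summary from the default output directory, if it exists.
summary_path = Path("output/summary.json")
if summary_path.exists():
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    for name, title, url in failed_articles(summary):
        print(f"{name}: {title} ({url})")
```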

API Interface Description

Article List Interface

```
GET https://down.mptext.top/api/public/v1/article?fakeid={URL_ENCODED_FAKEID}&limit={N}
```

Parameter	Description
`fakeid`	fakeid of the official account; must be URL-encoded (`==` → `%3D%3D`)
`limit`	Number of articles to return

Request header:

```
X-Auth-Key: {YOUR_API_KEY}
```

Response example:

```json
[
  {
    "title": "Article Title",
    "url": "https://mp.weixin.qq.com/s/xxxxx",
    "create_time": 1708689600
  }
]
```
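
The URL-encoding requirement for `fakeid` can be illustrated with `urllib.parse.quote`. This is a minimal sketch; the constant `API_BASE` and the helper `article_list_url` are names introduced here for illustration, not part of the script.

```python
from urllib.parse import quote

API_BASE = "https://down.mptext.top/api/public/v1"


def article_list_url(fakeid: str, limit: int = 2) -> str:
    """Build the article list URL with the fakeid percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "=".
    return f"{API_BASE}/article?fakeid={quote(fakeid, safe='')}&limit={limit}"


print(article_list_url("MzkzNDQxOTU2MQ==", limit=5))
# https://down.mptext.top/api/public/v1/article?fakeid=MzkzNDQxOTU2MQ%3D%3D&limit=5
```

HTTP clients that accept a parameter dictionary (such as the `params` argument of `requests.get`) perform this encoding themselves, in which case the raw fakeid should be passed to avoid encoding it twice.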

Article Download Interface

```
GET https://down.mptext.top/api/public/v1/download?url={URL_ENCODED_ARTICLE_URL}&type=html
```

Parameter	Description
`url`	WeChat article URL; must be URL-encoded
`type`	Fixed as `html` (the API returns HTML)

Request header:

```
X-Auth-Key: {YOUR_API_KEY}
```

Response: the complete WeChat article HTML page.
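
Because the article URL is itself passed as a query parameter, its `:` and `/` characters must be percent-encoded. A small sketch using `urllib.parse.urlencode` follows; `DOWNLOAD_ENDPOINT` and `download_url` are illustrative names, not identifiers from the script.

```python
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = "https://down.mptext.top/api/public/v1/download"


def download_url(article_url: str) -> str:
    """Build the download URL; urlencode() percent-encodes the article URL."""
    return f"{DOWNLOAD_ENDPOINT}?{urlencode({'url': article_url, 'type': 'html'})}"


print(download_url("https://mp.weixin.qq.com/s/xxxxx"))
# https://down.mptext.top/api/public/v1/download?url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2Fxxxxx&type=html
```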

HTML Body Parsing Rules

The body content is extracted from the downloaded HTML as follows:

1. Locate the div element with `id="js_content"`
2. Remove the content of `script`, `style`, and `noscript` tags
3. Extract the text content, retaining paragraph line breaks
4. Convert to Markdown with `html2text` (if installed)
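
The first three rules can be sketched with only the standard library's `html.parser`. This is an illustration of the approach, not the script's actual implementation: the class and function names, and the choice of which tags count as block-level, are assumptions, and the depth tracking assumes reasonably well-balanced markup.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}
BLOCK_TAGS = {"p", "div", "section", "br", "h1", "h2", "h3", "h4", "li", "blockquote"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class JsContentExtractor(HTMLParser):
    """Collect text inside the element with id="js_content"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0        # nesting depth inside #js_content (0 = outside)
        self.skip_depth = 0   # > 0 while inside script/style/noscript
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if dict(attrs).get("id") == "js_content":
                self.depth = 1
            return
        if tag in BLOCK_TAGS:
            self.parts.append("\n")  # block boundaries become line breaks
        if tag in VOID_TAGS:
            return  # void elements have no closing tag, so depth is unchanged
        self.depth += 1
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.depth -= 1
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth:
            self.parts.append(data)


def extract_js_content(html: str) -> str:
    """Return the text of #js_content with one blank line between paragraphs."""
    parser = JsContentExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n\n".join(line for line in lines if line)
```

The result is plain text; converting to richer Markdown (links, emphasis, images) is the part that `html2text` would handle when it is installed.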

Pre-configured Official Account List


No.	Official Account Name	Category	FakeID
1	饼干哥哥AGI	AI Programming	`MjM5NDI4MTY3NA==`
2	赛博禅心	AI Frontier	`MzkzNDQxOTU2MQ==`
3	可怜的小互	AI Technology	`MzkzMTcyMTgxNg==`
4	宝玉的工程技术分享	Technical Translation	`Mzk1NzgxMjQ0OA==`
5	苍何	AI Practice	`Mzg3MTk3NzYzNw==`
6	老金开源	Claude Code	`MzI0NzU2MDgyNA==`
7	玩转AI工具	AI Tools	`MzU4NTE1Mjg4MA==`
8	袋鼠帝AI客栈	AI Practice	`MzkwMzE4NjU5NA==`

To extend this list, pass a custom official account list JSON file via the `--accounts-file` parameter.

Notes

- API Key Security: Do not hardcode the API Key in code; pass it through an environment variable or a command-line parameter instead.
- Request Frequency: The default interval is 1 second. If you encounter a 429 error, increase the `--interval` value.
- HTML Parsing: The download interface returns a complete HTML page; the script automatically extracts the body from the `#js_content` area.
- Dependency Fallback: When `beautifulsoup4` and `html2text` are not installed, the built-in HTMLParser is used to extract plain text.
- File Naming: Output files are named after the article title, with special characters removed and the name truncated to 80 characters.
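
The file naming rule might look roughly like the sketch below. The exact set of characters the script treats as "special" is not documented here, so the character class is an assumption, as are the helper name `safe_filename`, the `untitled` fallback, and the choice to apply the 80-character limit before the extension is appended.

```python
import re


def safe_filename(title: str, max_len: int = 80, ext: str = ".md") -> str:
    """Turn an article title into a file name: strip unsafe characters, cap the length."""
    # Assumed "special" characters: path separators, characters reserved on
    # Windows, and ASCII control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "", title)
    # Collapse runs of whitespace and trim leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return (cleaned[:max_len].rstrip() or "untitled") + ext


print(safe_filename('AI: "What/Why"?  A guide'))
```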

Trigger Keywords

- "获取公众号文章" (Fetch official account articles)
- "抓取微信文章" (Crawl WeChat articles)
- "公众号文章聚合" (Official account article aggregation)
- "批量获取公众号" (Batch fetch official accounts)
- "下载公众号文章" (Download official account articles)
- "微信文章采集" (WeChat article collection)
- "获取最新公众号文章" (Fetch latest official account articles)

Changelog

v1.0.0 (2026-02-23)

- Initial version
- Supports fetching official account article lists via the mptext.top API
- Supports downloading article HTML and parsing it into Markdown/HTML/Text/JSON
- Built-in HTMLParser extracts the `#js_content` body (zero-dependency fallback)
- Pre-configured fakeids of 8 popular AI tech official accounts
- Supports fetching by official account name or fakeid
- Supports the `all` keyword to fetch all pre-configured official accounts
- Automatically generates summary.json with aggregated metadata