wechat-article-aggregator

WeChat Official Account Article Aggregator

Batch-fetch the latest articles from specified WeChat Official Accounts through the mptext.top API, then download them and convert them to Markdown, HTML, or plain text.

Quick Start

Fetch the latest 2 articles (the default `--limit`) from a single official account:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ=="
```

Fetch articles from several official accounts (comma-separated fakeids):

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ==,MjM5NDI4MTY3NA==" --limit 3
```

Fetch by official account name:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心,饼干哥哥AGI"
```

Fetch articles from all pre-configured official accounts:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids all --limit 2
```

Dependency Installation

```bash
pip install requests beautifulsoup4 html2text
```

Minimum dependencies: only `requests` is required. `beautifulsoup4` and `html2text` improve the Markdown conversion; if they are not installed, the script falls back to the built-in `HTMLParser`.

Command-Line Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--api-key` / `-k` | string | Yes | - | mptext.top API Key |
| `--fakeids` / `-f` | string | Yes | - | Comma-separated list of fakeids or account names, or `all` |
| `--limit` / `-l` | int | No | 2 | Number of articles fetched per official account |
| `--output-dir` / `-o` | string | No | `./output` | Output directory |
| `--format` / `-F` | string | No | `markdown` | Output format: `markdown` / `html` / `text` / `json` |
| `--accounts-file` / `-a` | string | No | Auto-detected | Path to a custom official account list JSON file |
| `--interval` / `-i` | float | No | 1.0 | Interval between requests, in seconds |
| `--list-accounts` | flag | No | - | List all pre-configured official accounts |
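For readers who want to script around the CLI, here is a hypothetical `argparse` declaration that mirrors the table: same option names, types, and defaults. It is not taken from `fetch_articles.py`; in particular, using `None` as the `--accounts-file` default to stand for "auto-detect" is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options of fetch_articles.py."""
    p = argparse.ArgumentParser(description="WeChat Official Account Article Aggregator")
    p.add_argument("--api-key", "-k", required=True, help="mptext.top API Key")
    p.add_argument("--fakeids", "-f", required=True,
                   help="comma-separated fakeids or account names, or 'all'")
    p.add_argument("--limit", "-l", type=int, default=2, help="articles per account")
    p.add_argument("--output-dir", "-o", default="./output", help="output directory")
    p.add_argument("--format", "-F", default="markdown",
                   choices=["markdown", "html", "text", "json"], help="output format")
    # None stands in for "auto-detect the accounts file" (assumed behavior)
    p.add_argument("--accounts-file", "-a", default=None, help="custom account list JSON file")
    p.add_argument("--interval", "-i", type=float, default=1.0, help="seconds between requests")
    p.add_argument("--list-accounts", action="store_true", help="list pre-configured accounts")
    return p


args = build_parser().parse_args(["-k", "YOUR_KEY", "-f", "all", "-l", "3"])
print(args.limit, args.format, args.interval)  # 3 markdown 1.0
```

Declaring `--api-key` and `--fakeids` as required matches the table, and explains why the `--list-accounts` example passes placeholder values for both.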

How to Get an API Key

The API Key is obtained from the mptext.top platform and authenticates article requests. It is sent in the `X-Auth-Key` request header.
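A minimal sketch of that header handling, which also follows the advice in the Notes not to hardcode the key: it prefers an explicit value (for example, from `--api-key`) and otherwise reads an environment variable. The variable name `MPTEXT_API_KEY` and both helper names are hypothetical; the script's own lookup logic may differ.

```python
import os
from typing import Optional

# Hypothetical environment variable name; not defined by fetch_articles.py
ENV_VAR = "MPTEXT_API_KEY"


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Prefer an explicit value (e.g. from --api-key), else fall back to the environment."""
    key = cli_value or os.environ.get(ENV_VAR, "")
    if not key:
        raise ValueError(f"No API key: pass --api-key or set {ENV_VAR}")
    return key


def auth_headers(api_key: str) -> dict:
    """mptext.top expects the key in the X-Auth-Key request header."""
    return {"X-Auth-Key": api_key}
```

Usage: `headers = auth_headers(resolve_api_key())`, then pass `headers` with each request.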

About fakeid

A `fakeid` is the unique identifier of a WeChat Official Account (the Base64-encoded `biz` parameter). You can obtain it in any of these ways:
  1. Look it up in the WeChat Official Account Platform backend.
  2. Extract the `__biz` parameter from an official account article URL.
  3. Use the pre-configured account list that ships with this skill.
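Option 2 needs only the standard library. The sketch below pulls `__biz` out of an article URL; the example URL and its `mid` value are made up for illustration, and `extract_fakeid` is a hypothetical helper. The Base64 decode is there only to show the encoding; the API takes the encoded form as-is.

```python
import base64
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_fakeid(article_url: str) -> Optional[str]:
    """Return the __biz query parameter (the fakeid) from a WeChat article URL, if present."""
    values = parse_qs(urlparse(article_url).query).get("__biz")
    # parse_qs percent-decodes, so "%3D%3D" comes back as "=="
    return values[0] if values else None


# Hypothetical long-form article URL, for illustration only
url = "https://mp.weixin.qq.com/s?__biz=MzkzNDQxOTU2MQ%3D%3D&mid=123"
fakeid = extract_fakeid(url)
print(fakeid)                                    # MzkzNDQxOTU2MQ==
print(base64.b64decode(fakeid).decode("ascii"))  # 3934419561
```

Short links of the form `https://mp.weixin.qq.com/s/xxxxx` carry no `__biz` query parameter, so the helper returns `None` for them.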

Features

1. Fetch Article List

Call the mptext.top API to fetch the latest article list of the specified official account:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "MzkzNDQxOTU2MQ==" --limit 5
```

2. Download and Parse Article Content

After downloading the article HTML, the script automatically extracts the `#js_content` body and converts it to Markdown:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心" --format markdown
```

3. Batch Fetch Multiple Official Accounts

Fetch the latest articles from several official accounts in one run:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids "赛博禅心,饼干哥哥AGI,老金开源" --limit 3
```
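In a batch run, `--interval` sets the pause between requests. Below is a minimal sketch of such a throttled loop. `fetch_many` and the stand-in fetcher are hypothetical. Recording failures per account instead of aborting the run is an assumption about the script's behavior.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_many(
    fakeids: Iterable[str],
    fetch_one: Callable[[str], List[dict]],
    interval: float = 1.0,
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Call fetch_one for each fakeid, sleeping `interval` seconds between calls.

    A failure for one account is recorded and the loop moves on, so a single
    bad fakeid does not abort the whole batch.
    """
    results: Dict[str, List[dict]] = {}
    errors: Dict[str, str] = {}
    for i, fakeid in enumerate(fakeids):
        if i > 0:
            time.sleep(interval)  # no delay before the first request
        try:
            results[fakeid] = fetch_one(fakeid)
        except Exception as exc:  # e.g. an HTTP 429 or network error raised by fetch_one
            errors[fakeid] = str(exc)
    return results, errors


# Usage with a stand-in fetcher (a real one would call the article list API)
def fake_fetch(fakeid: str) -> List[dict]:
    if fakeid == "BAD==":
        raise RuntimeError("HTTP 429")
    return [{"title": f"latest from {fakeid}"}]


results, errors = fetch_many(["MzkzNDQxOTU2MQ==", "BAD=="], fake_fetch, interval=0)
print(errors)  # {'BAD==': 'HTTP 429'}
```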

4. Fetch All Pre-configured Official Accounts

Use the `all` keyword to fetch articles from every pre-configured official account:

```bash
python scripts/fetch_articles.py --api-key YOUR_KEY --fakeids all
```
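`--fakeids` can hold a fakeid, an account name, a comma-separated mix, or `all`. The sketch below resolves such a value against a name-to-fakeid mapping. It is not the script's `resolve_fakeids`, whose handling of unknown names isn't documented here. `resolve_fakeid_arg` is hypothetical, and it assumes any item not found as a name is already a fakeid.

```python
from typing import Dict, List

# A few entries from the pre-configured account list (name -> fakeid)
ACCOUNTS: Dict[str, str] = {
    "饼干哥哥AGI": "MjM5NDI4MTY3NA==",
    "赛博禅心": "MzkzNDQxOTU2MQ==",
    "老金开源": "MzI0NzU2MDgyNA==",
}


def resolve_fakeid_arg(value: str, accounts: Dict[str, str]) -> List[str]:
    """Turn a --fakeids value into a de-duplicated list of fakeids."""
    if value.strip().lower() == "all":
        return list(accounts.values())
    fakeids: List[str] = []
    for item in (part.strip() for part in value.split(",")):
        if not item:
            continue
        # Assumption: an item that is not a known name is treated as a raw fakeid
        fakeid = accounts.get(item, item)
        if fakeid not in fakeids:
            fakeids.append(fakeid)
    return fakeids


print(resolve_fakeid_arg("赛博禅心, MjM5NDI4MTY3NA==", ACCOUNTS))
# ['MzkzNDQxOTU2MQ==', 'MjM5NDI4MTY3NA==']
```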

5. View Pre-configured Official Account List

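Because `--api-key` and `--fakeids` are required arguments, placeholder values are passed even when only listing accounts:

```bash
python scripts/fetch_articles.py --api-key dummy --fakeids dummy --list-accounts
```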

6. Call as a Python Library

```python
import sys
sys.path.insert(0, 'scripts')
from fetch_articles import (
    get_article_list,
    download_article_html,
    extract_markdown_from_html,
    load_accounts,
    resolve_fakeids,
    fetch_all,
)

api_key = "YOUR_API_KEY"

# Get the article list of a single official account
articles = get_article_list(api_key, "MzkzNDQxOTU2MQ==", limit=2)
for art in articles:
    print(art['title'], art['url'])

# Download and parse article content
html = download_article_html(api_key, articles[0]['url'])
markdown = extract_markdown_from_html(html, title=articles[0]['title'])
print(markdown[:500])

# Batch fetch multiple official accounts
accounts = load_accounts()
fakeids = resolve_fakeids("赛博禅心,饼干哥哥AGI", accounts)
summary = fetch_all(api_key, fakeids, limit=2, output_dir="./output")
print(f"Success: {summary['success']}, Fail: {summary['fail']}")
```

Output Structure

```text
output/
├── 赛博禅心/
│   ├── Article Title 1.md
│   └── Article Title 2.md
├── 饼干哥哥AGI/
│   ├── Article Title 1.md
│   └── Article Title 2.md
├── 老金开源/
│   └── ...
└── summary.json          # Metadata summary of all articles
```

summary.json Format

```json
{
  "fetch_time": "2026-02-23T17:30:00",
  "total_accounts": 3,
  "total_articles": 6,
  "success": 5,
  "fail": 1,
  "accounts": [
    {
      "fakeid": "MzkzNDQxOTU2MQ==",
      "name": "赛博禅心",
      "articles": [
        {
          "title": "Article Title",
          "url": "https://mp.weixin.qq.com/s/...",
          "create_time": "1708689600",
          "saved_path": "output/赛博禅心/Article Title.md",
          "status": "success"
        }
      ]
    }
  ]
}
```
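`summary.json` records a `status` for every article, so a follow-up script can pick out the failures to retry. A small sketch, assuming the layout shown above. Only `"success"` appears in the example, so the sketch treats any other `status` value as a failure. `failed_articles` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List, Tuple


def failed_articles(summary_path: str) -> List[Tuple[str, str, str]]:
    """Return (account name, title, url) for every article whose status is not "success"."""
    summary = json.loads(Path(summary_path).read_text(encoding="utf-8"))
    failures = []
    for account in summary.get("accounts", []):
        for article in account.get("articles", []):
            if article.get("status") != "success":
                failures.append(
                    (account.get("name", ""), article.get("title", ""), article.get("url", ""))
                )
    return failures
```

Usage: `for name, title, url in failed_articles("output/summary.json"): print(name, title, url)`.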

API Endpoints

Article List Endpoint

```
GET https://down.mptext.top/api/public/v1/article?fakeid={URL_ENCODED_FAKEID}&limit={N}
```

| Parameter | Description |
|---|---|
| `fakeid` | fakeid of the official account; must be URL-encoded (`==` → `%3D%3D`) |
| `limit` | Number of articles to return |

Request header:

```
X-Auth-Key: {YOUR_API_KEY}
```

Response example:

```json
[
  {
    "title": "Article Title",
    "url": "https://mp.weixin.qq.com/s/xxxxx",
    "create_time": 1708689600
  }
]
```
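A standard-library sketch of the article list call. `urlencode` handles the `==` → `%3D%3D` encoding. The function names are hypothetical. The parser assumes the response is the JSON array shown above and that `create_time` is a Unix timestamp in seconds.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import List

LIST_ENDPOINT = "https://down.mptext.top/api/public/v1/article"


def article_list_url(fakeid: str, limit: int = 2) -> str:
    """Build the list URL; urlencode percent-encodes '=' in the fakeid as %3D."""
    return f"{LIST_ENDPOINT}?{urllib.parse.urlencode({'fakeid': fakeid, 'limit': limit})}"


def parse_article_list(body: str) -> List[dict]:
    """Parse the JSON array response and add a timezone-aware 'published' datetime."""
    articles = json.loads(body)
    for art in articles:
        art["published"] = datetime.fromtimestamp(int(art["create_time"]), tz=timezone.utc)
    return articles


def fetch_article_list(api_key: str, fakeid: str, limit: int = 2, timeout: float = 30.0) -> List[dict]:
    """Perform the GET request with the X-Auth-Key header (requires network access)."""
    req = urllib.request.Request(article_list_url(fakeid, limit), headers={"X-Auth-Key": api_key})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_article_list(resp.read().decode("utf-8"))


print(article_list_url("MzkzNDQxOTU2MQ==", 2))
# https://down.mptext.top/api/public/v1/article?fakeid=MzkzNDQxOTU2MQ%3D%3D&limit=2
```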

Article Download Endpoint

```
GET https://down.mptext.top/api/public/v1/download?url={URL_ENCODED_ARTICLE_URL}&type=html
```

| Parameter | Description |
|---|---|
| `url` | WeChat article URL; must be URL-encoded |
| `type` | Always `html` (the API returns HTML) |

Request header:

```
X-Auth-Key: {YOUR_API_KEY}
```

Response: the complete WeChat article HTML page.
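The article URL is itself a full URL, so its `:` and `/` characters must be percent-encoded before it goes into the `url` query parameter. Below is a sketch with hypothetical helper names. Falling back to UTF-8 when the response declares no charset is an assumption.

```python
import urllib.parse
import urllib.request

DOWNLOAD_ENDPOINT = "https://down.mptext.top/api/public/v1/download"


def build_download_request(api_key: str, article_url: str) -> urllib.request.Request:
    """Encode the article URL into the query string and attach the auth header."""
    query = urllib.parse.urlencode({"url": article_url, "type": "html"})
    return urllib.request.Request(f"{DOWNLOAD_ENDPOINT}?{query}", headers={"X-Auth-Key": api_key})


def download_html(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Fetch the page and decode it with the declared charset (requires network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


req = build_download_request("YOUR_KEY", "https://mp.weixin.qq.com/s/xxxxx")
print(req.full_url)
# https://down.mptext.top/api/public/v1/download?url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2Fxxxxx&type=html
```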

HTML Body Parsing Rules

Extract the body content from the downloaded HTML:
  1. Locate the `div` element with `id="js_content"`.
  2. Remove the contents of `script`, `style`, and `noscript` tags.
  3. Extract the text, keeping paragraph line breaks.
  4. Convert it to Markdown with `html2text` (if installed).
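Steps 1 to 3 can be done with the standard-library `html.parser` alone, which is what makes a zero-dependency fallback possible. The class below is an illustrative sketch, not the script's own parser. It tracks `div` nesting to detect where `#js_content` closes. The set of tags treated as paragraph breaks is an assumption.

```python
import re
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}
# Tags treated as paragraph boundaries (an assumption; tune as needed)
BLOCK_TAGS = {"p", "div", "section", "br", "li", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}


class JsContentExtractor(HTMLParser):
    """Collect the text inside <div id="js_content">, skipping script/style/noscript."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.div_depth = 0   # >0 while inside js_content; counts nested <div>s
        self.skip_depth = 0  # >0 while inside a skipped tag
        self.parts: list = []

    def handle_starttag(self, tag, attrs):
        if self.div_depth == 0:
            if tag == "div" and dict(attrs).get("id") == "js_content":
                self.div_depth = 1
            return
        if tag == "div":
            self.div_depth += 1
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if self.div_depth == 0:
            return
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        if tag in BLOCK_TAGS:
            self.parts.append("\n")
        if tag == "div":
            self.div_depth -= 1  # reaching 0 means js_content has closed

    def handle_data(self, data):
        if self.div_depth > 0 and self.skip_depth == 0:
            # Collapse source-formatting whitespace but keep word separators
            self.parts.append(re.sub(r"\s+", " ", data))

    def text(self) -> str:
        lines = (line.strip() for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def extract_js_content_text(html: str) -> str:
    parser = JsContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Usage: `extract_js_content_text(html)` returns one line per paragraph; text outside `#js_content`, such as page footers, is ignored.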

Pre-configured Official Account List

| No. | Official Account Name | Category | FakeID |
|---|---|---|---|
| 1 | 饼干哥哥AGI | AI Programming | `MjM5NDI4MTY3NA==` |
| 2 | 赛博禅心 | AI Frontier | `MzkzNDQxOTU2MQ==` |
| 3 | 可怜的小互 | AI Technology | `MzkzMTcyMTgxNg==` |
| 4 | 宝玉的工程技术分享 | Technical Translation | `Mzk1NzgxMjQ0OA==` |
| 5 | 苍何 | AI Practice | `Mzg3MTk3NzYzNw==` |
| 6 | 老金开源 | Claude Code | `MzI0NzU2MDgyNA==` |
| 7 | 玩转AI工具 | AI Tools | `MzU4NTE1Mjg4MA==` |
| 8 | 袋鼠帝AI客栈 | AI Practice | `MzkwMzE4NjU5NA==` |

To extend this list, pass a custom official account list JSON file with `--accounts-file`.
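This section does not document the JSON schema that `--accounts-file` expects. Purely as an illustration, the sketch below assumes a list of objects with `name`, `category`, and `fakeid` keys, mirroring the table columns, and merges it over the built-in entries. The schema, the helper name `load_account_map`, and the `My Account` / `EXAMPLE_FAKEID==` placeholder entry are all hypothetical; check the script's bundled account list for the real format.

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical schema: [{"name": ..., "category": ..., "fakeid": ...}, ...]
EXAMPLE_FILE = """[
  {"name": "赛博禅心", "category": "AI Frontier", "fakeid": "MzkzNDQxOTU2MQ=="},
  {"name": "My Account", "category": "Custom", "fakeid": "EXAMPLE_FAKEID=="}
]"""


def load_account_map(path: str, builtin: Dict[str, str]) -> Dict[str, str]:
    """Return name -> fakeid, with entries from the file overriding built-in ones."""
    merged = dict(builtin)
    for entry in json.loads(Path(path).read_text(encoding="utf-8")):
        name, fakeid = entry.get("name"), entry.get("fakeid")
        if name and fakeid:  # skip incomplete entries instead of failing
            merged[name] = fakeid
    return merged
```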

Notes

  • API Key security: Do not hardcode the API Key in source code; pass it in through an environment variable or a command-line argument.
  • Request rate: Requests are spaced 1 second apart by default. If you get HTTP 429 (Too Many Requests) errors, increase `--interval`.
  • HTML parsing: The download endpoint returns a complete HTML page; the script automatically extracts the body from the `#js_content` element.
  • Dependency fallback: When `beautifulsoup4` and `html2text` are not installed, the script uses the built-in `HTMLParser` to extract plain text.
  • File naming: Output files are named after the article title, with special characters removed and the name truncated to 80 characters.
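The file-naming rule can be sketched as follows. The notes do not say which characters count as "special", so this version assumes the characters Windows forbids in file names plus control characters. `safe_filename` and its `untitled` fallback are hypothetical.

```python
import re

# Assumed set of "special" characters: those invalid in Windows file names, plus control chars
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')
MAX_LEN = 80


def safe_filename(title: str, max_len: int = MAX_LEN) -> str:
    """Strip special characters, tidy whitespace, and truncate to max_len characters."""
    text = re.sub(r"\s+", " ", title)          # tabs/newlines become single spaces
    text = _INVALID.sub("", text)              # drop special characters
    text = re.sub(r" {2,}", " ", text).strip() # tidy spaces left behind
    return text[:max_len].rstrip() or "untitled"


print(safe_filename("AI: 未来/趋势?") + ".md")  # AI 未来趋势.md
```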

Trigger Keywords

  • "获取公众号文章" (Fetch official account articles)
  • "抓取微信文章" (Crawl WeChat articles)
  • "公众号文章聚合" (Official account article aggregation)
  • "批量获取公众号" (Batch fetch official accounts)
  • "下载公众号文章" (Download official account articles)
  • "微信文章采集" (WeChat article collection)
  • "获取最新公众号文章" (Fetch the latest official account articles)

Changelog

v1.0.0 (2026-02-23)

  • Initial release
  • Fetch official account article lists via the mptext.top API
  • Download article HTML and convert it to Markdown/HTML/Text/JSON
  • Built-in HTMLParser extraction of the `#js_content` body (zero-dependency fallback)
  • Pre-configured fakeids for 8 popular AI-tech official accounts
  • Fetch by official account name or fakeid
  • `all` keyword to fetch every pre-configured official account
  • Automatically generate summary.json with aggregated metadata