wechat-article-fetcher

1 script checked; no sensitive code detected.

Fetch and parse WeChat Official Account articles: extract the title, author, account name, main content, images, and metadata from a WeChat article link. Use it when a user provides a WeChat article link (mp.weixin.qq.com/s/...) and wants to read, extract, download, or convert the article. Typical scenarios include downloading WeChat articles, extracting text or metadata, converting articles to Markdown, and saving articles locally together with their images. Keywords: WeChat Official Account, article fetching, article scraping, article download.

7 installs

Source: wwwzhouhui/skills_collection

Added on: 2026-02-28

NPX Install

npx skill4agent add wwwzhouhui/skills_collection wechat-article-fetcher

SKILL.md Content (Chinese)

WeChat Official Account Article Fetcher

Fetch, parse, and save WeChat Official Account articles. Supports single and batch download, metadata extraction, image download, and Markdown conversion.

Quick Start

Fetch a single article:

bash

python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx"

Fetch multiple articles in batch (separated by spaces):

bash

python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output

Fetch multiple articles in batch (separated by commas):

bash

python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output

Output metadata only (no files saved):

bash

python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx" --json

Dependency Installation

bash

pip install beautifulsoup4 html2text requests

Feature Description

1. Fetch articles and save them locally

bash

python scripts/fetch_wechat_article.py "<url>" --output-dir ./output

Output directory structure:

output/<account_name>/<date>_<title>/
├── index.html    # Formatted standalone HTML file
├── article.md    # Markdown version
├── meta.json     # Article metadata
└── images/       # Downloaded images
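
The layout above can be approximated with a small path helper. This is a minimal sketch, not the script's actual implementation: the names `sanitize` and `build_article_dir`, the character-replacement rule, and the `YYYY-MM-DD` date format are assumptions made for illustration.

```python
import re
from datetime import date
from pathlib import Path

# Assumed rule: characters that are unsafe in file names, plus whitespace.
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def sanitize(name: str) -> str:
    """Replace each run of unsafe characters with one underscore."""
    return _UNSAFE.sub("_", name).strip("_") or "untitled"


def build_article_dir(output_dir: str, account: str, published: date, title: str) -> Path:
    """Return <output_dir>/<account>/<date>_<title> (hypothetical helper)."""
    folder = f"{published.isoformat()}_{sanitize(title)}"
    return Path(output_dir) / sanitize(account) / folder


if __name__ == "__main__":
    print(build_article_dir("output", "Tech Daily", date(2026, 2, 28), "Hello: World?"))
```

Sanitizing both the account name and the title keeps the directory usable when a title contains slashes, colons, or question marks.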

2. Extract metadata only

bash

python scripts/fetch_wechat_article.py "<url>" --json

The returned JSON contains:

`title`, `author`, `account_nickname` (WeChat Official Account name), `description` (abstract), `create_time` (publish time), `content_text` (main body text), `content_markdown` (Markdown content), `cover_image` (cover image), `source_url` (original article link).
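
When the `--json` output is consumed by another program, it can help to load it into a typed record. The sketch below assumes the output is a single JSON object with the field names listed above; the class name `ArticleMeta`, the helper `parse_meta`, and the field types are assumptions, since the value types are not documented here.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ArticleMeta:
    """Hypothetical typed view of the metadata fields listed above."""
    title: str = ""
    author: str = ""
    account_nickname: str = ""
    description: str = ""
    create_time: Any = None  # type not documented; could be a string or a timestamp
    content_text: str = ""
    content_markdown: str = ""
    cover_image: str = ""
    source_url: str = ""


def parse_meta(raw: str) -> ArticleMeta:
    """Build an ArticleMeta from a JSON string, ignoring unknown keys."""
    data = json.loads(raw)
    known = {f.name for f in fields(ArticleMeta)}
    return ArticleMeta(**{k: v for k, v in data.items() if k in known})


if __name__ == "__main__":
    sample = json.dumps({"title": "Example", "author": "Someone"})
    print(parse_meta(sample).title)
```

Ignoring unknown keys keeps the reader tolerant of fields that the script may add later.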

3. Batch download multiple articles

Multiple links separated by spaces:

bash

python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output

Multiple links separated by commas:

bash

python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output
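
Both invocation styles can be reduced to one flat URL list before downloading. The following is a sketch of one way to do that; `split_url_args` is a hypothetical name, and the de-duplication step is an added assumption rather than documented behavior of the script.

```python
from typing import Iterable, List


def split_url_args(args: Iterable[str]) -> List[str]:
    """Flatten space-separated and comma-separated URL arguments.

    Each argument may hold one URL or several joined by commas.
    Order is preserved; repeated URLs are dropped (an assumption).
    """
    urls: List[str] = []
    seen = set()
    for arg in args:
        for part in arg.split(","):
            url = part.strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    print(split_url_args(["url1,url2", "url3"]))
```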

Custom download interval (default: 3 seconds, to avoid triggering the anti-crawling mechanism):

bash

python scripts/fetch_wechat_article.py "url1" "url2" --interval 5

Articles from the same Official Account are automatically grouped into the same directory.
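
The interval behavior can be pictured as a loop that pauses between consecutive downloads and counts outcomes. This is a minimal sketch under stated assumptions: `throttled_batch` and its `fetch_one` and `sleep` parameters are hypothetical and only mirror the `success` and `fail` counters shown in the library example; the script's real loop may differ.

```python
import time
from typing import Callable, Dict, Iterable


def throttled_batch(
    urls: Iterable[str],
    fetch_one: Callable[[str], object],
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Call fetch_one for each URL, pausing `interval` seconds between calls."""
    stats = {"success": 0, "fail": 0}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(interval)  # no pause before the first request
        try:
            fetch_one(url)
            stats["success"] += 1
        except Exception:
            # One failed article should not abort the rest of the batch.
            stats["fail"] += 1
    return stats


if __name__ == "__main__":
    print(throttled_batch(["url1", "url2"], fetch_one=print, interval=0.1))
```

Passing `sleep` as a parameter lets the loop be exercised without real delays.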

4. Skip image download

bash

python scripts/fetch_wechat_article.py "<url>" --no-images

5. Call as a Python library

python

from scripts.fetch_wechat_article import fetch_article, batch_fetch

# Fetch and save single article
result = fetch_article("https://mp.weixin.qq.com/s/xxxxx", output_dir="./output")
print(result['title'], result['path'])

# Only fetch metadata for single article
meta = fetch_article("https://mp.weixin.qq.com/s/xxxxx", json_only=True)
print(meta['title'])
print(meta['content_text'][:200])

# Batch fetch
urls = ["https://mp.weixin.qq.com/s/aaa", "https://mp.weixin.qq.com/s/bbb"]
stats = batch_fetch(urls, output_dir="./output", interval=3.0)
print(f"Success: {stats['success']} articles, Fail: {stats['fail']} articles")

Main function parameters:

`url`: Article link (supports both short and long links)
`output_dir`: Save directory (default: `./wechat_articles`)
`download_img`: Whether to download images (default: `True`)
`to_markdown`: Whether to convert to Markdown (default: `True`)
`json_only`: Return only the metadata dictionary without saving any files

Extra parameters for `batch_fetch`:

`urls`: List of article links
`interval`: Download interval in seconds between articles (default: `3.0`)

Notes

Prefer short links (`/s/xxxxx`); long links with the `__biz` parameter may trigger a CAPTCHA.
The default interval for batch downloads is 3 seconds; adjust it via `--interval` to avoid triggering WeChat's anti-crawling mechanism.
A WeChat mobile User-Agent is used automatically to bypass access restrictions.
WeChat images use the `data-src` attribute (not `src`) because of lazy loading.
Image downloads require the `Referer: https://mp.weixin.qq.com/` request header.
For details of the HTML structure, see references/wechat_html_structure.md.
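
Several of these notes can be illustrated with the standard library alone. The sketch below is not the script's code: `is_long_link`, `ImageCollector`, and `extract_image_urls` are hypothetical names, the User-Agent string is a placeholder (the value the script sends is not shown here), and falling back to `src` when `data-src` is absent is an added assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import parse_qs, urlparse

# Placeholder mobile User-Agent; the script's real value is not documented here.
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) MicroMessenger"

# Headers for image requests, including the Referer named in the notes.
IMAGE_HEADERS = {"User-Agent": MOBILE_UA, "Referer": "https://mp.weixin.qq.com/"}


def is_long_link(url: str) -> bool:
    """Return True when the URL carries a __biz query parameter."""
    return "__biz" in parse_qs(urlparse(url).query)


class ImageCollector(HTMLParser):
    """Collect image URLs, preferring the lazy-loading data-src attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        url = attr.get("data-src") or attr.get("src")  # src fallback is an assumption
        if url:
            self.urls.append(url)


def extract_image_urls(html: str) -> List[str]:
    collector = ImageCollector()
    collector.feed(html)
    return collector.urls


if __name__ == "__main__":
    print(is_long_link("https://mp.weixin.qq.com/s/xxxxx"))
    print(extract_image_urls('<img data-src="https://example.com/a.png">'))
```

`IMAGE_HEADERS` could then be passed as the `headers` argument of `requests.get` when downloading each collected URL.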
