WeChat Official Account Article Scraper

Overview

Scrapes WeChat Official Account articles with Playwright. The scraper runs in the background without opening a browser window, handles dynamically loaded content, extracts clean article text, and can save the result as a Markdown file.

Features

✅ Headless Mode Execution: Default background scraping without popping up browser windows
✅ Smart Fallback Mechanism: Automatically switch to headed mode when headless mode fails
✅ Dynamic Content Support: Automatically wait for page loading completion, handle lazy-loaded images
✅ Auto-save as Markdown: Support saving scraping results as formatted Markdown files
✅ Content Cleaning: Remove HTML tags, retain paragraph structure, output plain text
✅ Automatic Retry: Automatically retry 3 times on failure to improve success rate
✅ Error Detection: Identify abnormal pages such as "parameter error"
✅ Cross-platform Support: Fully compatible with Windows, macOS and Linux
✅ Smart Workflow: Automatically call formatting skill when detecting legal content
✅ Image Download: Automatically download all images in an article to a local folder
✅ Smart Image Filtering: Automatically filter small decorative images (such as social media buttons, emojis)
✅ Image Position Preservation: Retain the position of images in the original document
✅ Auto File Naming: Generate file names and resource folders based on article titles

Collaboration with Other Skills

Smart Workflow

This skill focuses on article scraping and maintains generality. The AI will automatically call the `legal-text-format` skill for formatting only when legal-related content is detected.

AI Execution Flow:

text

User Request → wechat-article-fetch scraping → [Judge Content Type]
                                              ↓
                    ┌────────────────────────┴────────────────────────┐
                    ↓                                                 ↓
              Detected Legal Content                               Ordinary Article
                    ↓                                                 ↓
          Automatically call legal-text-format                Save original content to project root directory
                    ↓
          Output to archive/ directory

Legal Content Detection

The AI judges whether an article is legal content based on the following features:

Title Keywords: Contains "case", "judgment", "verdict", "regulation", "rule", "Supreme People's Court", "Supreme People's Procuratorate", etc.
Content Features: Contains case numbers, court names, legal article citations, etc.
Structural Features: Conforms to the typical structure of legal cases (basic facts, judgment results, typical significance, etc.)
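
As a rough illustration of the features above, the check could be approximated with a keyword heuristic. This is only a sketch: the skill leaves the decision to the AI, and the function name `looksLikeLegalContent`, the English content patterns, and the two-signal threshold are all assumptions made for this example.

```javascript
// Title keywords taken from the list above.
const TITLE_KEYWORDS = [
  "case", "judgment", "verdict", "regulation", "rule",
  "Supreme People's Court", "Supreme People's Procuratorate",
];

// Assumed content markers; real articles would need Chinese patterns as well.
const CONTENT_PATTERNS = [
  /\bcase no\./i,        // case numbers
  /\bpeople's court\b/i, // court names
  /\barticle \d+\b/i,    // legal article citations
];

// Hypothetical heuristic: a title keyword, or at least two content signals.
function looksLikeLegalContent(title, content) {
  const lowerTitle = title.toLowerCase();
  const titleHit = TITLE_KEYWORDS.some((kw) => lowerTitle.includes(kw.toLowerCase()));
  const contentHits = CONTENT_PATTERNS.filter((re) => re.test(content)).length;
  return titleHit || contentHits >= 2;
}
```

Plain substring matching is deliberately loose (for example, "rule" also matches "ruler"), which is one reason a model-based judgment may be preferable to a fixed list.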

Default Save Location

No Path Specified: Save to project root directory
Specified Relative Path: Relative to project root directory
Specified Absolute Path: Use the specified full path

Examples:

bash

# Save to project root directory
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx"

# Save to specified directory
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/"

# Save to specified file
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/case.md"
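
The three path rules can be sketched as a small resolver. The helper name `resolveSavePath`, the `projectRoot` parameter, and the rule that a path without a file extension counts as a directory are assumptions for illustration; the actual logic in scripts/fetch.js may differ.

```javascript
const path = require("path");

// Hypothetical helper: resolves the output file from the optional save path.
function resolveSavePath(savePath, title, projectRoot = process.cwd()) {
  const fileName = `${title}.md`;

  // No path specified: save to the project root directory.
  if (!savePath) {
    return path.join(projectRoot, fileName);
  }

  // Relative paths are resolved against the project root; absolute paths are kept.
  const absolute = path.isAbsolute(savePath)
    ? savePath
    : path.resolve(projectRoot, savePath);

  // Assumption: a trailing separator or a missing extension means "directory".
  const isDirectory = /[\\/]$/.test(savePath) || path.extname(savePath) === "";
  return isDirectory ? path.join(absolute, fileName) : absolute;
}
```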

Usage

Call in Claude Code

javascript

// Scrape article (return results only)
const result = await fetchWechatArticle("https://mp.weixin.qq.com/s/xxxxx");

// Scrape article and automatically save as Markdown file
const saved = await fetchWechatArticle(
  "https://mp.weixin.qq.com/s/xxxxx",
  3,            // Retry count (optional)
  "./output.md" // Save path (optional)
);

// Return format
// {
//   title: "Article Title",
//   content: "Article main text...",
//   url: "Article URL"
// }

Command Line Call

bash

# Basic usage (output to console only)
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx"

# Save to specified file
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/my-article.md"

# Save to directory (automatically use article title as file name)
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/"

Output Format

Console Output

text

Title: Article Title

First paragraph of article main text...

Second paragraph of article main text...

Markdown File Format

markdown

# Article Title

> Original URL: https://mp.weixin.qq.com/s/xxxxx
> Scraping Time: 2026-01-21 20:30:00

---

First paragraph of article main text...

![Image Description](Article_Title_assets/image_xxx_0.jpg)

Second paragraph of article main text...
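
The layout above could be produced by a small formatter such as the one below. This is a sketch, not the code in scripts/fetch.js: the names `formatTimestamp` and `buildMarkdown` are hypothetical, and the use of local time for the timestamp is an assumption.

```javascript
// Formats a Date as "YYYY-MM-DD HH:mm:ss" in local time.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())} ` +
    `${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}`
  );
}

// Hypothetical helper: renders a scrape result as title, metadata quote, rule, body.
function buildMarkdown({ title, content, url }, scrapedAt = new Date()) {
  return [
    `# ${title}`,
    "",
    `> Original URL: ${url}`,
    `> Scraping Time: ${formatTimestamp(scrapedAt)}`,
    "",
    "---",
    "",
    content,
    "",
  ].join("\n");
}
```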

File Structure

When articles contain images, the following file structure will be automatically generated:

Output Directory/
├── Article_Title.md              # Markdown file
└── Article_Title_assets/         # Image resource folder
    ├── image_xxx_0.jpg
    ├── image_xxx_1.jpg
    └── ...
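
One way to derive these names from an article title is sketched below. The helper `outputPaths` is hypothetical, and replacing whitespace with underscores is only inferred from the `Article_Title` example above.

```javascript
const path = require("path");

// Hypothetical helper: derives the Markdown file and asset folder for a title.
function outputPaths(outputDir, title) {
  // Assumption: whitespace in the title becomes underscores.
  const baseName = title.trim().replace(/\s+/g, "_");
  return {
    markdownFile: path.join(outputDir, `${baseName}.md`),
    assetsDir: path.join(outputDir, `${baseName}_assets`),
    // Relative link to use inside the Markdown file for a given image.
    imageLink: (imageName) => `${baseName}_assets/${imageName}`,
  };
}
```

Keeping the image link relative to the Markdown file means the article and its asset folder can be moved together without breaking the images.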

Image Filtering

Smart image filtering is enabled by default, automatically filtering decorative images smaller than 15KB (such as social media buttons, emojis, etc.).

You can modify the filtering configuration in `scripts/fetch.js`:

javascript

const IMAGE_FILTER_CONFIG = {
  minFileSize: 15 * 1024,  // Minimum file size (bytes)
  enabled: true            // Whether to enable filtering
};
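
A check that applies this configuration might look like the following. The function `shouldKeepImage` is a hypothetical name, and comparing the downloaded buffer's byte length against `minFileSize` is an assumption about how the filter is applied.

```javascript
const IMAGE_FILTER_CONFIG = {
  minFileSize: 15 * 1024, // Minimum file size (bytes)
  enabled: true,          // Whether to enable filtering
};

// Hypothetical check: keep an image unless filtering is on and it is too small.
function shouldKeepImage(imageBuffer, config = IMAGE_FILTER_CONFIG) {
  if (!config.enabled) return true;
  return imageBuffer.length >= config.minFileSize;
}
```

Size is a cheap proxy for "decorative": it needs no image decoding, at the cost of occasionally dropping a small but meaningful image.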

Technical Implementation

Dependency Requirements

Playwright (`npx playwright install chromium`)
Node.js >= 14.0.0

Scraping Process

Detect and install Playwright (if needed)
Launch Playwright headless browser
Set anti-detection parameters (User-Agent, webdriver hiding, etc.)
Navigate to target URL, wait for network idle
Scroll page to trigger lazy loading
Extract the `#js_content` or `.rich_media_content` area
Clean up HTML tags, retain paragraph structure
Return title and plain text content
Automatically save as Markdown file if save path is specified
Automatically fall back to headed mode for retry if headless mode fails
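
The core of the steps above can be sketched with Playwright's Node.js API. This is a minimal sketch under stated assumptions, not the code in scripts/fetch.js: the function names, the User-Agent string, the scroll step size, and the regex-based tag stripping are all illustrative. Installation, saving, and the headed fallback are left out.

```javascript
// Strips tags but keeps paragraph breaks; a regex approximation, not a full HTML parser.
function htmlToPlainText(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "")     // drop scripts and styles
    .replace(/<br\s*\/?>/gi, "\n")                       // line breaks
    .replace(/<\/(p|div|section|h[1-6]|li)>/gi, "\n\n")  // block ends become blank lines
    .replace(/<[^>]+>/g, "")                             // remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Hypothetical scraper: one attempt in the requested mode, no retries.
async function scrapeArticle(url, { headless = true } = {}) {
  // Loaded lazily so htmlToPlainText can be used without Playwright installed.
  const { chromium } = require("playwright");
  const browser = await chromium.launch({ headless });
  try {
    const context = await browser.newContext({
      // Assumed desktop User-Agent; any realistic value would do.
      userAgent:
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    });
    // Hide the webdriver flag before any page script runs.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, "webdriver", { get: () => undefined });
    });
    const page = await context.newPage();
    await page.goto(url, { waitUntil: "networkidle", timeout: 30000 });
    // Scroll in steps so lazy-loaded images get requested.
    await page.evaluate(async () => {
      for (let y = 0; y < document.body.scrollHeight; y += 600) {
        window.scrollTo(0, y);
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    });
    const html = await page
      .locator("#js_content, .rich_media_content")
      .first()
      .innerHTML();
    return { title: await page.title(), content: htmlToPlainText(html), url };
  } finally {
    await browser.close();
  }
}
```

The `finally` block closes the browser even when navigation or extraction throws, so a failed attempt does not leave a Chromium process behind.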

Error Handling

Automatically retry 3 times, wait 3 seconds after each failure
Automatically fall back to headed mode if headless mode fails
Detect error pages (parameter error, access exception)
Timeout set to 30 seconds
Special handling for Windows platform (paths, command formats)
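
The retry and fallback rules could be combined roughly as follows. This sketch makes several assumptions: `scrape` is a hypothetical function `(url, { headless })` that resolves to `{ title, content, url }`, the error markers (including the Chinese text "参数错误" for "parameter error") are guesses at what the page shows, and each mode gets its own three attempts.

```javascript
// Assumed markers for WeChat error pages.
const ERROR_MARKERS = ["参数错误", "parameter error", "access exception"];

function isErrorPage(text) {
  const lower = text.toLowerCase();
  return ERROR_MARKERS.some((marker) => lower.includes(marker.toLowerCase()));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Tries headless mode first, then headed mode, retrying within each mode.
async function fetchWithRetry(scrape, url, { retries = 3, delayMs = 3000 } = {}) {
  let lastError;
  for (const headless of [true, false]) {
    for (let attempt = 1; attempt <= retries; attempt++) {
      try {
        const result = await scrape(url, { headless });
        if (isErrorPage(result.content)) {
          throw new Error("error page detected");
        }
        return result;
      } catch (error) {
        lastError = error;
        await sleep(delayMs); // wait after each failure before trying again
      }
    }
  }
  throw lastError;
}
```

Matching markers against the extracted text can misfire on an article that legitimately discusses a "parameter error", so a stricter check (for example, on very short pages only) may be worth considering.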

Cross-Platform Compatibility

Windows: Automatically detect and use `cmd.exe` to run npx commands
macOS/Linux: Directly use npx commands
Path Handling: Automatically normalize path separators
File Name Handling: Automatically remove Windows illegal characters
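
Two of these points lend themselves to a short sketch: choosing how to invoke npx per platform, and stripping characters that Windows does not allow in file names. Both helper names are hypothetical, and the exact character set removed by scripts/fetch.js is not stated in this document.

```javascript
// Returns the command and arguments for running npx on the given platform.
function npxCommand(args, platform = process.platform) {
  return platform === "win32"
    ? { command: "cmd.exe", args: ["/c", "npx", ...args] }
    : { command: "npx", args };
}

// Removes characters Windows forbids in file names (< > : " / \ | ? * and
// control characters), plus trailing dots and spaces.
function sanitizeFileName(name) {
  return name
    .replace(/[<>:"/\\|?*\x00-\x1f]/g, "")
    .replace(/[. ]+$/, "")
    .trim();
}
```

The result of `npxCommand(["playwright", "install", "chromium"])` can be passed to `child_process.spawnSync(command, args)` from the Node.js standard library.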

Application Scenarios

Input source for content conversion tools
Article analysis and processing
Automated content scraping
Batch article downloading
Article archiving and local saving
Markdown format conversion
Legal document automatic formatting (when legal content is detected)
Complete saving of articles with images (offline archiving including images)
Image resource management (automatically download and organize images in articles)

Usage Examples

Example 1: Batch Scraping and Saving

javascript

const urls = [
  "https://mp.weixin.qq.com/s/xxxx1",
  "https://mp.weixin.qq.com/s/xxxx2",
  "https://mp.weixin.qq.com/s/xxxx3"
];

for (const url of urls) {
  const result = await fetchWechatArticle(url, 3, "./articles/");
  console.log(`Saved: ${result.title}`);
}

Example 2: Direct Use in Claude Code

text

Please help me scrape this WeChat Official Account article and save it as a Markdown file:
https://mp.weixin.qq.com/s/xxxxx

Notes

⚠️ Only for personal study and research, please comply with website service terms
⚠️ Frequent scraping may lead to rate limiting, it is recommended to control request frequency
⚠️ The copyright of scraped content belongs to the original author
⚠️ Headed mode will pop up a browser window, which may interfere with workflow
⚠️ Windows users need to install Playwright for the first use (will be installed automatically)
