WeChat Official Account Article Scraper

Overview

This skill uses Playwright to scrape WeChat Official Account articles. It runs in the background without opening a browser window, handles dynamically loaded content, extracts clean article text, and can save the result as a Markdown file.

Features

✅ Headless mode operation: By default, scrape in the background without popping up a browser window
✅ Smart fallback mechanism: Automatically switch to headed mode when headless mode fails
✅ Dynamic content support: Automatically wait for page loading completion and handle lazy-loaded images
✅ Auto-save as Markdown: Support saving scraping results as formatted Markdown files
✅ Content cleaning: Remove HTML tags, retain paragraph structure, and output plain text
✅ Auto-retry: Automatically retry 3 times on failure to improve success rate
✅ Error detection: Identify abnormal pages such as "parameter error"
✅ Cross-platform support: Fully compatible with Windows, macOS, and Linux
✅ Smart workflow: Automatically call formatting skills when detecting legal content
✅ Image downloading: Automatically download all images in the article to local storage
✅ Smart image filtering: Automatically filter small decorative images (such as social media buttons, emojis)
✅ Image position retention: Keep images in their original positions in the document
✅ Auto file naming: Generate file names and resource folders based on article titles

Collaboration with Other Skills

Smart Workflow

This skill focuses on article scraping and stays general-purpose. The AI calls the `legal-text-format` skill for formatting only when it detects legal content.

AI Execution Flow:

text

User Request → wechat-article-fetch scraping → [Judge Content Type]
                                              ↓
                    ┌────────────────────────┴────────────────────────┐
                    ↓                                                 ↓
              Legal Content Detected                            Regular Articles
                    ↓                                                 ↓
          Auto-call legal-text-format                      Save original content to project root directory
                    ↓
          Output to archive/ directory

Legal Content Detection

AI will determine if content is legal-related based on the following features:

Title keywords: Contains terms like "case", "judgment", "verdict", "regulation", "rule", "Supreme People's Court", "Supreme People's Procuratorate", etc.
Content features: Includes case numbers, court names, citations of legal provisions, etc.
Structural features: Conforms to the typical structure of legal cases (basic facts, judgment results, typical significance, etc.)
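
The AI makes this judgment itself, but the features above can be approximated with a simple heuristic. The sketch below is illustrative only and is not part of the skill: the function name `looksLikeLegalContent`, the keyword lists (the English terms listed above; real articles would need their Chinese equivalents), and the threshold of two structural markers are all assumptions. It covers title keywords and structural features only, since patterns for case numbers depend on the citation format.

```javascript
// Hypothetical heuristic; the keyword lists mirror the features listed above.
const TITLE_KEYWORDS = [
  'case', 'judgment', 'verdict', 'regulation', 'rule',
  "supreme people's court", "supreme people's procuratorate",
];
const STRUCTURE_MARKERS = ['basic facts', 'judgment results', 'typical significance'];

// Count how many of the given terms occur in the text (case-insensitive).
function countMatches(text, terms) {
  const lower = text.toLowerCase();
  return terms.filter((term) => lower.includes(term)).length;
}

// Returns true when the title contains a legal keyword, or when the body
// contains at least two of the typical section headings (assumed threshold).
function looksLikeLegalContent(title, content) {
  const titleHit = countMatches(title, TITLE_KEYWORDS) > 0;
  const structureHit = countMatches(content, STRUCTURE_MARKERS) >= 2;
  return titleHit || structureHit;
}
```

Plain substring matching is deliberately loose (for example, "rule" also matches "ruler"), so a check like this is only a rough pre-filter.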

Default Save Locations

No path specified: Save to project root directory
Specified relative path: Relative to the project root directory
Specified absolute path: Use the specified full path

Examples:

bash

# Save to project root directory
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx"

# Save to specified directory
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/"

# Save to specified file
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/case.md"
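
The three save-location rules can be sketched with Node's built-in `path` module. This is a minimal sketch under stated assumptions, not the code in `scripts/fetch.js`: the helper name `resolveSavePath` is hypothetical, the project root is assumed to be the current working directory, and a trailing path separator is assumed to mean "directory".

```javascript
const path = require('path');

// Hypothetical helper: maps the optional save-path argument to a final file path.
// `fileName` is the name to use when only a directory (or nothing) is given.
function resolveSavePath(savePath, fileName, projectRoot = process.cwd()) {
  if (!savePath) {
    // No path specified: save to the project root.
    return path.join(projectRoot, fileName);
  }
  // path.resolve keeps absolute paths as-is and anchors relative ones at projectRoot.
  const resolved = path.resolve(projectRoot, savePath);
  // Assumption: a trailing "/" or "\" marks a directory rather than a file.
  const isDirectory = /[\\/]$/.test(savePath);
  return isDirectory ? path.join(resolved, fileName) : resolved;
}
```

For example, `resolveSavePath("./articles/", "Article_Title.md")` yields a path ending in `articles/Article_Title.md` under the project root.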

Usage

Call in Claude Code

javascript

// Scrape article (return results only)
const result = await fetchWechatArticle("https://mp.weixin.qq.com/s/xxxxx");

// Scrape article and auto-save as Markdown file
const saved = await fetchWechatArticle(
  "https://mp.weixin.qq.com/s/xxxxx",
  3,            // Retry count (optional)
  "./output.md" // Save path (optional)
);

// Return format:
// {
//   title: "Article Title",
//   content: "Article main text...",
//   url: "Article URL"
// }

Command Line Call

bash

# Basic usage (output to console only)
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx"

# Save to specified file
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/my-article.md"

# Save to directory (automatically use article title as file name)
node scripts/fetch.js "https://mp.weixin.qq.com/s/xxxxx" "./articles/"

Output Format

Console Output

text

Title: Article Title

First paragraph of article main text...

Second paragraph of article main text...

Markdown File Format

markdown

# Article Title

> Original URL: https://mp.weixin.qq.com/s/xxxxx
> Scraped Time: 2026-01-21 20:30:00

---

First paragraph of article main text...

![Image Description](Article_Title_Assets/image_xxx_0.jpg)

Second paragraph of article main text...
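
The layout above can be produced from the returned `{ title, content, url }` object. The following is a sketch only: `renderMarkdown` and `formatTimestamp` are hypothetical names, and the timestamp is assumed to be local time in the `YYYY-MM-DD HH:mm:ss` form shown in the sample.

```javascript
// Zero-pad a number to two digits.
function pad(n) {
  return String(n).padStart(2, '0');
}

// Format a Date as "YYYY-MM-DD HH:mm:ss" in local time (assumed format).
function formatTimestamp(date) {
  const day = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}`;
  return `${day} ${time}`;
}

// Hypothetical renderer: title heading, quoted metadata, a rule, then the body.
function renderMarkdown({ title, url, content }, scrapedAt = new Date()) {
  return [
    `# ${title}`,
    '',
    `> Original URL: ${url}`,
    `> Scraped Time: ${formatTimestamp(scrapedAt)}`,
    '',
    '---',
    '',
    content,
    '',
  ].join('\n');
}
```

Image references are assumed to already be embedded in `content` at their original positions, so the renderer does not treat them specially.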

File Structure

When an article contains images, the following file structure will be automatically generated:

Output Directory/
├── Article_Title.md              # Markdown file
└── Article_Title_Assets/         # Image resource folder
    ├── image_xxx_0.jpg
    ├── image_xxx_1.jpg
    └── ...
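
One way to derive both names from an article title is sketched below. The helper `buildOutputNames` is hypothetical, and the exact rules in `scripts/fetch.js` may differ; the sketch assumes spaces become underscores (as in the tree above) and that characters Windows forbids in file names are dropped.

```javascript
// Hypothetical naming helper for the Markdown file and its image folder.
function buildOutputNames(title) {
  const base = title
    .replace(/[<>:"\/\\|?*\u0000-\u001f]/g, '') // characters Windows forbids in file names
    .trim()
    .replace(/\s+/g, '_')                        // whitespace runs become underscores
    .replace(/[. ]+$/, '')                       // Windows also rejects trailing dots
    || 'untitled';                               // fallback when nothing usable remains
  return { markdownFile: `${base}.md`, assetsDir: `${base}_Assets` };
}
```

With the title "Article Title" this gives `Article_Title.md` and `Article_Title_Assets`, matching the structure shown above.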

Image Filtering

Smart image filtering is enabled by default, which automatically filters decorative images smaller than 15KB (such as social media buttons, emojis, etc.).

You can modify the filtering configuration in `scripts/fetch.js`:

javascript

const IMAGE_FILTER_CONFIG = {
  minFileSize: 15 * 1024,  // Minimum file size (bytes)
  enabled: true            // Whether to enable filtering
};
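
How such a configuration might be applied is sketched below. The predicate `shouldKeepImage` is a hypothetical name, and the sketch assumes the size check runs on the downloaded file size in bytes; the actual check in `scripts/fetch.js` may work differently.

```javascript
const IMAGE_FILTER_CONFIG = {
  minFileSize: 15 * 1024, // Minimum file size (bytes)
  enabled: true,          // Whether to enable filtering
};

// Hypothetical predicate: keep an image unless filtering is on and it is too small.
function shouldKeepImage(sizeInBytes, config = IMAGE_FILTER_CONFIG) {
  if (!config.enabled) return true;
  return sizeInBytes >= config.minFileSize;
}

// Example: byte sizes of three downloaded images.
const sizes = [2048, 15360, 250000];
const kept = sizes.filter((size) => shouldKeepImage(size));
// kept is [15360, 250000]: only the 2 KB image falls below the 15 KB threshold.
```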

Technical Implementation

Dependency Requirements

Playwright (install the browser with `npx playwright install chromium`)
Node.js >= 14.0.0

Scraping Process

Detect and install Playwright (if needed)
Launch Playwright headless browser
Set anti-detection parameters (User-Agent, webdriver hiding, etc.)
Navigate to the target URL and wait for network idle
Scroll the page to trigger lazy loading
Extract the `#js_content` or `.rich_media_content` area
Clean up HTML tags and retain paragraph structure
Return title and plain text content
Automatically save as Markdown file if a save path is specified
Automatically fall back to headed mode for retry if headless mode fails
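
The core of the process above (launch, anti-detection settings, navigation, scrolling, extraction, cleaning) can be sketched with the public Playwright API. This is an illustration under stated assumptions rather than the code in `scripts/fetch.js`: the names `fetchArticleOnce` and `htmlToPlainText`, the User-Agent string, the number of scroll steps, and the use of `page.title()` for the title are all hypothetical choices, and the regex-based cleanup stands in for whatever cleaning the script actually performs.

```javascript
// Reduce article HTML to plain text while keeping paragraph breaks.
// Regex-based sketch; a DOM parser would be more robust on unusual markup.
function htmlToPlainText(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, '') // drop non-content blocks
    .replace(/<br\s*\/?>/gi, '\n')                         // explicit line breaks
    .replace(/<\/(p|div|section|h[1-6]|li)>/gi, '\n\n')    // block ends become blank lines
    .replace(/<[^>]+>/g, '')                               // strip all remaining tags
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')                                // decode & last to avoid double decoding
    .replace(/[ \t]+\n/g, '\n')
    .replace(/\n{3,}/g, '\n\n')                            // collapse runs of blank lines
    .trim();
}

// One scraping attempt. Requires the `playwright` package and an installed Chromium.
async function fetchArticleOnce(url, { headless = true } = {}) {
  const { chromium } = require('playwright'); // loaded lazily, only when scraping
  const browser = await chromium.launch({ headless });
  try {
    const context = await browser.newContext({
      // Assumed desktop User-Agent; any realistic browser string would do.
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
        '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    });
    // Hide the automation flag before any page script runs.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    });
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });

    // Scroll in steps so lazy-loaded images enter the viewport (10 steps is arbitrary).
    for (let i = 0; i < 10; i++) {
      await page.mouse.wheel(0, 1000);
      await page.waitForTimeout(200);
    }

    const html = await page
      .locator('#js_content, .rich_media_content')
      .first()
      .innerHTML();
    return { title: await page.title(), content: htmlToPlainText(html), url };
  } finally {
    await browser.close(); // always release the browser, even on failure
  }
}
```

Saving to Markdown and the headed-mode fallback are left out here to keep the sketch focused on a single attempt.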

Error Handling

Automatically retry 3 times, waiting 3 seconds after each failure
Automatically fall back to headed mode if headless mode fails
Detect error pages (parameter errors, access exceptions)
Timeout set to 30 seconds
Special handling for Windows platform (paths, command formats)
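
The retry and fallback rules can be sketched as a small wrapper around a single scraping attempt. How the three retries and the headed fallback are ordered is not stated above, so this sketch assumes three headless attempts followed by one headed attempt; `fetchWithRetry`, `sleep`, and the `attempt` callback are hypothetical names.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `attempt({ headless })` up to `retries` times in headless mode, waiting
// `delayMs` after each failure, then try once more in headed mode.
async function fetchWithRetry(attempt, { retries = 3, delayMs = 3000 } = {}) {
  for (let i = 1; i <= retries; i++) {
    try {
      return await attempt({ headless: true });
    } catch (err) {
      console.warn(`Headless attempt ${i}/${retries} failed: ${err.message}`);
      await sleep(delayMs);
    }
  }
  // Last resort: a visible browser window. An error here propagates to the caller.
  return attempt({ headless: false });
}
```

A caller would pass a function that performs one scrape, for example `fetchWithRetry((opts) => scrapeOnce(url, opts))`, where `scrapeOnce` stands for whatever single-attempt function is available.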

Cross-Platform Compatibility

Windows: Automatically detect and use `cmd.exe` to run npx commands
macOS/Linux: Directly use npx commands
Path handling: Automatically normalize path separators
File name handling: Automatically remove illegal characters for Windows
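
The platform switch for npx can be sketched with Node's `child_process` module. This is a minimal sketch, assuming the check is based on `process.platform`; the helper names `buildNpxInvocation` and `installChromium` are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: choose how to invoke npx on the given platform.
function buildNpxInvocation(args, platform = process.platform) {
  if (platform === 'win32') {
    // On Windows npx is a .cmd shim, so it is run through cmd.exe.
    return { command: 'cmd.exe', args: ['/c', 'npx', ...args] };
  }
  // macOS and Linux can execute npx directly.
  return { command: 'npx', args };
}

// Install the Chromium build Playwright needs; returns true on success.
function installChromium() {
  const { command, args } = buildNpxInvocation(['playwright', 'install', 'chromium']);
  const result = spawnSync(command, args, { stdio: 'inherit' });
  return result.status === 0;
}
```

Keeping the platform decision in a pure function makes it easy to check both branches without actually running an install.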

Applicable Scenarios

Input source for content conversion tools
Article analysis and processing
Automated content scraping
Batch article downloading
Article archiving and local saving
Markdown format conversion
Automatic formatting of legal documents (when legal content is detected)
Complete saving of articles with images (offline archiving including images)
Image resource management (automatically download and organize images in articles)

Usage Examples

Example 1: Batch Scraping and Saving

javascript

const urls = [
  "https://mp.weixin.qq.com/s/xxxx1",
  "https://mp.weixin.qq.com/s/xxxx2",
  "https://mp.weixin.qq.com/s/xxxx3"
];

for (const url of urls) {
  const result = await fetchWechatArticle(url, 3, "./articles/");
  console.log(`Saved: ${result.title}`);
}

Example 2: Direct Use in Claude Code

text

Please help me scrape this WeChat Official Account article and save it as a Markdown file:
https://mp.weixin.qq.com/s/xxxxx

Notes

⚠️ For personal study and research only, please comply with website service terms
⚠️ Frequent scraping may lead to rate limiting, please control request frequency
⚠️ The copyright of scraped content belongs to the original author
⚠️ Headed mode will pop up a browser window, which may interfere with workflow
⚠️ Windows users need to install Playwright for the first use (installation will be automatic)
