wechat-article-fetcher

WeChat Official Account Article Fetching Assistant

This skill works around the access restrictions on WeChat Official Account articles, converts them into well-structured Markdown files, and automatically downloads their images for local use.

Usage Scenarios

  • When the user provides a link starting with https://mp.weixin.qq.com/s/.
  • When Official Account content needs to be saved as a local Markdown document.
  • When a long Official Account article needs a structured summary, or its images need to be extracted.
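
The first scenario amounts to a prefix check on the link. A minimal sketch, assuming the skill only needs to recognise the URL form quoted above; the function name `is_wechat_article_url` is hypothetical and does not come from the skill's code:

```python
from urllib.parse import urlparse


def is_wechat_article_url(url: str) -> bool:
    """Return True for links of the form https://mp.weixin.qq.com/s/...

    Hypothetical helper: only the prefix named in the scenario list is
    accepted; other WeChat link shapes are deliberately rejected.
    """
    parts = urlparse(url.strip())
    return (
        parts.scheme == "https"
        and parts.netloc == "mp.weixin.qq.com"
        and parts.path.startswith("/s/")
    )
```

Parsing the URL rather than comparing raw strings keeps the check robust to surrounding whitespace and to query strings appended after the article ID.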

Core Features

  1. Full-text Fetching: Automatically handles both the standard article flow and the special "image page" format.
  2. Mixed Text and Image Layout: Preserves the original relative positions of paragraphs and images.
  3. Image Localization: Automatically downloads remote images to the images/ directory and updates the Markdown reference paths.
  4. Structured Output: Automatically identifies heading levels (H1-H6) to produce an easy-to-read document.
  5. Summary Generation: Automatically extracts the article's core viewpoints, key insights, and author information.
  6. flomo Integration: Supports pushing the generated article summary to the flomo note-taking platform.
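
Feature 3 can be illustrated by the path-rewriting half of image localization. This is a sketch under stated assumptions, not the skill's actual parser: the names `IMAGE_REF`, `local_image_name`, and `localize_images` are hypothetical, the file-naming scheme (a short hash of the URL) is invented for illustration, and reading the image format from a `wx_fmt` query parameter is an assumption about how WeChat image URLs usually look. The download step itself is left out; the returned mapping lists what would need to be fetched.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse

# Matches Markdown image references that point at a remote URL.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def local_image_name(url: str) -> str:
    """Derive a stable local file name from a remote image URL."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    parsed = urlparse(url)
    # Assumption: the format is carried in a wx_fmt query parameter;
    # fall back to the path suffix, then to "jpg".
    fmt = parse_qs(parsed.query).get("wx_fmt", [""])[0]
    ext = fmt or PurePosixPath(parsed.path).suffix.lstrip(".") or "jpg"
    return f"{digest}.{ext}"


def localize_images(markdown: str, image_dir: str = "images"):
    """Rewrite remote image references to paths under image_dir.

    Returns the rewritten Markdown and a dict mapping each remote URL
    to the local path that now replaces it.
    """
    mapping = {}

    def replace(match):
        alt, url = match.group(1), match.group(2)
        local = mapping.setdefault(url, f"{image_dir}/{local_image_name(url)}")
        return f"![{alt}]({local})"

    return IMAGE_REF.sub(replace, markdown), mapping
```

Hashing the URL gives repeated references to the same image the same local file, so each image only needs to be downloaded once.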

Directory Structure

  • bin/main.py: Main execution script; accepts an optional flomo API URL parameter.
  • utils/downloader.py: Downloads the HTML and the images.
  • utils/parser.py: Parses, cleans, and structures the content.
  • utils/flomo.py: Sends POST requests to flomo.
  • images/: (Generated after running) Stores the localized image resources.
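
The POST request that utils/flomo.py is responsible for might look roughly like the following. This is a hedged sketch, not the module's actual contents: the function names are hypothetical, and the JSON body with a single `content` field is an assumption about what the flomo API URL expects. Only the standard library is used.

```python
import json
import urllib.request


def build_flomo_request(api_url: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the summary.

    Assumption: the endpoint accepts a JSON object with a "content" field.
    """
    payload = json.dumps({"content": summary}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_to_flomo(api_url: str, summary: str, timeout: float = 10.0) -> int:
    """Send the summary and return the HTTP status code."""
    request = build_flomo_request(api_url, summary)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending makes the payload easy to inspect without touching the network.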

Usage Examples

  1. Local fetch only: Provide the Official Account link directly.
  2. Push to flomo: Provide the link together with the flomo secret key (API URL), and the script pushes the summary automatically.
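
The two modes above map naturally onto one required argument and one optional one. A minimal sketch of how bin/main.py could parse them, assuming a command-line interface; the flag name `--flomo-url` and the function `parse_args` are hypothetical, since the source only says the flomo API URL parameter is optional:

```python
import argparse


def parse_args(argv=None):
    """Parse the article link and the optional flomo API URL."""
    parser = argparse.ArgumentParser(
        description="Fetch a WeChat Official Account article as Markdown."
    )
    parser.add_argument(
        "url",
        help="article link starting with https://mp.weixin.qq.com/s/",
    )
    parser.add_argument(
        "--flomo-url",
        default=None,
        help="optional flomo API URL; when given, the summary is pushed to flomo",
    )
    return parser.parse_args(argv)
```

With this layout, omitting the flag corresponds to the local-only mode, and supplying it corresponds to the push mode; the calling code can simply test whether `args.flomo_url` is None.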