web-to-markdown
Convert web pages to clean Markdown by driving a locally installed browser (via `web2md`).
Hard trigger gate (must enforce)
This skill MUST NOT be used unless the user explicitly wrote a phrase such as:
- `use the skill web-to-markdown ...`
- `use a skill web-to-markdown ...`
If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including `use the skill web-to-markdown`.
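The gate above amounts to a substring check on the user's request. The bash sketch below is illustrative only: the function name `has_skill_trigger` is hypothetical, and it assumes an exact, case-sensitive match on the two listed phrasings.

```bash
# Hypothetical helper: succeed only if the request text contains one of the
# two trigger phrasings (assumed: exact, case-sensitive substring match).
has_skill_trigger() {
  local request="$1"
  case "$request" in
    *"use the skill web-to-markdown"*|*"use a skill web-to-markdown"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Gate failed: stop and ask the user to re-issue the request by name.
if ! has_skill_trigger "convert https://example.com to markdown"; then
  echo "Please re-issue your request including: use the skill web-to-markdown"
fi
```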
What this skill does
- Handles JS-rendered pages (Puppeteer → user Chrome).
- Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via `puppeteer-core`.
- Extracts main content (Readability).
- Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter.
Non-goals
- Do not use Playwright or other browser automation stacks; the mechanism is `web2md`.
Inputs you should collect (ask only if missing)
- `url` (or a list of URLs)
- Output preference:
  - Print to stdout (`--print`), OR
  - Save to a file (`--out ./file.md`), OR
  - Save to a directory (`--out ./some-dir/` to auto-name by page title)
- Optional rendering controls for tricky pages:
  - `--chrome-path <path>` (if Chrome auto-detection fails)
  - `--interactive` (show Chrome and pause so the user can complete human checks/login, then press Enter)
  - `--wait-until load|domcontentloaded|networkidle0|networkidle2`
  - `--wait-for '<css selector>'`
  - `--wait-ms <milliseconds>`
  - `--headful` (debug)
  - `--no-sandbox` (sometimes required in containers/CI)
  - `--user-data-dir <dir>` (login/session; use a dedicated profile directory)
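When Chrome auto-detection fails, a value for `--chrome-path` has to be found by hand. The bash sketch below shows one way to look for a Chromium-family browser. The helper name `find_chromium` is hypothetical, and the candidate command names and the macOS path are common install defaults assumed for illustration, not the list `web2md` itself probes.

```bash
# Hypothetical helper: print the first Chromium-family browser found.
# Default candidates are assumed common install names/paths, not web2md's own list.
find_chromium() {
  local c
  if [ "$#" -eq 0 ]; then
    set -- google-chrome google-chrome-stable chromium chromium-browser \
      brave-browser microsoft-edge \
      "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
  fi
  for c in "$@"; do
    case "$c" in
      */*)  # explicit path: must be an executable regular file
        if [ -f "$c" ] && [ -x "$c" ]; then printf '%s\n' "$c"; return 0; fi ;;
      *)    # bare name: resolve through PATH
        if command -v "$c" >/dev/null 2>&1; then command -v "$c"; return 0; fi ;;
    esac
  done
  return 1
}
```

For example: `chrome=$(find_chromium) && web2md '<url>' --chrome-path "$chrome" --out ./page.md`.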
Workflow
- Confirm the user explicitly invoked the skill (`use the skill web-to-markdown`).
- Validate that the URL(s) start with `http://` or `https://`.
- Ensure `web2md` is installed:
  - Run: `command -v web2md`
  - If missing, instruct the user to install it:
    - If available via npm: `npm install -g web2md`
    - If from source: clone the repository, then run `npm install && npm run build && npm link`
- Convert:
  - Single URL → file: `web2md '<url>' --out ./page.md`
  - Single URL → auto-named file in directory: `mkdir -p ./out && web2md '<url>' --out ./out/`
  - Human verification / login walls (interactive): `mkdir -p ./out && web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/`
    - Then: complete the check in the browser window and press Enter in the terminal to continue.
  - Print to stdout: `web2md '<url>' --print`
  - Multiple URLs (batch):
    - Create an output directory (e.g. `./out/`), then run one `web2md` command per URL using `--out ./out/`.
- Validate output:
  - If writing files, verify they exist and are non-empty (e.g. `ls -la <path>` and `wc -c <path>`).
- Return:
  - The saved file path(s), or the Markdown (stdout mode).
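The URL check and the install check in the workflow above can be scripted as a preflight. This is a minimal bash sketch. The helper names `is_http_url`, `validate_urls`, and `require_web2md` are hypothetical, and the install hints simply repeat the commands listed in the workflow.

```bash
# Hypothetical preflight helpers for the first workflow steps.

# Succeed only if the argument starts with http:// or https://.
is_http_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check every URL, report each invalid one on stderr, fail if any was rejected.
validate_urls() {
  local url status=0
  for url in "$@"; do
    if ! is_http_url "$url"; then
      echo "Invalid URL (must start with http:// or https://): $url" >&2
      status=1
    fi
  done
  return "$status"
}

# Fail with the documented install hints if web2md is not on PATH.
require_web2md() {
  if ! command -v web2md >/dev/null 2>&1; then
    echo "web2md not found. If available via npm: npm install -g web2md" >&2
    echo "From source: npm install && npm run build && npm link" >&2
    return 1
  fi
}
```

A caller might run `validate_urls "$@" && require_web2md || exit 1` before converting anything.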
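The batch and validation steps can be combined: one `web2md` run per URL into a shared directory, followed by a size check on every file written. The bash sketch below is one way to do this. The helper names are hypothetical. Because the auto-generated file names are not documented here, `check_outputs` inspects every regular file in the directory rather than assuming a `.md` extension, so use a fresh directory per batch.

```bash
# Hypothetical batch helper: one web2md run per URL into a shared directory.
convert_batch() {
  local outdir="${1%/}" url failed=0
  shift
  mkdir -p "$outdir" || return 1
  for url in "$@"; do
    # Trailing slash on --out: directory mode, file auto-named by page title.
    if ! web2md "$url" --out "$outdir/"; then
      echo "web2md failed for: $url" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Verify that every file in the directory exists and is non-empty.
check_outputs() {
  local outdir="${1%/}" f found=0
  for f in "$outdir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    found=1
    if [ ! -s "$f" ]; then
      echo "Empty output: $f" >&2
      return 1
    fi
    # Report path and byte count, like the suggested `wc -c <path>` check.
    printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
  done
  [ "$found" -eq 1 ]          # fail if nothing was written at all
}
```

For example: `convert_batch ./out 'https://a.example' 'https://b.example' && check_outputs ./out`.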
Defaults (recommended)
- For most pages: `--wait-until networkidle2`
- For heavy apps: start with `--wait-until domcontentloaded --wait-ms 2000`, then add `--wait-for 'main'` (or another stable selector) if needed.
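The two recommendations above can be chained into a simple escalation. Try the default first, and switch to the heavy-app settings only when an attempt fails or writes nothing. The bash sketch below shows one possible ordering; it is not behavior built into `web2md`. The helper names are hypothetical, and only flags documented above are used.

```bash
# Hypothetical helper: one attempt into a file; success means exit 0 AND non-empty output.
try_web2md() {
  local url="$1" out="$2"
  shift 2
  rm -f "$out"                # never count a stale file from an earlier attempt
  web2md "$url" "$@" --out "$out" && [ -s "$out" ]
}

# Escalate: default settings, then heavy-app settings, then add a stable selector.
convert_with_fallback() {
  local url="$1" out="$2"
  try_web2md "$url" "$out" --wait-until networkidle2 && return 0
  try_web2md "$url" "$out" --wait-until domcontentloaded --wait-ms 2000 && return 0
  try_web2md "$url" "$out" --wait-until domcontentloaded --wait-ms 2000 --wait-for 'main'
}
```

With `--out` pointing at a file, e.g. `convert_with_fallback '<url>' ./page.md`, each attempt reuses the same path. The final file therefore comes from whichever settings succeeded.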