mdrip
Use this skill when an agent needs local, reusable markdown context from web pages for implementation, debugging, or documentation work.
When to use
- You need to ingest docs/blog pages into the repository as markdown.
- You want to prefer `Accept: text/markdown` for Cloudflare Markdown for Agents.
- You still need good results when a site only returns HTML.
- You need raw markdown streamed to stdout for agent runtimes (for example OpenClaw) without writing local snapshot files.
- You need to call mdrip as a library from Node.js or Cloudflare Workers.
Core workflow
- Fetch target URLs with markdown content negotiation.
- Detect whether markdown was served (`content-type: text/markdown`).
- If markdown is not served and fallback is allowed, convert HTML to markdown.
- Save snapshots under `mdrip/pages/.../index.md`.
- Update source tracking in `mdrip/sources.json`.
- Return a concise summary of paths, content mode, and failures.
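
The negotiation and detection steps above can be sketched roughly as follows. This is a minimal illustration, not mdrip's actual implementation: `detectContentMode`, `negotiateMarkdown`, `ContentMode`, and the `FetchLike` type are hypothetical names, and the `Accept` header weighting is an assumption.

```ts
// Hypothetical names throughout; this only illustrates the workflow steps.
type ContentMode = "markdown" | "html-fallback" | "unsupported";

// Minimal structural types so the sketch does not depend on a specific runtime's typings.
interface MinimalResponse {
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<MinimalResponse>;

// Decide how to treat a response based on its Content-Type header.
function detectContentMode(
  contentType: string | null,
  allowHtmlFallback: boolean
): ContentMode {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mediaType = ((contentType ?? "").split(";")[0] ?? "").trim().toLowerCase();
  if (mediaType === "text/markdown") return "markdown";
  if (mediaType === "text/html" && allowHtmlFallback) return "html-fallback";
  return "unsupported";
}

// Request a page while preferring markdown over HTML (assumed header weighting).
async function negotiateMarkdown(
  fetchImpl: FetchLike,
  url: string,
  allowHtmlFallback = true
): Promise<{ mode: ContentMode; body: string }> {
  const response = await fetchImpl(url, {
    headers: { Accept: "text/markdown, text/html;q=0.8" },
  });
  const mode = detectContentMode(response.headers.get("content-type"), allowHtmlFallback);
  return { mode, body: await response.text() };
}
```

In a runtime with a global `fetch`, it can be passed as `fetchImpl`. A result with mode `"html-fallback"` would still need an HTML-to-markdown conversion step, which is left out here.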
Commands
```bash
# fetch one page
npx mdrip <url>

# fetch many pages
npx mdrip <url1> <url2> <url3>

# strict Cloudflare markdown only (no html fallback)
npx mdrip <url> --no-html-fallback

# raw markdown to stdout only (no settings/snapshot writes)
npx mdrip <url> --raw

# inspect tracked pages
npx mdrip list --json
```
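
When an agent drives the CLI from code, it can help to build the argument list in one place. The sketch below is illustrative only: `buildMdripArgs` and `MdripCliOptions` are hypothetical names, and only the flags `--raw` and `--no-html-fallback` come from the commands above. Whether the two flags can be combined in one invocation is an assumption.

```ts
// Hypothetical option names; only the flags themselves appear in the documented commands.
interface MdripCliOptions {
  raw?: boolean; // maps to --raw
  noHtmlFallback?: boolean; // maps to --no-html-fallback
}

// Build the argument vector to pass to `npx`, one or more URLs followed by flags.
function buildMdripArgs(urls: string[], options: MdripCliOptions = {}): string[] {
  if (urls.length === 0) {
    throw new Error("at least one URL is required");
  }
  const args = ["mdrip", ...urls];
  // Assumption: the flags may be combined; the source only shows them separately.
  if (options.noHtmlFallback) args.push("--no-html-fallback");
  if (options.raw) args.push("--raw");
  return args;
}
```

The resulting array could be handed to a process runner, for example as the arguments for an `npx` child process in Node.js.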
Programmatic usage
```ts
// Workers/agent runtimes (no filesystem writes)
import { fetchMarkdown, fetchRawMarkdown } from "mdrip";

// Node.js filesystem helpers
import { fetchToStore, fetchManyToStore, listStoredPages } from "mdrip/node";
```

Guardrails
- Prefer official sources and canonical URLs.
- Do not overwrite unrelated files.
- Report whether each result came from Cloudflare markdown or HTML fallback.
- If a fetch fails, include URL, HTTP status/error, and next-step retry guidance.
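
The reporting guardrails above could be captured in a small per-URL report. This is a sketch under assumptions: `FetchReport`, `retryGuidance`, and `formatReport` are hypothetical names, the field layout is not mdrip's actual output, and the retry advice per status code is illustrative.

```ts
// Hypothetical result shape; field names are illustrative, not mdrip's output format.
type FetchReport =
  | { ok: true; url: string; source: "cloudflare-markdown" | "html-fallback"; path?: string }
  | { ok: false; url: string; status?: number; error: string };

// Illustrative next-step guidance based on the HTTP status, if one was received.
function retryGuidance(status?: number): string {
  if (status === undefined) return "check the URL and network connection, then retry";
  if (status === 429 || status >= 500) return "retry later with backoff";
  if (status === 404) return "verify the canonical URL before retrying";
  return "do not retry unchanged; inspect the request";
}

// Render one line per URL, naming the content source or the failure details.
function formatReport(report: FetchReport): string {
  if (report.ok) {
    const where = report.path ? ` -> ${report.path}` : "";
    return `OK ${report.url} (${report.source})${where}`;
  }
  const status = report.status !== undefined ? `HTTP ${report.status}` : "no HTTP status";
  return `FAILED ${report.url}: ${status}, ${report.error}; next step: ${retryGuidance(report.status)}`;
}
```

Keeping success and failure in one discriminated union means every URL gets exactly one line in the summary, whether it came from Cloudflare markdown, HTML fallback, or a failed fetch.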
References
- references/workflow.md
- references/fallback-and-quality.md