mdrip

Use this skill when an agent needs local, reusable markdown context from web pages for implementation, debugging, or documentation work.

When to use

  • You need to ingest docs/blog pages into the repository as markdown.
  • You want to prefer `Accept: text/markdown` content negotiation for Cloudflare Markdown for Agents.
  • You still need good results when a site only returns HTML.
  • You need raw markdown streamed to stdout for agent runtimes (for example OpenClaw) without writing local snapshot files.
  • You need to call mdrip as a library from Node.js or Cloudflare Workers.

Core workflow

  1. Fetch target URLs with markdown content negotiation.
  2. Detect whether markdown was served (`content-type: text/markdown`).
  3. If markdown is not served and fallback is allowed, convert HTML to markdown.
  4. Save snapshots under `mdrip/pages/.../index.md`.
  5. Update source tracking in `mdrip/sources.json`.
  6. Return a concise summary of paths, content mode, and failures.
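
Steps 1 to 3 can be sketched in TypeScript as one function. This is a minimal sketch under stated assumptions, not mdrip's implementation or API: `fetchAsMarkdown`, `FetchOptions`, `FetchLike`, and `PageResult` are hypothetical names, the `text/html;q=0.9` entry in the `Accept` header is an assumed way to still receive HTML, and the HTML-to-markdown conversion is delegated to whatever converter the caller passes in. Injecting `fetchImpl` (for example `globalThis.fetch`) keeps the sketch free of filesystem access, so the same logic fits Node.js and Workers.

```ts
// Hypothetical sketch of workflow steps 1-3; not mdrip's actual code or API.

// The subset of a fetch() Response that this sketch reads.
interface MinimalResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}

// Injectable fetch, e.g. globalThis.fetch in Node.js 18+ or Workers.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<MinimalResponse>;

interface FetchOptions {
  htmlFallback: boolean; // false mirrors the strict --no-html-fallback mode
  fetchImpl: FetchLike;
  htmlToMarkdown: (html: string) => string; // assumed: any HTML-to-markdown converter
}

type PageResult =
  | { url: string; mode: "markdown" | "html-fallback"; markdown: string }
  | { url: string; status?: number; error: string };

export async function fetchAsMarkdown(url: string, opts: FetchOptions): Promise<PageResult> {
  // Step 1: content negotiation. Prefer markdown; the HTML entry is an assumption.
  let res: MinimalResponse;
  try {
    res = await opts.fetchImpl(url, {
      headers: { Accept: "text/markdown, text/html;q=0.9" },
    });
  } catch (err) {
    // Network-level failure: no HTTP status is available.
    return { url, error: err instanceof Error ? err.message : String(err) };
  }
  if (!res.ok) {
    return { url, status: res.status, error: `HTTP ${res.status}` };
  }
  const body = await res.text();

  // Step 2: compare the media type, ignoring parameters such as "; charset=utf-8".
  const header = res.headers.get("content-type") ?? "";
  const mediaType = (header.split(";")[0] ?? "").trim().toLowerCase();
  if (mediaType === "text/markdown") {
    return { url, mode: "markdown", markdown: body };
  }

  // Step 3: any non-markdown body goes to the converter, but only when fallback is allowed.
  if (!opts.htmlFallback) {
    return { url, status: res.status, error: `markdown not served (content-type: ${mediaType || "none"})` };
  }
  return { url, mode: "html-fallback", markdown: opts.htmlToMarkdown(body) };
}
```

The `mode` field on each success is what lets a caller report, per page, whether the content came from served markdown or from HTML fallback.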

Commands

```bash
# fetch one page
npx mdrip <url>

# fetch many pages
npx mdrip <url1> <url2> <url3>

# strict Cloudflare markdown only (no HTML fallback)
npx mdrip <url> --no-html-fallback

# raw markdown to stdout only (no settings/snapshot writes)
npx mdrip <url> --raw

# inspect tracked pages
npx mdrip list --json
```

Programmatic usage

```ts
// Workers/agent runtimes (no filesystem writes)
import { fetchMarkdown, fetchRawMarkdown } from "mdrip";

// Node.js filesystem helpers
import { fetchToStore, fetchManyToStore, listStoredPages } from "mdrip/node";
```

Guardrails

  • Prefer official sources and canonical URLs.
  • Do not overwrite unrelated files.
  • Report whether each result came from Cloudflare markdown or HTML fallback.
  • If a fetch fails, include the URL, the HTTP status or error, and next-step retry guidance.
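
The reporting guardrails can be sketched as a small formatter that labels each saved page with its source and gives every failure a URL, a status or error, and a next step. This is a hypothetical sketch, not mdrip's output format: `Outcome`, `retryHint`, `summarize`, the line layout, and the status-to-hint mapping are all illustrative assumptions.

```ts
// Hypothetical report shape and wording; not mdrip's actual output.
type Outcome =
  | { url: string; ok: true; mode: "markdown" | "html-fallback"; path: string }
  | { url: string; ok: false; status?: number; error: string };

// Illustrative status-to-advice mapping (an assumption, not a fixed rule).
function retryHint(status: number | undefined): string {
  if (status === undefined) return "network error; check connectivity and retry";
  if (status === 429 || status >= 500) return "transient server error; retry later with backoff";
  if (status === 404 || status === 410) return "page not found; verify the URL or find the canonical page";
  if (status === 401 || status === 403) return "access denied; prefer a public official source";
  // Assumption: a 2xx failure means markdown was not served in strict mode.
  if (status >= 200 && status < 300) return "markdown not served; rerun without --no-html-fallback if HTML conversion is acceptable";
  return "inspect the error before retrying";
}

export function summarize(outcomes: Outcome[]): string {
  const lines = outcomes.map((o) => {
    if (o.ok) {
      // Guardrail: say whether the content came from Cloudflare markdown or HTML fallback.
      const source = o.mode === "markdown" ? "Cloudflare markdown" : "HTML fallback";
      return `OK   ${o.url} -> ${o.path} (${source})`;
    }
    // Guardrail: URL, HTTP status or error, and next-step retry guidance.
    const status = o.status === undefined ? "no response" : `HTTP ${o.status}`;
    return `FAIL ${o.url} (${status}: ${o.error}) next: ${retryHint(o.status)}`;
  });
  const failed = outcomes.filter((o) => !o.ok).length;
  lines.push(`${outcomes.length - failed} saved, ${failed} failed`);
  return lines.join("\n");
}
```

Keeping the advice in `retryHint` means the hint wording can change without touching the report layout.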

References

  • references/workflow.md
  • references/fallback-and-quality.md