mdrip

Use this skill when an agent needs local, reusable markdown context from web pages for implementation, debugging, or documentation work.

When to use

You need to ingest docs/blog pages into the repository as markdown.
You want to send `Accept: text/markdown` first, so that sites supporting Cloudflare Markdown for Agents return markdown directly.
You still need good results when a site only returns HTML.
You need raw markdown streamed to stdout for agent runtimes (for example OpenClaw) without writing local snapshot files.
You need to call mdrip as a library from Node.js or Cloudflare Workers.

Core workflow

Fetch target URLs with markdown content negotiation.
Detect whether markdown was served (`content-type: text/markdown`).
If markdown is not served and fallback is allowed, convert HTML to markdown.
Save snapshots under `mdrip/pages/.../index.md`.
Update source tracking in `mdrip/sources.json`.
Return a concise summary of paths, content mode, and failures.
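
The negotiation and detection steps above can be sketched roughly as follows. This is a minimal illustration, not mdrip's actual implementation: the names `decideContentMode`, `fetchWithNegotiation`, and `Fetcher`, and the exact `Accept` header value with its quality weight, are assumptions made for this example.

```typescript
// Hypothetical sketch; mdrip's real internals and names may differ.
type ContentMode = "markdown" | "html-fallback" | "rejected";

// Minimal fetch-like signature so the sketch does not depend on global typings.
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}>;

// True when the response media type is text/markdown, ignoring parameters
// such as "; charset=utf-8".
function isMarkdownContentType(contentType: string | null): boolean {
  if (!contentType) return false;
  const [mediaType = ""] = contentType.split(";");
  return mediaType.trim().toLowerCase() === "text/markdown";
}

// Decide how a response should be treated, given the fallback policy.
function decideContentMode(
  contentType: string | null,
  allowHtmlFallback: boolean
): ContentMode {
  if (isMarkdownContentType(contentType)) return "markdown";
  return allowHtmlFallback ? "html-fallback" : "rejected";
}

// Request a page while preferring markdown (assumed header value).
async function fetchWithNegotiation(
  fetcher: Fetcher,
  url: string,
  allowHtmlFallback = true
) {
  const response = await fetcher(url, {
    headers: { Accept: "text/markdown, text/html;q=0.9" },
  });
  const mode = decideContentMode(
    response.headers.get("content-type"),
    allowHtmlFallback
  );
  return { url, status: response.status, mode, body: await response.text() };
}
```

In Node.js 18+ or Cloudflare Workers, the global `fetch` could be passed as the `fetcher` argument. The HTML-to-markdown conversion itself is left out, since the source does not describe how it is done.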

Commands

```bash
# fetch one page
npx mdrip <url>

# fetch many pages
npx mdrip <url1> <url2> <url3>

# strict Cloudflare markdown only (no HTML fallback)
npx mdrip <url> --no-html-fallback

# raw markdown to stdout only (no settings/snapshot writes)
npx mdrip <url> --raw

# inspect tracked pages
npx mdrip list --json
```

Programmatic usage

```ts
// Workers/agent runtimes (no filesystem writes)
import { fetchMarkdown, fetchRawMarkdown } from "mdrip";

// Node.js filesystem helpers
import { fetchToStore, fetchManyToStore, listStoredPages } from "mdrip/node";
```

Guardrails

Prefer official sources and canonical URLs.
Do not overwrite unrelated files.
Report whether each result came from Cloudflare markdown or HTML fallback.
If a fetch fails, include URL, HTTP status/error, and next-step retry guidance.
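
One way an agent might structure the report these guardrails call for is sketched below. The `FetchOutcome` shape, the `retryGuidance` wording, and `formatReport` are hypothetical names invented for this illustration; mdrip's own return values may look different.

```typescript
// Hypothetical result shape for illustration only.
type FetchOutcome =
  | { url: string; ok: true; source: "cloudflare-markdown" | "html-fallback" }
  | { url: string; ok: false; status?: number; error: string };

// Map an HTTP status (or its absence) to a short next-step hint.
function retryGuidance(status?: number): string {
  if (status === undefined) return "check the network connection and URL, then retry";
  if (status === 429) return "rate limited; wait and retry later";
  if (status >= 500) return "server error; retry after a short delay";
  if (status === 404) return "page not found; verify the canonical URL";
  return "do not retry unchanged; review the request";
}

// Produce one report line per URL, naming the content source or the failure.
function formatReport(outcomes: FetchOutcome[]): string[] {
  return outcomes.map((o) =>
    o.ok
      ? `OK ${o.url} (${o.source})`
      : `FAILED ${o.url}: ${o.status ?? "no status"} ${o.error}; next step: ${retryGuidance(o.status)}`
  );
}
```

Keeping success and failure as separate variants of one type makes it hard to emit a result without stating either its content source or its error details.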

References

`references/workflow.md`
`references/fallback-and-quality.md`
