**Install:**
```bash
cd skills/web-to-markdown && pnpm install
```

Output is written to `docs/web-captures/YYYYMMDD_HHMMSS.md`.

**Usage:**
```bash
cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts <url1> [url2] [url3] ...
```

**Single URL:**
```bash
cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts https://example.com/docs
```
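Each run writes its Markdown to a timestamped file named like `docs/web-captures/YYYYMMDD_HHMMSS.md`. The following is a minimal sketch of building that path. It assumes local time (not UTC) and a plain `path.posix.join`. The helpers `formatTimestamp` and `buildOutputPath` are hypothetical and are not taken from `scrape-and-convert.ts`.

```typescript
import * as path from "node:path";

// Zero-pad a number to the given width, e.g. pad(7, 2) -> "07".
function pad(n: number, width: number): string {
  return String(n).padStart(width, "0");
}

// Hypothetical helper: format a Date as YYYYMMDD_HHMMSS using local time (an assumption).
function formatTimestamp(d: Date): string {
  const date = `${d.getFullYear()}${pad(d.getMonth() + 1, 2)}${pad(d.getDate(), 2)}`;
  const time = `${pad(d.getHours(), 2)}${pad(d.getMinutes(), 2)}${pad(d.getSeconds(), 2)}`;
  return `${date}_${time}`;
}

// Hypothetical helper: join the output directory and the timestamped file name.
// path.posix keeps forward slashes on every platform, matching the documented path.
function buildOutputPath(outputDir: string, d: Date = new Date()): string {
  return path.posix.join(outputDir, `${formatTimestamp(d)}.md`);
}

// Example: 5 Jan 2024, 09:03:07 local time
console.log(buildOutputPath("docs/web-captures", new Date(2024, 0, 5, 9, 3, 7)));
// -> docs/web-captures/20240105_090307.md
```

Second-level resolution means two runs started within the same second would collide. If that matters for your workflow, a suffix or milliseconds would be needed.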
**Multiple URLs (Batch):**
```bash
cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts \
  https://example.com/guide \
  https://example.com/api \
  https://example.com/faq
```
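The script takes any number of URLs as positional arguments. The sketch below shows one way those arguments could be checked before the browser starts. It assumes that only `http:` and `https:` URLs are accepted. `parseUrlArgs` is a hypothetical name, not the script's actual API.

```typescript
// Hypothetical: validate positional CLI arguments as http(s) URLs.
function parseUrlArgs(args: string[]): string[] {
  if (args.length === 0) {
    throw new Error("Usage: scrape-and-convert.ts <url1> [url2] [url3] ...");
  }
  return args.map((arg) => {
    let url: URL | undefined;
    try {
      url = new URL(arg); // the URL constructor throws on malformed input
    } catch {
      // handled by the check below
    }
    if (!url) {
      throw new Error(`Invalid URL: ${arg}`);
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`Unsupported protocol in ${arg}`);
    }
    return url.href;
  });
}

// In a Node CLI, positional arguments start at index 2: parseUrlArgs(process.argv.slice(2))
const urls = parseUrlArgs([
  "https://example.com/guide",
  "https://example.com/api",
  "https://example.com/faq",
]);
console.log(urls.length); // 3
```

Failing fast on a malformed argument avoids launching Chromium only to abort partway through a batch.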
**From project root:**
```bash
pnpm --filter @skills/web-to-markdown tsx scripts/scrape-and-convert.ts <urls...>
```
**Error types:** `BrowserError`, `FileError`, `HtmlConversionError`

```
skills/web-to-markdown/
├── SKILL.md                  # This file (workflow instructions)
├── package.json              # Package manifest (pnpm workspace member)
├── tsconfig.json             # TypeScript config
├── scripts/
│   ├── scrape-and-convert.ts # Main CLI (Playwright + Turndown)
│   ├── html-to-markdown.ts   # Pure conversion function (Turndown wrapper)
│   └── convert-and-append.ts # Legacy CLI (deprecated, kept for reference)
└── tests/
    └── html-to-markdown.test.ts # Unit tests
```
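The error types `BrowserError`, `FileError`, and `HtmlConversionError` suggest that failures are classified by stage: browser, file system, or conversion. The sketch below shows one way such classes could be declared and mapped to process exit codes. The base class `ScrapeError`, the `exitCodeFor` helper, and the exit-code values are all hypothetical.

```typescript
// Hypothetical common base so callers can catch all tool-specific errors at once.
class ScrapeError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Restore the prototype chain so `instanceof` works when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class BrowserError extends ScrapeError {} // assumed use: launch or navigation failure
class FileError extends ScrapeError {} // assumed use: output file cannot be written
class HtmlConversionError extends ScrapeError {} // assumed use: HTML-to-Markdown step failed

// Hypothetical exit codes per error type; anything else maps to 1.
function exitCodeFor(err: unknown): number {
  if (err instanceof BrowserError) return 2;
  if (err instanceof FileError) return 3;
  if (err instanceof HtmlConversionError) return 4;
  return 1;
}

const err = new FileError("cannot write docs/web-captures");
console.log(err.name, exitCodeFor(err)); // FileError 3
```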
Install the Chromium browser that Playwright drives:

```bash
pnpm exec playwright install chromium
```

Defaults are defined in `DEFAULT_CONFIG` in `scripts/scrape-and-convert.ts`:

```typescript
const DEFAULT_CONFIG = {
  outputDir: 'docs/web-captures',
  timeout: 30000, // milliseconds
};
```
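`DEFAULT_CONFIG` sets only `outputDir` and `timeout`. If you want per-run overrides without editing the file, one minimal approach is to merge a partial object over the defaults and validate the result. `resolveConfig` and the override mechanism are assumptions. The script is not documented to provide them.

```typescript
interface ScrapeConfig {
  outputDir: string;
  timeout: number; // milliseconds
}

const DEFAULT_CONFIG: ScrapeConfig = {
  outputDir: "docs/web-captures",
  timeout: 30000,
};

// Hypothetical: overlay caller-supplied values on the defaults and sanity-check them.
function resolveConfig(overrides: Partial<ScrapeConfig> = {}): ScrapeConfig {
  const config: ScrapeConfig = { ...DEFAULT_CONFIG, ...overrides };
  if (!Number.isFinite(config.timeout) || config.timeout <= 0) {
    throw new Error(`timeout must be a positive number of ms, got ${config.timeout}`);
  }
  if (!config.outputDir || config.outputDir.trim() === "") {
    throw new Error("outputDir must not be empty");
  }
  return config;
}

// A slow site might need 60 s instead of the default 30 s.
console.log(resolveConfig({ timeout: 60000 }));
// -> { outputDir: 'docs/web-captures', timeout: 60000 }
```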

| Feature | web-to-markdown | scratchpad-fetch | Jina AI Reader |
|---|---|---|---|
| Transport | Playwright (headless) | curl (HTTP) | Cloud API |
| JavaScript | ✅ Full rendering | ❌ No | ✅ Server-side |
| Conversion | ✅ Turndown | ❌ Raw HTML | ✅ LLM-powered |
| Self-hosted | ✅ Yes | ✅ Yes | ❌ Cloud only |
| Setup | pnpm install | None | API key |
| Speed | Medium (2-5s/page) | Fast (<1s) | Fast (~2s) |
| Visible browser | ❌ No (headless) | N/A | N/A |

```bash
cd skills/web-to-markdown
pnpm exec playwright install chromium
```

To change the timeout or output directory, edit `DEFAULT_CONFIG`.