Loading...
Loading...
Batch-process web pages via headless Playwright browser, extract HTML, convert to markdown using Turndown, and save to timestamped scratchpad file. Use when user asks to "capture these pages as markdown", "save web content", "fetch and convert webpages", or needs clean markdown from HTML. All URLs from one prompt → single file at docs/web-captures/<timestamp>.md.
npx skill4agent add otrebu/agents web-to-markdowncd skills/web-to-markdown && pnpm installdocs/web-captures/YYYYMMDD_HHMMSS.mdcd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts <url1> [url2] [url3] ...cd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts https://example.com/docs
# Output: docs/web-captures/20251103_143052.mdcd skills/web-to-markdown
pnpm tsx scripts/scrape-and-convert.ts \
https://example.com/guide \
https://example.com/api \
https://example.com/faq
# Output: docs/web-captures/20251103_143052.md (all 3 pages)pnpm --filter @skills/web-to-markdown tsx scripts/scrape-and-convert.ts <urls...># Web Captures - YYYY-MM-DD HH:MM:SS
Generated: YYYYMMDD_HHMMSS
URLs: N
---
## 📄 https://example.com/page1
[Converted markdown content...]
---
## 📄 https://example.com/page2
[Converted markdown content...]
---BrowserErrorFileErrorHtmlConversionErrorskills/web-to-markdown/
├── SKILL.md # This file (workflow instructions)
├── package.json # pnpm workspace config
├── tsconfig.json # TypeScript config
├── scripts/
│ ├── scrape-and-convert.ts # Main CLI (Playwright + Turndown)
│ ├── html-to-markdown.ts # Pure conversion function (Turndown wrapper)
│ └── convert-and-append.ts # Legacy CLI (deprecated, kept for reference)
└── tests/
└── html-to-markdown.test.ts # Unit testspnpm exec playwright install chromiumDEFAULT_CONFIGscripts/scrape-and-convert.tsconst DEFAULT_CONFIG = {
outputDir: 'docs/web-captures',
timeout: 30000, // milliseconds
};| Feature | web-to-markdown | scratchpad-fetch | Jina AI Reader |
|---|---|---|---|
| Transport | Playwright (headless) | curl (HTTP) | Cloud API |
| JavaScript | ✅ Full rendering | ❌ No | ✅ Server-side |
| Conversion | ✅ Turndown | ❌ Raw HTML | ✅ LLM-powered |
| Self-hosted | ✅ Yes | ✅ Yes | ❌ Cloud only |
| Setup | pnpm install | None | API key |
| Speed | Medium (2-5s/page) | Fast (<1s) | Fast (~2s) |
| Visible browser | ❌ No (headless) | N/A | N/A |
cd skills/web-to-markdown
pnpm exec playwright install chromiumDEFAULT_CONFIG