Scrape documentation websites into local markdown files for AI context. Takes a base URL and crawls the documentation, storing results in ./docs (or custom path). Uses crawl4ai with BFS deep crawling.
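The BFS deep crawl can be sketched in plain Python: pages are visited level by level from the base URL, bounded by a depth and a page limit. This toy version is illustrative only (the `get_links` callback and the mock link graph are assumptions, not crawl4ai's API) and shows the traversal order such a crawl produces:

```python
from collections import deque

def bfs_crawl(start, get_links, max_depth=2, max_pages=100):
    """Breadth-first crawl: visit pages level by level,
    bounded by a depth limit and a page-count limit."""
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth < max_depth:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited

# Toy link graph standing in for a docs site
graph = {
    "/docs/": ["/docs/start", "/docs/api"],
    "/docs/start": ["/docs/"],
    "/docs/api": ["/docs/api/client"],
    "/docs/api/client": [],
}
print(bfs_crawl("/docs/", lambda u: graph.get(u, [])))
# → ['/docs/', '/docs/start', '/docs/api', '/docs/api/client']
```

Shallow pages are scraped before deep ones, which is why `--max-pages` tends to keep the most important (top-level) pages when it cuts the crawl short.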
Install the skill:

```shell
npx skill4agent add bjesuiter/skills jb-docs-scraper
```

One-time setup (crawl4ai needs the Playwright browsers):

```shell
uv run --with crawl4ai playwright install
```

Usage:

```shell
# Scrape any documentation URL
uv run --with crawl4ai python ./references/scrape_docs.py <URL> [OPTIONS]

# Examples
uv run --with crawl4ai python ./references/scrape_docs.py https://mediasoup.org/documentation/v3/
uv run --with crawl4ai python ./references/scrape_docs.py https://docs.rombo.co/tailwind
```

Results are written to `./docs/<auto-detected-name>/` by default.

Options:

| Option | Description | Default |
|---|---|---|
| `--output` | Output directory | `./docs/<auto-detected-name>/` |
| `--max-depth` | Maximum link depth | |
| `--max-pages` | Maximum pages to scrape | |
| `--url-pattern` | URL filter (glob) | Auto-detected |
| | Suppress verbose output | |
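The `--url-pattern` option takes a shell-style glob. A minimal sketch of how such a filter behaves, using Python's `fnmatch` (the `keep` helper and sample URLs are illustrative; the real script may normalize URLs before matching):

```python
from fnmatch import fnmatch

def keep(url: str, pattern: str) -> bool:
    """Shell-style glob match, as implied by --url-pattern (illustrative)."""
    return fnmatch(url, pattern)

urls = [
    "https://example.com/docs/api/v2/users",
    "https://example.com/docs/api/v1/users",
    "https://example.com/blog/post",
]
matched = [u for u in urls if keep(u, "*api/v2/*")]
print(matched)  # only the /api/v2/ URL survives
```

Note the leading and trailing `*`: without them the pattern must match the full URL, so `"api/v2"` alone would match nothing.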
```shell
# Basic - scrape to ./docs/documentation_v3/
uv run --with crawl4ai python ./references/scrape_docs.py \
  https://mediasoup.org/documentation/v3/

# Custom output directory
uv run --with crawl4ai python ./references/scrape_docs.py \
  https://docs.rombo.co/tailwind \
  --output ./my-tailwind-docs

# Limit crawl scope
uv run --with crawl4ai python ./references/scrape_docs.py \
  https://tanstack.com/start/latest/docs/framework/react/overview \
  --max-pages 50 \
  --max-depth 3

# Custom URL pattern filter
uv run --with crawl4ai python ./references/scrape_docs.py \
  https://example.com/docs/api/v2/ \
  --url-pattern "*api/v2/*"
```

Output structure:

```
docs/<name>/
  index.md              # Root page
  getting-started.md
  api/
    overview.md
    client.md
  guides/
    installation.md
```

Troubleshooting:

| Issue | Solution |
|---|---|
| Playwright browsers missing | Run `uv run --with crawl4ai playwright install` |
| Empty output | Check that the URL pattern matches the actual doc URLs. Try a broader `--url-pattern` |
| Missing pages | Increase `--max-depth` or `--max-pages` |
| Wrong pages scraped | Use a stricter `--url-pattern` |

For a quick test of a new site, limit the crawl first with `--max-pages 10`.
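The output tree above suggests page URLs map onto relative markdown paths under the output directory, with the base URL itself becoming `index.md`. One plausible mapping is sketched below; `output_path` is a hypothetical helper, and the actual script's naming rules may differ:

```python
from urllib.parse import urlparse

def output_path(url: str, base: str) -> str:
    """Map a page URL to a markdown path relative to the output directory.
    Illustrative only - not necessarily the script's real naming scheme."""
    base_path = urlparse(base).path.rstrip("/")
    path = urlparse(url).path.rstrip("/")
    rel = path[len(base_path):].strip("/")
    return f"{rel}.md" if rel else "index.md"

base = "https://example.com/docs/"
print(output_path("https://example.com/docs/", base))              # index.md
print(output_path("https://example.com/docs/api/overview", base))  # api/overview.md
```

Under this scheme nested URL segments become subdirectories, which matches the `api/` and `guides/` folders shown in the output structure.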