Fetches web pages and converts them to clean markdown using a robust 3-tier chain (Firecrawl → Jina Reader → Scrapling stealth browser). Use this skill instead of WebFetch whenever the user provides a URL and needs the page's text content — especially for sites that block direct access: medium.com articles (paywalled/metered), WeChat public accounts (mp.weixin.qq.com, geo-restricted), documentation sites with bot protection, or any page where simple HTTP fetching might return a CAPTCHA or empty page. Triggers for: "read this URL", "summarize this article/page", "grab the content from", "extract text from", "what does this page say", "fetch this link", or any request to access and process a specific web page. Do NOT trigger for: building scrapers, checking HTTP status codes, parsing already-downloaded HTML files, answering conceptual questions about scraping tools, or monitoring page changes.
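The tier fallback just described can be sketched as a short loop. This is a minimal illustration, not the skill's actual code — the three `fetch_via_*` functions are hypothetical stand-ins for the real scraper modules:

```python
def fetch_via_firecrawl(url: str) -> str:   # stand-in: real tier calls the Firecrawl API
    raise RuntimeError("no API key configured")

def fetch_via_jina(url: str) -> str:        # stand-in: real tier proxies through r.jina.ai
    return "# Example Domain\n\nSome text."

def fetch_via_scrapling(url: str) -> str:   # stand-in: real tier drives a headless browser
    return "# Example Domain (browser)"

def fetch_markdown(url: str) -> str:
    """Try each tier in order; return the first non-empty result."""
    errors = []
    for tier in (fetch_via_firecrawl, fetch_via_jina, fetch_via_scrapling):
        try:
            markdown = tier(url)
            if markdown.strip():              # an empty page counts as a failure
                return markdown
        except Exception as exc:              # network error, CAPTCHA page, timeout…
            errors.append(f"{tier.__name__}: {exc}")
    raise RuntimeError("All tiers failed: " + "; ".join(errors))
```

Here tier 1 fails (no key), so the result comes from tier 2 — the same degradation path the skill follows when `FIRECRAWL_API_KEY` is unset.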
## Installation

```bash
npx skill4agent add gn00678465/crawler-skill crawler
```

## Quick start

```bash
uv run scripts/crawl.py --url https://example.com --output reports/example.md
```

## Tier chain

| Tier | Module | Requires |
|---|---|---|
| 1 | Firecrawl (`firecrawl_scraper.py`) | `FIRECRAWL_API_KEY` env var (optional, best results) |
| 2 | Jina Reader (`jina_reader.py`) | Nothing — free, no key needed |
| 3 | Scrapling (`scrapling_scraper.py`) | Local headless browser (auto-installs via pip) |
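A caller can check up front which tiers are usable in the current environment. A hedged sketch, assuming the only credential is the `FIRECRAWL_API_KEY` variable shown in the usage examples:

```python
import os

def available_tiers() -> list[str]:
    """Tiers usable right now, in fallback order."""
    tiers = []
    if os.environ.get("FIRECRAWL_API_KEY"):   # tier 1 needs an API key
        tiers.append("firecrawl")
    tiers += ["jina", "scrapling"]            # tiers 2-3 need no credentials
    return tiers
```

Tiers 2 and 3 are always listed, which is why the chain degrades gracefully rather than failing outright when no key is set.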
## Layout

```
crawler-skill/
├── SKILL.md                     ← this file
├── scripts/
│   ├── crawl.py                 ← main CLI entry point (PEP 723 inline deps)
│   └── src/
│       ├── domain_router.py     ← URL-to-tier routing rules
│       ├── firecrawl_scraper.py ← Tier 1: Firecrawl API
│       ├── jina_reader.py       ← Tier 2: Jina r.jina.ai proxy
│       └── scrapling_scraper.py ← Tier 3: local headless scraper
└── tests/
    └── test_crawl.py            ← 70 pytest tests (all passing)
```

## Usage
```bash
# Basic fetch — tries Firecrawl, falls back to Jina, then Scrapling.
# Always prefer using --output to avoid terminal encoding issues.
uv run scripts/crawl.py --url https://docs.python.org/3/ --output reports/python_docs.md

# If no --output is provided, markdown goes to stdout (not recommended on Windows)
uv run scripts/crawl.py --url https://example.com

# With a Firecrawl API key for best results
FIRECRAWL_API_KEY=fc-... uv run scripts/crawl.py --url https://example.com --output reports/example.md
```

## URL validation

Only `http://` and `https://` URLs are accepted; other schemes (`ftp://`, `file://`, `javascript:`) are rejected with exit code 1.

## Output paths

A relative `--output` path such as `reports/result.md` resolves under `{project_root}/reports`:

```bash
uv run scripts/crawl.py --url <URL> --output reports/result.md
```

## Self-hosted Firecrawl

Point the script at a self-hosted Firecrawl instance via `FIRECRAWL_API_URL`:

```bash
FIRECRAWL_API_URL=http://localhost:3002 uv run scripts/crawl.py --url https://example.com
```

## Errors

Failures are reported with an `Error:` prefix (e.g. `Error: Access Denied`) and a non-zero exit code.

## Domain routing

Some sites are known to defeat specific tiers, so `scripts/src/domain_router.py` skips them up front:

| Domain | Skipped tiers | Active chain |
|---|---|---|
| `medium.com` (and `*.medium.com`) | firecrawl | jina → scrapling |
| `mp.weixin.qq.com` | firecrawl + jina | scrapling only |
| everything else | — | firecrawl → jina → scrapling |
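The routing table can be mirrored in a few lines. A hypothetical sketch — the rule keys and suffix-matching behavior are assumptions for illustration; the authoritative rules live in `scripts/src/domain_router.py`:

```python
from urllib.parse import urlparse

# Per-domain tiers to skip, mirroring the table above.
SKIP_RULES = {
    "medium.com": {"firecrawl"},                 # paywalled/metered
    "mp.weixin.qq.com": {"firecrawl", "jina"},   # geo-restricted
}

ALL_TIERS = ["firecrawl", "jina", "scrapling"]

def active_chain(url: str) -> list[str]:
    """Return the tier chain to try for this URL, in order."""
    host = urlparse(url).hostname or ""
    for domain, skipped in SKIP_RULES.items():
        # Exact host or any subdomain of the rule's domain.
        if host == domain or host.endswith("." + domain):
            return [t for t in ALL_TIERS if t not in skipped]
    return ALL_TIERS
```

Under these assumptions, `blog.medium.com` picks up the `medium.com` rule while unrelated hosts fall through to the full three-tier chain.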
Matching is suffix-based: `blog.medium.com` matches the `medium.com` rule (via the `.medium.com` suffix), while `other.weixin.qq.com` matches neither rule and gets the full chain.

## Tests

```bash
uv run pytest tests/ -v
```

## Dependencies

`crawl.py` declares its dependencies inline (PEP 723), so `uv run` resolves them automatically:

- `firecrawl-py>=2.0`
- `httpx>=0.27`
- `scrapling>=0.2`
- `html2text>=2024.2.26`

## Programmatic use

Invoke `crawl.py` as a subprocess and read the markdown from stdout:

```python
import subprocess

url = "https://example.com"
result = subprocess.run(
    ["uv", "run", "scripts/crawl.py", "--url", url],
    capture_output=True, text=True,
)
if result.returncode == 0:
    markdown = result.stdout
```
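When stdout encoding is a concern (the Windows caveat above), it is safer to write to a file via `--output` and read it back as UTF-8. A sketch — the temp-directory handling here is illustrative, not part of the skill:

```python
import subprocess
import tempfile
from pathlib import Path

def crawl_to_markdown(url: str) -> str:
    """Run crawl.py with --output and read the result back as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "page.md"
        result = subprocess.run(
            ["uv", "run", "scripts/crawl.py", "--url", url, "--output", str(out)],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            raise RuntimeError(result.stderr.strip())   # e.g. "Error: Access Denied"
        return out.read_text(encoding="utf-8")
```

Reading the file directly sidesteps the console codepage entirely, so the markdown arrives byte-for-byte as the scraper wrote it.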