Crawl entire websites using the Cloudflare Browser Rendering `/crawl` API. The skill initiates async crawl jobs, polls for completion, and saves results as markdown files. Useful for ingesting documentation sites, knowledge bases, or any web content into your project context. Requires the `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` environment variables.
## Install

```sh
npx skill4agent add davila7/claude-code-templates cf-crawl
```

## Setup

The skill reads `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` from the environment. If either is unset, it is loaded from `.env`, `.env.local`, or `~/.env`:

```sh
# Load from .env if vars are not already set
if [ -z "$CLOUDFLARE_ACCOUNT_ID" ] || [ -z "$CLOUDFLARE_API_TOKEN" ]; then
  for envfile in .env .env.local "$HOME/.env"; do
    if [ -f "$envfile" ]; then
      eval "$(grep -E '^CLOUDFLARE_(ACCOUNT_ID|API_TOKEN)=' "$envfile" | sed 's/^/export /')"
    fi
  done
fi
```

Example `.env`:

```sh
CLOUDFLARE_ACCOUNT_ID=your-account-id
CLOUDFLARE_API_TOKEN=your-api-token
```

## 1. Start a crawl job

```sh
curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "<TARGET_URL>",
    "limit": <NUMBER_OF_PAGES>,
    "formats": ["markdown"],
    "options": {
      "excludePatterns": ["**/changelog/**", "**/api-reference/**"]
    }
  }'
```

To crawl only pages modified after a given time, pass `modifiedSince`:

```sh
curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "<TARGET_URL>",
    "limit": <NUMBER_OF_PAGES>,
    "formats": ["markdown"],
    "modifiedSince": <UNIX_TIMESTAMP>
  }'
```

The `--since` flag converts a date to this Unix timestamp via `date -d "2026-03-10" +%s` (Linux) or `date -j -f "%Y-%m-%d" "2026-03-10" +%s` (macOS).

A successful request returns the job ID:

```json
{"success": true, "result": "job-uuid-here"}
```

## 2. Poll for completion

```sh
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?limit=1" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Status: {d[\"result\"][\"status\"]} | Finished: {d[\"result\"][\"finished\"]}/{d[\"result\"][\"total\"]}')"
```

Job statuses: `running`, `completed`, `cancelled_due_to_timeout`, `cancelled_due_to_limits`, `errored`.
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=skipped&limit=50" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=completed&limit=50" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"cursorcurl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=completed&limit=50&cursor=<CURSOR>" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"# Create output directory
## 4. Save results as markdown

```sh
# Create output directory
mkdir -p .crawl-output

# Fetch and save all pages
python3 -c "
import json, os, re, urllib.request

account_id = os.environ['CLOUDFLARE_ACCOUNT_ID']
api_token = os.environ['CLOUDFLARE_API_TOKEN']
job_id = '<JOB_ID>'
base = f'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}'
outdir = '.crawl-output'
os.makedirs(outdir, exist_ok=True)
cursor = None
total_saved = 0
while True:
    url = f'{base}?status=completed&limit=50'
    if cursor:
        url += f'&cursor={cursor}'
    req = urllib.request.Request(url, headers={'Authorization': f'Bearer {api_token}'})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    records = data.get('result', {}).get('records', [])
    if not records:
        break
    for rec in records:
        page_url = rec.get('url', '')
        md = rec.get('markdown', '')
        if not md:
            continue
        # Convert URL to filename
        name = re.sub(r'https?://', '', page_url)
        name = re.sub(r'[^a-zA-Z0-9]', '_', name).strip('_')[:120]
        filepath = os.path.join(outdir, f'{name}.md')
        with open(filepath, 'w') as f:
            f.write(f'<!-- Source: {page_url} -->\n\n')
            f.write(md)
        total_saved += 1
    cursor = data.get('result', {}).get('cursor')
    if cursor is None:
        break
print(f'Saved {total_saved} pages to {outdir}/')
"
```

## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | (required) | Starting URL to crawl |
| `limit` | number | 10 | Max pages to crawl (up to 100,000) |
| `depth` | number | 100,000 | Max link depth from the starting URL |
| `formats` | array | `["html"]` | Output formats: `html`, `markdown` |
| `render` | boolean | true | Render pages in a headless browser |
| `source` | string | `"all"` | Page discovery: `sitemaps`, `links`, or `all` |
| `cacheTTL` | number | 86400 | Cache validity in seconds (max 604800) |
| `modifiedSince` | number | - | Unix timestamp; only crawl pages modified after this time |
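The `modifiedSince` timestamp can also be produced portably in Python instead of the platform-specific `date` commands (this sketch interprets the date as midnight UTC, whereas `date` uses the local timezone; `to_unix` is an illustrative name):

```python
# Convert a YYYY-MM-DD date string into a Unix timestamp for modifiedSince.
from datetime import datetime, timezone

def to_unix(date_str):
    dt = datetime.strptime(date_str, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(to_unix('2026-03-10'))
# → 1773100800
```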
Crawl-scope options (passed in the `options` object):

| Parameter | Type | Default | Description |
|---|---|---|---|
| `includePatterns` | array | `[]` | Wildcard patterns to include (e.g. `"/guides/**"`) |
| `excludePatterns` | array | `[]` | Wildcard patterns to exclude (higher priority) |
| | boolean | false | Follow links to subdomains |
| | boolean | false | Follow external links |
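A rough local preview of how include/exclude filtering combines, assuming glob-like wildcard matching via `fnmatch` (the API's exact pattern semantics may differ; `allowed` is an illustrative helper, not part of the skill):

```python
# Exclude patterns win over include patterns; an empty include list allows all.
from fnmatch import fnmatch

def allowed(path, include, exclude):
    if any(fnmatch(path, p) for p in exclude):
        return False  # excludePatterns take priority
    if not include:
        return True   # no includePatterns means everything is eligible
    return any(fnmatch(path, p) for p in include)

print(allowed('/guides/intro', ['/guides/**'], ['/changelog/**']))  # True
print(allowed('/changelog/v2', ['/guides/**'], ['/changelog/**']))  # False
```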
Additional options:

| Parameter | Type | Description |
|---|---|---|
| | object | AI-powered structured extraction (`prompt`, `response_format`) |
| `authenticate` | object | HTTP basic auth (`username`, `password`) |
| `setExtraHTTPHeaders` | object | Custom headers for requests |
| `rejectResourceTypes` | array | Resource types to skip: `image`, `media`, `font`, `stylesheet` |
| `userAgent` | string | Custom user agent string |
| `cookies` | array | Custom cookies for requests |
## Examples

```sh
# Crawl up to 50 pages
/cf-crawl https://docs.example.com --limit 50

# Limit the crawl to certain paths
/cf-crawl https://docs.example.com --limit 100 --include "/guides/**,/api/**" --exclude "/changelog/**"

# Only crawl pages modified since a date (unchanged pages are reported with status=skipped)
/cf-crawl https://docs.example.com --limit 50 --since 2026-03-10

# Skip browser rendering for faster crawls
/cf-crawl https://docs.example.com --no-render --limit 200

# Merge all saved pages into a single file
/cf-crawl https://docs.example.com --limit 50 --merge
```

## Flags

| Flag | Description |
|---|---|
| `--limit N` / `-l N` | Maximum number of pages to crawl |
| `--depth N` / `-d N` | Maximum link depth |
| `--include "pattern1,pattern2"` | Wildcard patterns to include |
| `--exclude "pattern1,pattern2"` | Wildcard patterns to exclude |
| `--no-render` | Disable browser rendering (`render: false`) |
| `--merge` | Merge results into a single file |
| `--output DIR` / `-o DIR` | Output directory (default `.crawl-output`) |
| `--source sitemaps\|links\|all` | Page discovery mode |
| `--since DATE` | Only crawl pages modified since `DATE` (e.g. `2026-03-10`); converted to `modifiedSince` |

Notes:

- Per-page results may also report `"status": "disallowed"` for pages the crawler was not permitted to fetch.
- Patterns support `*` and `/**/` wildcards.
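For scripting, step 1 can also be driven from Python instead of curl. A sketch of the same POST; `build_payload` and `start_crawl` are illustrative names, not part of the skill:

```python
# Build the crawl request body and submit it, returning the job ID.
import json
import urllib.request

def build_payload(target_url, limit=10, modified_since=None):
    payload = {'url': target_url, 'limit': limit, 'formats': ['markdown']}
    if modified_since is not None:
        payload['modifiedSince'] = modified_since
    return payload

def start_crawl(account_id, api_token, target_url, **kwargs):
    req = urllib.request.Request(
        f'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl',
        data=json.dumps(build_payload(target_url, **kwargs)).encode(),
        headers={'Authorization': f'Bearer {api_token}',
                 'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)['result']  # the job ID
```

The returned job ID feeds directly into the polling and fetching steps above.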