Use this skill for XCrawl crawl tasks, including bulk site crawling, crawler rule design, async status polling, and delivery of crawl output for downstream scrape and search workflows.
## Setup

Install the skill:

```sh
npx skill4agent add xcrawl-api/xcrawl-skills xcrawl-crawl
```

Store your `XCRAWL_API_KEY` in `~/.xcrawl/config.json`:

```json
{
  "XCRAWL_API_KEY": "<your_api_key>"
}
```

API keys are issued at https://dash.xcrawl.com/.

## API overview

Base URL: `https://run.xcrawl.com`. The API exposes two endpoints: `POST /v1/crawl` creates a crawl task, and `GET /v1/crawl/{crawl_id}` polls its status and results. Every request authenticates with `Authorization: Bearer <XCRAWL_API_KEY>`. Examples are given in both `curl` and `node`.

### curl

```sh
# Read the API key from the local config file.
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

# Create a crawl task.
CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/crawl" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","crawler":{"limit":100,"max_depth":2},"output":{"formats":["markdown","links"]}}')"
echo "$CREATE_RESP"

# Extract the task ID and poll for status and results.
CRAWL_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.crawl_id||"")' "$CREATE_RESP")"
curl -sS -X GET "https://run.xcrawl.com/v1/crawl/${CRAWL_ID}" \
  -H "Authorization: Bearer ${API_KEY}"
```

### node

```sh
node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",crawler:{limit:300,max_depth:3,include:["/docs/.*"],exclude:["/blog/.*"]},request:{locale:"ja-JP"},output:{formats:["markdown","links","json"]}};
fetch("https://run.xcrawl.com/v1/crawl",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'
```

## POST /v1/crawl

`POST https://run.xcrawl.com/v1/crawl` with `Content-Type: application/json` and `Authorization: Bearer <api_key>`.

### Request body

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | - | Site entry URL |
| `crawler` | object | No | - | Crawler config |
| `proxy` | object | No | - | Proxy config |
| `request` | object | No | - | Request config |
| `js_render` | object | No | - | JS rendering config |
| `output` | object | No | - | Output config |
| `webhook` | object | No | - | Async callback config |
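The request body above can be assembled client-side. A minimal sketch, where the `buildCrawlRequest` helper is illustrative and not part of the API:

```javascript
// Assemble a POST /v1/crawl body from the documented top-level fields.
// Only `url` is required; every other section is optional.
function buildCrawlRequest(url, options = {}) {
  if (typeof url !== "string" || !/^https?:\/\//.test(url)) {
    throw new Error("`url` must be an absolute http(s) URL");
  }
  const body = { url };
  // Copy over only the optional sections the caller actually set.
  for (const key of ["crawler", "proxy", "request", "js_render", "output", "webhook"]) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return body;
}

const body = buildCrawlRequest("https://example.com", {
  crawler: { limit: 100, max_depth: 2 },
  output: { formats: ["markdown", "links"] },
});
console.log(JSON.stringify(body));
```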
### `crawler`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `limit` | integer | No | | Max pages to crawl |
| `include` | string[] | No | - | Include only matching URLs (regex supported) |
| `exclude` | string[] | No | - | Exclude matching URLs (regex supported) |
| `max_depth` | integer | No | | Max depth from the entry URL |
| | boolean | No | | Crawl the full site instead of only subpaths |
| | boolean | No | | Include subdomains |
| | boolean | No | | Include external links |
| | boolean | No | | Use the site sitemap |
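One plausible reading of the `include`/`exclude` semantics, sketched as a predicate. The precedence here (exclude always wins; include, when set, must match) is an assumption; the table only states that both accept regexes:

```javascript
// Decide whether a discovered path should be crawled under the
// assumed include/exclude semantics described above.
function shouldCrawl(path, { include = [], exclude = [] } = {}) {
  if (exclude.some((re) => new RegExp(re).test(path))) return false;
  if (include.length === 0) return true;
  return include.some((re) => new RegExp(re).test(path));
}

const rules = { include: ["/docs/.*"], exclude: ["/blog/.*"] };
console.log(shouldCrawl("/docs/quickstart", rules)); // true
console.log(shouldCrawl("/blog/2024/launch", rules)); // false
```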
### `proxy`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| | string | No | | ISO-3166-1 alpha-2 country code |
| | string | No | Auto-generated | Sticky session ID; the same ID attempts to reuse the same exit IP |
### `request`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `locale` | string | No | | Request locale, e.g. `ja-JP` |
| | string | No | | |
| | object map | No | - | Cookie key/value pairs |
| | object map | No | - | Header key/value pairs |
| | boolean | No | | Return main content only |
| | boolean | No | | Attempt to block ad resources |
| | boolean | No | | Skip TLS verification |
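A sketch of a `request` section combining the fields above. The `cookies` and `headers` key names are assumptions, since the table does not give the field names; the header value is illustrative:

```javascript
// An assumed `request` section: locale plus cookie and header maps
// expressed as plain key/value objects, per the table above.
const request = {
  locale: "ja-JP",
  cookies: { session: "abc123", consent: "yes" },
  headers: { "X-Requested-With": "xcrawl-skill" },
};
console.log(JSON.stringify(request));
```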
### `js_render`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| | boolean | No | | Enable browser rendering |
| | string | No | | |
| | integer | No | - | Viewport width |
| | integer | No | - | Viewport height |
### `output`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `formats` | string[] | No | | Output formats (see the list below) |
| | string | No | | |
| | string | No | - | Extraction prompt |
| | object | No | - | JSON Schema for structured extraction |
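A sketch of an `output` section requesting structured extraction alongside markdown. The `prompt` and `schema` key names are assumptions based on the descriptions in the table above:

```javascript
// An assumed `output` section: markdown plus JSON extraction driven by
// a prompt and a JSON Schema describing the desired shape.
const output = {
  formats: ["markdown", "json"],
  prompt: "Extract the product name and price from each page",
  schema: {
    type: "object",
    properties: {
      name: { type: "string" },
      price: { type: "number" },
    },
    required: ["name"],
  },
};
console.log(JSON.stringify(output.formats));
```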
Supported `output.formats` values: `html`, `raw_html`, `markdown`, `links`, `summary`, `screenshot`, `json`.

### `webhook`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| | string | No | - | Callback URL |
| | object map | No | - | Custom callback headers |
| | string[] | No | | Callback events |
### Response: POST /v1/crawl

| Field | Type | Description |
|---|---|---|
| `crawl_id` | string | Task ID |
| | string | Always |
| | string | Version |
| | string | Always |
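Creation is async, so the returned `crawl_id` must be polled. A polling sketch against `GET /v1/crawl/{crawl_id}`; the `status` field name is an assumption, though the `pending`/`crawling`/`completed`/`failed` values are documented below. `fetchFn` is injectable so the loop can be exercised without network access:

```javascript
// Poll the status endpoint until the crawl reaches a terminal state.
async function waitForCrawl(crawlId, apiKey, { fetchFn = fetch, intervalMs = 2000, maxTries = 60 } = {}) {
  for (let i = 0; i < maxTries; i++) {
    const res = await fetchFn(`https://run.xcrawl.com/v1/crawl/${crawlId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const job = await res.json();
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`crawl ${crawlId} still running after ${maxTries} polls`);
}
```

A fixed interval keeps the sketch short; for large crawls, backing off the interval as attempts grow is gentler on rate limits.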
### Response: GET /v1/crawl/{crawl_id}

| Field | Type | Description |
|---|---|---|
| `crawl_id` | string | Task ID |
| | string | Always |
| | string | Version |
| | string | Task status: `pending`, `crawling`, `completed`, or `failed` |
| | string | Entry URL |
| `data` | object[] | Per-page result array |
| | string | Start time (ISO 8601) |
| | string | End time (ISO 8601) |
| | integer | Total credits used |
Each item in `data[]` carries the fields requested via `output.formats` (`html`, `raw_html`, `markdown`, `links`, `summary`, `screenshot`, `json`), plus `metadata`, `traffic_bytes`, `credits_used`, and `credits_detail`. Use the `crawl_id` returned by `POST /v1/crawl` to poll `GET /v1/crawl/{crawl_id}`; the task status moves from `pending` through `crawling` to a terminal `completed` or `failed`. The status response also echoes the original request as `request_payload`.
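A sketch for consuming a completed response: collect per-page markdown from `data[]`, assuming each page object keys its content by format name as described above:

```javascript
// Gather the markdown bodies from a finished crawl, skipping pages
// for which markdown was not requested or not produced.
function collectMarkdown(job) {
  if (job.status !== "completed") throw new Error(`crawl not finished: ${job.status}`);
  return (job.data || [])
    .filter((page) => typeof page.markdown === "string")
    .map((page) => page.markdown);
}

const sample = {
  status: "completed",
  data: [{ markdown: "# Home" }, { links: ["https://example.com/a"] }],
};
console.log(collectMarkdown(sample)); // [ '# Home' ]
```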