# ScrapeNinja
High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.
Official docs: https://scrapeninja.net/docs/
## When to Use
Use this skill when you need to:
- Scrape websites with anti-bot protection (Cloudflare, Datadome)
- Extract data without running a full browser (fast `/scrape` endpoint)
- Render JavaScript-heavy pages (`/scrape-js` endpoint)
- Use rotating proxies with geo selection (US, EU, Brazil, etc.)
- Extract structured data with Cheerio extractors
- Intercept AJAX requests
- Take screenshots of pages
## Prerequisites
- Get an API key from RapidAPI or APIRoad:
  - RapidAPI: https://rapidapi.com/restyler/api/scrapeninja
  - APIRoad: https://apiroad.net/marketplace/apis/scrapeninja
- Set the environment variable:

```bash
# For RapidAPI
export SCRAPENINJA_API_KEY="your-rapidapi-key"

# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_API_KEY="your-apiroad-key"
```
---

> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```

## How to Use
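The pipe caveat above can be sanity-checked locally with a throwaway value (no real key needed); with the `bash -c` wrapper, the variable survives the pipe:

```shell
export SCRAPENINJA_API_KEY="demo-key"
# The bash -c wrapper keeps $SCRAPENINJA_API_KEY visible even though
# the command's output is piped onward:
bash -c 'echo "key=$SCRAPENINJA_API_KEY"' | tr 'a-z' 'A-Z'
```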
### 1. Basic Scrape (Non-JS, Fast)
High-performance scraping with Chrome TLS fingerprint, no JavaScript.

Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
```

With custom headers and retries, write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "headers": ["Accept-Language: en-US"],
  "retryNum": 3,
  "timeout": 15
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```
### 2. Scrape with JavaScript Rendering
For JavaScript-heavy sites (React, Vue, etc.), write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "waitForSelector": "h1",
  "timeout": 20
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
```

With a screenshot, write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "screenshot": true
}
```

Then run:

```bash
# Get screenshot URL from response
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot'
```
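The screenshot field can be saved to disk; this sketch assumes it is base64-encoded PNG (as the Response Format section below shows) and uses a stand-in string, the base64 of a PNG file signature, instead of a real response:

```shell
# Real usage: capture the field first, e.g.
#   ... /scrape-js ... | jq -r '.info.screenshot' > /tmp/shot.b64
echo 'iVBORw0KGgo=' > /tmp/shot.b64   # stand-in: base64 of the 8-byte PNG signature
base64 -d /tmp/shot.b64 > /tmp/shot.png
# A real PNG starts with bytes 89 50 4e 47 0d 0a 1a 0a:
head -c 8 /tmp/shot.png | od -An -tx1
```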
### 3. Geo-Based Proxy Selection
Use proxies from specific regions. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "geo": "eu"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .info
```

Available geos: `us`, `eu`, `br` (Brazil), `fr` (France), `de` (Germany), `4g-eu`.
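When comparing content across regions, one request file per geo can be generated in a loop (a hypothetical helper, assuming `jq` is installed):

```shell
for geo in us eu br fr de 4g-eu; do
  # jq handles the JSON quoting, including the hyphen in "4g-eu"
  jq -n --arg url "https://example.com" --arg geo "$geo" \
    '{url: $url, geo: $geo}' > "/tmp/scrapeninja_${geo}.json"
done
cat /tmp/scrapeninja_eu.json
```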
### 4. Smart Retries
Retry on specific HTTP status codes or text patterns. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "retryNum": 3,
  "statusNotExpected": [403, 429, 503],
  "textNotExpected": ["captcha", "Access Denied"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```
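The retry rule is evaluated server-side, but it can be reasoned about locally: an attempt is retried when its status appears in `statusNotExpected`. A sketch of that check with `jq`:

```shell
status=429
# index($s) is non-null exactly when the status is in the list
echo '[403, 429, 503]' | jq --argjson s "$status" 'index($s) != null'
```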
### 5. Extract Data with Cheerio
Extract structured JSON using Cheerio extractor functions. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://news.ycombinator.com",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.extractor'
```
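Hand-escaping an extractor function inside JSON is error-prone; assuming `jq` is available, the request body can be built programmatically so the quotes survive encoding (the extractor here is a simplified variant of the one above):

```shell
extractor='function(input, cheerio) { let $ = cheerio.load(input); return $(".titleline > a").first().text(); }'
# --arg JSON-encodes the string, so no manual backslash-escaping is needed
jq -n --arg url "https://news.ycombinator.com" --arg ex "$extractor" \
  '{url: $url, extractor: $ex}' > /tmp/scrapeninja_request.json
jq -r '.extractor' /tmp/scrapeninja_request.json
```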
### 6. Intercept AJAX Requests
Capture XHR/fetch responses. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "catchAjaxHeadersUrlMask": "api/data"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.info.catchedAjax'
```
### 7. Block Resources for Speed
Speed up JS rendering by blocking images and media. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "blockImages": true,
  "blockMedia": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```
## API Endpoints
| Endpoint | Description |
|---|---|
| `/scrape` | Fast non-JS scraping with Chrome TLS fingerprint |
| `/scrape-js` | Full Chrome browser with JS rendering |
| `/v2/scrape-js` | Enhanced JS rendering for protected sites (APIRoad only) |
## Request Parameters

### Common Parameters (all endpoints)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | URL to scrape |
| `headers` | string[] | - | Custom HTTP headers |
| `retryNum` | int | 1 | Number of retry attempts |
| `geo` | string | | Proxy geo: us, eu, br, fr, de, 4g-eu |
| `proxy` | string | - | Custom proxy URL (overrides geo) |
| `timeout` | int | 10/16 | Timeout per attempt in seconds |
| `textNotExpected` | string[] | - | Text patterns that trigger retry |
| `statusNotExpected` | int[] | [403, 502] | HTTP status codes that trigger retry |
| `extractor` | string | - | Cheerio extractor function |
### JS Rendering Parameters (`/scrape-js`, `/v2/scrape-js`)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `waitForSelector` | string | - | CSS selector to wait for |
| `postWaitTime` | int | - | Extra wait time after load (1-12s) |
| `screenshot` | bool | true | Take page screenshot |
| `blockImages` | bool | false | Block image loading |
| `blockMedia` | bool | false | Block CSS/fonts loading |
| `catchAjaxHeadersUrlMask` | string | - | URL pattern to intercept AJAX |
| `viewport` | object | 1920x1080 | Custom viewport size |
## Response Format

```json
{
  "info": {
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "headers": ["content-type: text/html"],
    "screenshot": "base64-encoded-png",
    "catchedAjax": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "body": "...",
      "status": 200
    }
  },
  "body": "<html>...</html>",
  "extractor": { "extracted": "data" }
}
```
## Guidelines
- Start with `/scrape`: use the fast non-JS endpoint first, and only switch to `/scrape-js` if needed
- Retries: set `retryNum` to 2-3 for unreliable sites
- Geo selection: use `eu` for European sites, `us` for American sites
- Extractors: test extractors at https://scrapeninja.net/cheerio-sandbox/
- Blocked sites: for Cloudflare/Datadome-protected sites, use `/v2/scrape-js` via APIRoad
- Screenshots: set `screenshot: false` to speed up JS rendering
- Rate limits: check your plan limits on the RapidAPI/APIRoad dashboard
## Tools
- Playground: https://scrapeninja.net/scraper-sandbox
- Cheerio Sandbox: https://scrapeninja.net/cheerio-sandbox
- cURL Converter: https://scrapeninja.net/curl-to-scraper