ScrapeNinja


High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.

When to Use

Use this skill when you need to:
  • Scrape websites with anti-bot protection (Cloudflare, Datadome)
  • Extract data without running a full browser (fast `/scrape` endpoint)
  • Render JavaScript-heavy pages (`/scrape-js` endpoint)
  • Use rotating proxies with geo selection (US, EU, Brazil, etc.)
  • Extract structured data with Cheerio extractors
  • Intercept AJAX requests
  • Take screenshots of pages

Prerequisites

  1. Get an API key from RapidAPI or APIRoad.
  2. Set the environment variable:

```bash
# For RapidAPI
export SCRAPENINJA_API_KEY="your-rapidapi-key"

# For APIRoad (use the X-Apiroad-Key header instead)
export SCRAPENINJA_API_KEY="your-apiroad-key"
```

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```
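
Before the first call, it may help to verify the key is actually set. The helper below is a small sanity-check sketch, not part of the ScrapeNinja API:

```bash
# Sanity check: confirm the key is set before making any calls
check_scrapeninja_key() {
  if [ -z "${SCRAPENINJA_API_KEY:-}" ]; then
    echo "SCRAPENINJA_API_KEY is not set" >&2
    return 1
  fi
  echo "key is set"
}
```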

How to Use

1. Basic Scrape (Non-JS, Fast)

High-performance scraping with Chrome TLS fingerprint, no JavaScript rendering.

Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
```

With custom headers and retries, write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "headers": ["Accept-Language: en-US"],
  "retryNum": 3,
  "timeout": 15
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

2. Scrape with JavaScript Rendering

For JavaScript-heavy sites (React, Vue, etc.), write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "waitForSelector": "h1",
  "timeout": 20
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
```

With screenshot, write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "screenshot": true
}
```

Then run:

```bash
# Get screenshot URL from response
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot'
```
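
Depending on the plan and endpoint, `.info.screenshot` may hold a URL or base64-encoded PNG data (the Response Format section shows the base64 case). Assuming base64, a minimal decoding sketch against a saved response (the sample here uses `"aGVsbG8="` as a stand-in for real image data):

```bash
# Offline sample response; "aGVsbG8=" stands in for base64 PNG data
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"screenshot": "aGVsbG8="}}
EOF
# Decode the base64 payload to a file
jq -r '.info.screenshot' /tmp/scrapeninja_response.json | base64 -d > /tmp/screenshot.png
```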

3. Geo-Based Proxy Selection

Use proxies from specific regions. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "geo": "eu"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .info
```

Available geos: `us`, `eu`, `br` (Brazil), `fr` (France), `de` (Germany), `4g-eu`.

4. Smart Retries

Retry on specific HTTP status codes or text patterns. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "retryNum": 3,
  "statusNotExpected": [403, 429, 503],
  "textNotExpected": ["captcha", "Access Denied"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

5. Extract Data with Cheerio

Extract structured JSON using a Cheerio extractor function. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://news.ycombinator.com",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.extractor'
```
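
Hand-escaping the quotes inside the `extractor` string is error-prone. One way to build the request file safely is to let `jq` do the escaping; this is a sketch of request construction, not part of the ScrapeNinja API itself:

```bash
# Build the request JSON with jq so quotes inside the extractor are escaped for us
extractor='function(input, cheerio) { let $ = cheerio.load(input); return $("h1").text(); }'
jq -n --arg url "https://news.ycombinator.com" --arg ex "$extractor" \
  '{url: $url, extractor: $ex}' > /tmp/scrapeninja_request.json
```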

6. Intercept AJAX Requests

Capture XHR/fetch responses. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "catchAjaxHeadersUrlMask": "api/data"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.info.catchedAjax'
```
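
Per the Response Format section, the intercepted request lands in `.info.catchedAjax`, with its `body` delivered as a string. Assuming that body is itself JSON, it can be re-parsed with jq's `fromjson` (offline sample for illustration):

```bash
# Offline sample response mimicking the catchedAjax shape from the Response Format
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"catchedAjax": {"url": "https://example.com/api/data", "method": "GET", "status": 200, "body": "{\"items\": [1, 2, 3]}"}}}
EOF
# The AJAX body is a JSON string; fromjson parses it into a queryable value
jq -c '.info.catchedAjax.body | fromjson | .items' /tmp/scrapeninja_response.json
```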

7. Block Resources for Speed

Speed up JS rendering by blocking images and media. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "blockImages": true,
  "blockMedia": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```


API Endpoints

| Endpoint | Description |
| --- | --- |
| `/scrape` | Fast non-JS scraping with Chrome TLS fingerprint |
| `/scrape-js` | Full Chrome browser with JS rendering |
| `/v2/scrape-js` | Enhanced JS rendering for protected sites (APIRoad only) |


Request Parameters

Common Parameters (all endpoints)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | URL to scrape |
| `headers` | string[] | - | Custom HTTP headers |
| `retryNum` | int | 1 | Number of retry attempts |
| `geo` | string | `us` | Proxy geo: us, eu, br, fr, de, 4g-eu |
| `proxy` | string | - | Custom proxy URL (overrides geo) |
| `timeout` | int | 10/16 | Timeout per attempt in seconds |
| `textNotExpected` | string[] | - | Text patterns that trigger retry |
| `statusNotExpected` | int[] | [403, 502] | HTTP status codes that trigger retry |
| `extractor` | string | - | Cheerio extractor function |

JS Rendering Parameters (`/scrape-js`, `/v2/scrape-js`)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `waitForSelector` | string | - | CSS selector to wait for |
| `postWaitTime` | int | - | Extra wait time after load (1-12s) |
| `screenshot` | bool | true | Take page screenshot |
| `blockImages` | bool | false | Block image loading |
| `blockMedia` | bool | false | Block CSS/fonts loading |
| `catchAjaxHeadersUrlMask` | string | - | URL pattern to intercept AJAX |
| `viewport` | object | 1920x1080 | Custom viewport size |

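
A request combining common and JS-rendering parameters from the tables above (the values are illustrative, not recommendations):

```json
{
  "url": "https://example.com",
  "geo": "eu",
  "retryNum": 3,
  "timeout": 20,
  "waitForSelector": "h1",
  "screenshot": false,
  "blockImages": true
}
```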

Response Format

```json
{
  "info": {
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "headers": ["content-type: text/html"],
    "screenshot": "base64-encoded-png",
    "catchedAjax": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "body": "...",
      "status": 200
    }
  },
  "body": "<html>...</html>",
  "extractor": { "extracted": "data" }
}
```

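
For post-processing, a few jq recipes against a saved response (offline sample matching the shape above):

```bash
# Offline sample response matching the documented shape
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"statusCode": 200, "finalUrl": "https://example.com"}, "body": "<html><h1>Hi</h1></html>"}
EOF
# Pull out the HTTP status of the scraped page
jq -r '.info.statusCode' /tmp/scrapeninja_response.json
# Save the raw HTML body for local inspection
jq -r '.body' /tmp/scrapeninja_response.json > /tmp/page.html
```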

Guidelines

  1. Start with `/scrape`: Use the fast non-JS endpoint first; only switch to `/scrape-js` if needed
  2. Retries: Set `retryNum` to 2-3 for unreliable sites
  3. Geo Selection: Use `eu` for European sites, `us` for American sites
  4. Extractors: Test extractors at https://scrapeninja.net/cheerio-sandbox/
  5. Blocked Sites: For Cloudflare/Datadome protected sites, use `/v2/scrape-js` via APIRoad
  6. Screenshots: Set `screenshot: false` to speed up JS rendering
  7. Rate Limits: Check your plan limits on the RapidAPI/APIRoad dashboard

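
Guideline 1 can be scripted. Below is a hypothetical helper that inspects a saved `/scrape` response and reports whether falling back to `/scrape-js` looks warranted; the thresholds are assumptions, not documented API behavior:

```bash
# Heuristic: an error status or a suspiciously short body often means the
# page needs JavaScript rendering, so /scrape-js is worth trying next.
needs_js_fallback() {
  local resp_file="$1"
  local status len
  status=$(jq -r '.info.statusCode // 0' "$resp_file")
  len=$(jq -r '.body | length' "$resp_file")
  if [ "$status" -ge 400 ] || [ "$len" -lt 200 ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# Example: a blocked response should trigger the fallback
printf '{"info": {"statusCode": 403}, "body": ""}' > /tmp/blocked.json
needs_js_fallback /tmp/blocked.json  # prints "yes"
```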

Tools
