ScrapeNinja


High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.

When to Use

Use this skill when you need to:
  • Scrape websites with anti-bot protection (Cloudflare, Datadome)
  • Extract data without running a full browser (fast `/scrape` endpoint)
  • Render JavaScript-heavy pages (`/scrape-js` endpoint)
  • Use rotating proxies with geo selection (US, EU, Brazil, etc.)
  • Extract structured data with Cheerio extractors
  • Intercept AJAX requests
  • Take screenshots of pages

Prerequisites

  1. Get an API key from RapidAPI or APIRoad.
  2. Set the environment variable:

```bash
# For RapidAPI
export SCRAPENINJA_API_KEY="your-rapidapi-key"

# For APIRoad (use the X-Apiroad-Key header instead)
export SCRAPENINJA_API_KEY="your-apiroad-key"
```

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```
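
Before the first call, it may help to verify the key is actually set. The helper below is a small sanity-check sketch, not part of the ScrapeNinja API:

```bash
# Sanity check: confirm the key is set before making any calls
check_scrapeninja_key() {
  if [ -z "${SCRAPENINJA_API_KEY:-}" ]; then
    echo "SCRAPENINJA_API_KEY is not set" >&2
    return 1
  fi
  echo "key is set"
}
```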

How to Use

1. Basic Scrape (Non-JS, Fast)

High-performance scraping with Chrome TLS fingerprint, no JavaScript rendering.

Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
```

With custom headers and retries, write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "headers": ["Accept-Language: en-US"],
  "retryNum": 3,
  "timeout": 15
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

2. Scrape with JavaScript Rendering

For JavaScript-heavy sites (React, Vue, etc.), write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "waitForSelector": "h1",
  "timeout": 20
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
```

With screenshot, write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "screenshot": true
}
```

Then run:

```bash
# Get screenshot URL from response
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot'
```
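
Depending on the plan and endpoint, `.info.screenshot` may hold a URL or base64-encoded PNG data (the Response Format section shows the base64 case). Assuming base64, a minimal decoding sketch against a saved response (the sample here uses `"aGVsbG8="` as a stand-in for real image data):

```bash
# Offline sample response; "aGVsbG8=" stands in for base64 PNG data
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"screenshot": "aGVsbG8="}}
EOF
# Decode the base64 payload to a file
jq -r '.info.screenshot' /tmp/scrapeninja_response.json | base64 -d > /tmp/screenshot.png
```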

3. Geo-Based Proxy Selection

Use proxies from specific regions. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "geo": "eu"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .info
```

Available geos: `us`, `eu`, `br` (Brazil), `fr` (France), `de` (Germany), `4g-eu`.

4. Smart Retries

Retry on specific HTTP status codes or text patterns. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "retryNum": 3,
  "statusNotExpected": [403, 429, 503],
  "textNotExpected": ["captcha", "Access Denied"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

5. Extract Data with Cheerio

Extract structured JSON using a Cheerio extractor function. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://news.ycombinator.com",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.extractor'
```
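
Hand-escaping the quotes inside the `extractor` string is error-prone. One way to build the request file safely is to let `jq` do the escaping; this is a sketch of request construction, not part of the ScrapeNinja API itself:

```bash
# Build the request JSON with jq so quotes inside the extractor are escaped for us
extractor='function(input, cheerio) { let $ = cheerio.load(input); return $("h1").text(); }'
jq -n --arg url "https://news.ycombinator.com" --arg ex "$extractor" \
  '{url: $url, extractor: $ex}' > /tmp/scrapeninja_request.json
```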

6. Intercept AJAX Requests

Capture XHR/fetch responses. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "catchAjaxHeadersUrlMask": "api/data"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.info.catchedAjax'
```
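
Per the Response Format section, the intercepted request lands in `.info.catchedAjax`, with its `body` delivered as a string. Assuming that body is itself JSON, it can be re-parsed with jq's `fromjson` (offline sample for illustration):

```bash
# Offline sample response mimicking the catchedAjax shape from the Response Format
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"catchedAjax": {"url": "https://example.com/api/data", "method": "GET", "status": 200, "body": "{\"items\": [1, 2, 3]}"}}}
EOF
# The AJAX body is a JSON string; fromjson parses it into a queryable value
jq -c '.info.catchedAjax.body | fromjson | .items' /tmp/scrapeninja_response.json
```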

7. Block Resources for Speed

Speed up JS rendering by blocking images and media. Write to `/tmp/scrapeninja_request.json`:

```json
{
  "url": "https://example.com",
  "blockImages": true,
  "blockMedia": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```


API Endpoints

| Endpoint | Description |
| --- | --- |
| `/scrape` | Fast non-JS scraping with Chrome TLS fingerprint |
| `/scrape-js` | Full Chrome browser with JS rendering |
| `/v2/scrape-js` | Enhanced JS rendering for protected sites (APIRoad only) |


Request Parameters

Common Parameters (all endpoints)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | URL to scrape |
| `headers` | string[] | - | Custom HTTP headers |
| `retryNum` | int | 1 | Number of retry attempts |
| `geo` | string | `us` | Proxy geo: us, eu, br, fr, de, 4g-eu |
| `proxy` | string | - | Custom proxy URL (overrides geo) |
| `timeout` | int | 10/16 | Timeout per attempt in seconds |
| `textNotExpected` | string[] | - | Text patterns that trigger retry |
| `statusNotExpected` | int[] | [403, 502] | HTTP status codes that trigger retry |
| `extractor` | string | - | Cheerio extractor function |

JS Rendering Parameters (`/scrape-js`, `/v2/scrape-js`)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `waitForSelector` | string | - | CSS selector to wait for |
| `postWaitTime` | int | - | Extra wait time after load (1-12s) |
| `screenshot` | bool | true | Take page screenshot |
| `blockImages` | bool | false | Block image loading |
| `blockMedia` | bool | false | Block CSS/fonts loading |
| `catchAjaxHeadersUrlMask` | string | - | URL pattern to intercept AJAX |
| `viewport` | object | 1920x1080 | Custom viewport size |

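
A request combining common and JS-rendering parameters from the tables above (the values are illustrative, not recommendations):

```json
{
  "url": "https://example.com",
  "geo": "eu",
  "retryNum": 3,
  "timeout": 20,
  "waitForSelector": "h1",
  "screenshot": false,
  "blockImages": true
}
```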

Response Format

```json
{
  "info": {
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "headers": ["content-type: text/html"],
    "screenshot": "base64-encoded-png",
    "catchedAjax": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "body": "...",
      "status": 200
    }
  },
  "body": "<html>...</html>",
  "extractor": { "extracted": "data" }
}
```

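
For post-processing, a few jq recipes against a saved response (offline sample matching the shape above):

```bash
# Offline sample response matching the documented shape
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"statusCode": 200, "finalUrl": "https://example.com"}, "body": "<html><h1>Hi</h1></html>"}
EOF
# Pull out the HTTP status of the scraped page
jq -r '.info.statusCode' /tmp/scrapeninja_response.json
# Save the raw HTML body for local inspection
jq -r '.body' /tmp/scrapeninja_response.json > /tmp/page.html
```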

Guidelines

  1. Start with `/scrape`: Use the fast non-JS endpoint first; only switch to `/scrape-js` if needed
  2. Retries: Set `retryNum` to 2-3 for unreliable sites
  3. Geo Selection: Use `eu` for European sites, `us` for American sites
  4. Extractors: Test extractors at https://scrapeninja.net/cheerio-sandbox/
  5. Blocked Sites: For Cloudflare/Datadome protected sites, use `/v2/scrape-js` via APIRoad
  6. Screenshots: Set `screenshot: false` to speed up JS rendering
  7. Rate Limits: Check your plan limits on the RapidAPI/APIRoad dashboard

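
Guideline 1 can be scripted. Below is a hypothetical helper that inspects a saved `/scrape` response and reports whether falling back to `/scrape-js` looks warranted; the thresholds are assumptions, not documented API behavior:

```bash
# Heuristic: an error status or a suspiciously short body often means the
# page needs JavaScript rendering, so /scrape-js is worth trying next.
needs_js_fallback() {
  local resp_file="$1"
  local status len
  status=$(jq -r '.info.statusCode // 0' "$resp_file")
  len=$(jq -r '.body | length' "$resp_file")
  if [ "$status" -ge 400 ] || [ "$len" -lt 200 ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# Example: a blocked response should trigger the fallback
printf '{"info": {"statusCode": 403}, "body": ""}' > /tmp/blocked.json
needs_js_fallback /tmp/blocked.json  # prints "yes"
```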

Tools
