# Apify
Web scraping and automation platform. Run pre-built Actors (scrapers) or create your own. Access thousands of ready-to-use scrapers for popular websites.
Official docs: https://docs.apify.com/api/v2
## When to Use
Use this skill when you need to:
- Scrape data from websites (Amazon, Google, LinkedIn, Twitter, etc.)
- Run pre-built web scrapers without coding
- Extract structured data from any website
- Automate web tasks at scale
- Store and retrieve scraped data
## Prerequisites
- Create an account at https://apify.com/
- Get your API token from https://console.apify.com/account#/integrations
Set environment variable:
```bash
export APIFY_API_TOKEN="apify_api_xxxxxxxxxxxxxxxxxxxxxxxx"
```

**Important:** When using `bash` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
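The `bash -c` workaround can be demonstrated locally (the variable name and value here are dummies, not a real token):

```shell
# Dummy token for demonstration only
export API_KEY="demo-token"

# Wrapping the command that reads $API_KEY in bash -c keeps the
# variable visible even when the output is piped onward.
bash -c 'printf "Authorization: Bearer %s\n" "$API_KEY"' | head -n 1
```

Running this prints `Authorization: Bearer demo-token`, confirming the variable survived the pipe.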
## How to Use
### 1. Run an Actor (Async)
Start an Actor run asynchronously:
Write to `/tmp/apify_request.json`:

```json
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 10,
"pageFunction": "async function pageFunction(context) { const { request, log, jQuery } = context; const $ = jQuery; const title = $(\"title\").text(); return { url: request.url, title }; }"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```

The response contains `id` (the run ID) and `defaultDatasetId` for fetching results.
### 2. Run Actor Synchronously
Wait for completion and get results directly (max 5 min):
Write to `/tmp/apify_request.json`:

```json
{
"startUrls": [{"url": "https://news.ycombinator.com"}],
"maxPagesPerCrawl": 1,
"pageFunction": "async function pageFunction(context) { const { request, log, jQuery } = context; const $ = jQuery; const title = $(\"title\").text(); return { url: request.url, title }; }"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```

### 3. Check Run Status
⚠️ **Important:** The `{runId}` below is a placeholder; replace it with the actual run ID from your async run response (found in `.data.id`). See the complete workflow example below.

Poll the run status:

```bash
# Replace {runId} with an actual ID like "HG7ML7M8z78YcAPEB"
bash -c 'curl -s "https://api.apify.com/v2/actor-runs/{runId}" --header "Authorization: Bearer ${APIFY_API_TOKEN}"' | jq -r '.data.status'
```
**Complete workflow example** (capture run ID and check status):

Write to `/tmp/apify_request.json`:

```json
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 10
}
```

Then run:

```bash
# Step 1: Start an async run and capture the run ID
RUN_ID=$(bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json' | jq -r '.data.id')

# Step 2: Check the run status
bash -c "curl -s \"https://api.apify.com/v2/actor-runs/${RUN_ID}\" --header \"Authorization: Bearer ${APIFY_API_TOKEN}\"" | jq -r '.data.status'
```

**Statuses**: `READY`, `RUNNING`, `SUCCEEDED`, `FAILED`, `ABORTED`, `TIMED-OUT`
### 4. Get Dataset Items
⚠️ **Important:** The `{datasetId}` below is a placeholder; do not use it literally. You must replace it with the actual dataset ID from your run response (found in `.data.defaultDatasetId`). See the complete workflow example below for how to capture and use the real ID.

Fetch results from a completed run:

```bash
# Replace {datasetId} with an actual ID like "WkzbQMuFYuamGv3YF"
bash -c 'curl -s "https://api.apify.com/v2/datasets/{datasetId}/items" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'
```
**Complete workflow example** (run async, wait, and fetch results):

Write to `/tmp/apify_request.json`:

```json
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 10
}
```

Then run:

```bash
# Step 1: Start async run and capture IDs
RESPONSE=$(bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json')
RUN_ID=$(echo "$RESPONSE" | jq -r '.data.id')
DATASET_ID=$(echo "$RESPONSE" | jq -r '.data.defaultDatasetId')

# Step 2: Wait for completion (poll status)
while true; do
  STATUS=$(bash -c "curl -s \"https://api.apify.com/v2/actor-runs/${RUN_ID}\" --header \"Authorization: Bearer ${APIFY_API_TOKEN}\"" | jq -r '.data.status')
  echo "Status: $STATUS"
  [[ "$STATUS" == "SUCCEEDED" ]] && break
  [[ "$STATUS" == "FAILED" || "$STATUS" == "ABORTED" ]] && exit 1
  sleep 5
done

# Step 3: Fetch the dataset items
bash -c "curl -s \"https://api.apify.com/v2/datasets/${DATASET_ID}/items\" --header \"Authorization: Bearer ${APIFY_API_TOKEN}\""
```
**With pagination:**

```bash
# Replace {datasetId} with an actual ID
bash -c 'curl -s "https://api.apify.com/v2/datasets/{datasetId}/items?limit=100&offset=0" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'
```
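The `limit`/`offset` pattern can be looped to page through a large dataset. A local sketch of the loop logic (no network access; `TOTAL` simulates the dataset size, and the real `curl` call is shown as a comment):

```shell
# Page through a dataset 100 items at a time; stop when a page
# comes back short. TOTAL stands in for the real dataset size.
TOTAL=250
LIMIT=100
OFFSET=0
FETCHED=0
while true; do
  remaining=$((TOTAL - OFFSET))
  count=$(( remaining < LIMIT ? remaining : LIMIT ))
  # Real call for each page:
  # bash -c 'curl -s "https://api.apify.com/v2/datasets/{datasetId}/items?limit='"${LIMIT}"'&offset='"${OFFSET}"'" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'
  FETCHED=$((FETCHED + count))
  OFFSET=$((OFFSET + count))
  [ "$count" -lt "$LIMIT" ] && break
done
echo "fetched $FETCHED items"
```

With `TOTAL=250` the loop makes three passes (100, 100, 50 items) and prints `fetched 250 items`.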
### 5. Popular Actors

**Google Search Scraper**

Write to `/tmp/apify_request.json`:

```json
{
"queries": "web scraping tools",
"maxPagesPerQuery": 1,
"resultsPerPage": 10
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```

**Website Content Crawler**

Write to `/tmp/apify_request.json`:

```json
{
"startUrls": [{"url": "https://docs.example.com"}],
"maxCrawlPages": 10,
"crawlerType": "cheerio"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~website-content-crawler/run-sync-get-dataset-items?timeout=300" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```

**Instagram Scraper**

Write to `/tmp/apify_request.json`:

```json
{
"directUrls": ["https://www.instagram.com/apaborotnikov/"],
"resultsType": "posts",
"resultsLimit": 10
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~instagram-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```

**Amazon Product Scraper**

Write to `/tmp/apify_request.json`:

```json
{
"categoryOrProductUrls": [{"url": "https://www.amazon.com/dp/B0BSHF7WHW"}],
"maxItemsPerStartUrl": 1
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/junglee~amazon-crawler/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```

### 6. List Your Runs
Get recent Actor runs:
```bash
bash -c 'curl -s "https://api.apify.com/v2/actor-runs?limit=10&desc=true" --header "Authorization: Bearer ${APIFY_API_TOKEN}"' | jq '.data.items[] | {id, actId, status, startedAt}'
```

### 7. Abort a Run
⚠️ **Important:** The `{runId}` below is a placeholder; replace it with the actual run ID. See the complete workflow example below.

Stop a running Actor:

```bash
# Replace {runId} with an actual ID like "HG7ML7M8z78YcAPEB"
bash -c 'curl -s -X POST "https://api.apify.com/v2/actor-runs/{runId}/abort" --header "Authorization: Bearer ${APIFY_API_TOKEN}"'
```
**Complete workflow example** (start a run and abort it):

Write to `/tmp/apify_request.json`:

```json
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 100
}
```

Then run:

```bash
# Step 1: Start an async run and capture the run ID
RUN_ID=$(bash -c 'curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json' | jq -r '.data.id')
echo "Started run: $RUN_ID"

# Step 2: Abort the run
bash -c "curl -s -X POST \"https://api.apify.com/v2/actor-runs/${RUN_ID}/abort\" --header \"Authorization: Bearer ${APIFY_API_TOKEN}\""
```
### 8. List Available Actors

Browse public Actors:

```bash
bash -c 'curl -s "https://api.apify.com/v2/store?limit=20&category=ECOMMERCE" --header "Authorization: Bearer ${APIFY_API_TOKEN}"' | jq '.data.items[] | {name, username, title}'
```

## Popular Actors Reference
| Actor ID | Description |
|---|---|
| `apify~web-scraper` | General web scraper |
| `apify~website-content-crawler` | Crawl entire websites |
| `apify~google-search-scraper` | Google search results |
| `apify~instagram-scraper` | Instagram posts/profiles |
| `junglee~amazon-crawler` | Amazon products |
| *(search the store)* | Twitter/X posts |
| *(search the store)* | YouTube videos |
| *(search the store)* | LinkedIn profiles |
| *(search the store)* | Google Maps places |

Find more at: https://apify.com/store
## Run Options

| Parameter | Type | Description |
|---|---|---|
| `timeout` | number | Run timeout in seconds |
| `memory` | number | Memory in MB (128, 256, 512, 1024, 2048, 4096) |
| `maxItems` | number | Max items to return (for sync endpoints) |
| `build` | string | Actor build tag (default: "latest") |
| `waitForFinish` | number | Wait time in seconds (for async runs) |
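These run options are passed as query parameters on the run endpoints. A sketch of composing such a URL (the actor name and option values are illustrative, and the parameter names are assumed from the table above):

```shell
# Compose a run URL with options as query parameters
ACTOR="apify~web-scraper"
OPTS="timeout=120&memory=1024&build=latest"
URL="https://api.apify.com/v2/acts/${ACTOR}/runs?${OPTS}"
echo "$URL"

# Then POST to it as in the examples above (requires APIFY_API_TOKEN):
# bash -c 'curl -s -X POST "'"${URL}"'" --header "Authorization: Bearer ${APIFY_API_TOKEN}" --header "Content-Type: application/json" -d @/tmp/apify_request.json'
```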
## Response Format

Run object:

```json
{
"data": {
"id": "HG7ML7M8z78YcAPEB",
"actId": "HDSasDasz78YcAPEB",
"status": "SUCCEEDED",
"startedAt": "2024-01-01T00:00:00.000Z",
"finishedAt": "2024-01-01T00:01:00.000Z",
"defaultDatasetId": "WkzbQMuFYuamGv3YF",
"defaultKeyValueStoreId": "tbhFDFDh78YcAPEB"
}
}
```
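To pull fields out of a run object like this one, pipe it through `jq` (a canned, truncated response is inlined here so the snippet runs without an API call):

```shell
# Sample run object standing in for a real API response
RESPONSE='{"data":{"id":"HG7ML7M8z78YcAPEB","status":"SUCCEEDED","defaultDatasetId":"WkzbQMuFYuamGv3YF"}}'

RUN_ID=$(echo "$RESPONSE" | jq -r '.data.id')
DATASET_ID=$(echo "$RESPONSE" | jq -r '.data.defaultDatasetId')
echo "run=${RUN_ID} dataset=${DATASET_ID}"
```

This prints `run=HG7ML7M8z78YcAPEB dataset=WkzbQMuFYuamGv3YF`; the same two extractions are used in the complete workflow examples above.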
## Guidelines
- **Sync vs Async**: Use `run-sync-get-dataset-items` for quick tasks (<5 min), async for longer jobs
- **Rate Limits**: 250,000 requests/min globally, 400/sec per resource
- **Memory**: Higher memory = faster execution but more credits
- **Timeouts**: Default varies by Actor; set an explicit timeout for sync calls
- **Pagination**: Use `limit` and `offset` for large datasets
- **Actor Input**: Each Actor has a different input schema; check the Actor's page for details
- **Credits**: Check usage at https://console.apify.com/billing
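If you hit the rate limits above, backing off and retrying is the usual remedy. A local sketch of exponential backoff (`call_api` is a hypothetical stand-in that succeeds on the third attempt; in real use it would be the `curl` call plus a check for a rate-limit response):

```shell
# Hypothetical stand-in for the real API call; fails twice, then succeeds
attempt=0
call_api() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]
}

delay=1
for try in 1 2 3 4 5; do
  if call_api; then
    echo "succeeded after $attempt attempts"
    break
  fi
  # In real use: sleep "$delay" before retrying
  delay=$((delay * 2))
done
```

Doubling the delay between attempts keeps a burst of retries from compounding the rate-limit problem.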