# Firecrawl

Use the Firecrawl API via direct `curl` calls to scrape websites and extract data for AI.
Official docs: https://docs.firecrawl.dev/


## When to Use

Use this skill when you need to:

- Scrape a webpage and convert to markdown/HTML
- Crawl an entire website and extract all pages
- Discover all URLs on a website
- Search the web and get full page content
- Extract structured data using AI


## Prerequisites

1. Sign up at https://www.firecrawl.dev/
2. Get your API key from the dashboard and export it:

```bash
export FIRECRAWL_API_KEY="fc-your-api-key"
```

**Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
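Every example below fails with an auth error if the variable is missing, so a quick sanity check before starting can save a round trip. This is a sketch, not part of the official docs; `check_firecrawl_key` is a made-up helper name, and the `fc-` prefix is taken from the placeholder key above.

```shell
# check_firecrawl_key: warn if FIRECRAWL_API_KEY is unset or oddly formed.
# Only inspects the variable; makes no network call.
check_firecrawl_key() {
  if [ -z "${FIRECRAWL_API_KEY:-}" ]; then
    echo "FIRECRAWL_API_KEY is not set"
    return 1
  fi
  case "$FIRECRAWL_API_KEY" in
    fc-*) echo "API key looks OK" ;;
    *)    echo "warning: key does not start with fc-"; return 1 ;;
  esac
}
```

Run `check_firecrawl_key` once right after the `export` above.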

## How to Use

All examples below assume you have `FIRECRAWL_API_KEY` set.
Base URL: `https://api.firecrawl.dev/v1`


## 1. Scrape - Single Page

Extract content from a single webpage.

### Basic Scrape

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

### Scrape with Options

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.example.com/api",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "timeout": 30000
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.markdown'
```

### Get HTML Instead

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["html"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.html'
```

### Get Screenshot

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["screenshot"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.screenshot'
```

**Scrape Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | URL to scrape (required) |
| `formats` | array | `markdown`, `html`, `rawHtml`, `screenshot`, `links` |
| `onlyMainContent` | boolean | Skip headers/footers |
| `timeout` | number | Timeout in milliseconds |


## 2. Crawl - Entire Website

Crawl all pages of a website (async operation).

### Start a Crawl

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "limit": 50,
  "maxDepth": 2
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

Response:

```json
{
  "success": true,
  "id": "crawl-job-id-here"
}
```
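For scripting the follow-up status calls, the job id can be pulled out of that response with `jq` instead of being copied by hand. A sketch: `parse_crawl_id` is a made-up helper name, and `jq` must be installed.

```shell
# parse_crawl_id: read the crawl-start response on stdin, print the job id.
parse_crawl_id() {
  jq -r '.id'
}

# Capture the id for the status calls below (network call, key required):
# JOB_ID="$(bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | parse_crawl_id)"
```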

### Check Crawl Status

Replace `<job-id>` with the actual job ID returned from the crawl request:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '{status, completed, total}'
```

### Get Crawl Results

Replace `<job-id>` with the actual job ID:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '.data[] | {url: .metadata.url, title: .metadata.title}'
```

### Crawl with Path Filters

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "limit": 20,
  "maxDepth": 3,
  "includePaths": ["/posts/*"],
  "excludePaths": ["/admin/*", "/login"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

**Crawl Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | Starting URL (required) |
| `limit` | number | Max pages to crawl (default: 100) |
| `maxDepth` | number | Max crawl depth (default: 3) |
| `includePaths` | array | Paths to include (e.g., `/blog/*`) |
| `excludePaths` | array | Paths to exclude |


## 3. Map - URL Discovery

Get all URLs from a website quickly.

### Basic Map

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links[:10]'
```

### Map with Search Filter

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://shop.example.com",
  "search": "product",
  "limit": 500
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links'
```

**Map Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | Website URL (required) |
| `search` | string | Filter URLs containing keyword |
| `limit` | number | Max URLs to return (default: 1000) |
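Since `limit` defaults to 1000, it is often worth counting the returned links before printing them all. A sketch: `count_links` is a made-up helper, and `jq` must be installed.

```shell
# count_links: read a map response on stdin and print how many URLs it holds.
count_links() {
  jq '.links | length'
}

# Usage:
# bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | count_links
```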


## 4. Search - Web Search

Search the web and get full page content.

### Basic Search

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "AI news 2024",
  "limit": 5
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Search with Full Content

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "machine learning tutorials",
  "limit": 3,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, content: .markdown[:500]}'
```

**Search Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `query` | string | Search query (required) |
| `limit` | number | Number of results (default: 10) |
| `scrapeOptions` | object | Options for scraping results |


## 5. Extract - AI Data Extraction

Extract structured data from pages using AI.

### Basic Extract

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and description"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Extract with Schema

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract product information",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price": {"type": "number"},
      "currency": {"type": "string"},
      "inStock": {"type": "boolean"}
    }
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Extract from Multiple URLs

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": [
    "https://example.com/product/1",
    "https://example.com/product/2"
  ],
  "prompt": "Extract product name and price"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

**Extract Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `urls` | array | URLs to extract from (required) |
| `prompt` | string | Description of data to extract (required) |
| `schema` | object | JSON schema for structured output |


## Practical Examples

### Scrape Documentation

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.python.org/3/tutorial/",
  "formats": ["markdown"],
  "onlyMainContent": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.data.markdown' > python-tutorial.md
```

### Find All Blog Posts

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "search": "post"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.links[]'
```

### Research a Topic

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "best practices REST API design 2024",
  "limit": 5,
  "scrapeOptions": {"formats": ["markdown"]}
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Extract Pricing Data

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/pricing"],
  "prompt": "Extract all pricing tiers with name, price, and features"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Poll Crawl Until Complete

Replace `<job-id>` with the actual job ID:

```bash
while true; do
  STATUS="$(bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq -r '.status')"
  echo "Status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```
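The loop above never exits if the job errors out. A variant that also stops on an error status or after a maximum number of polls: a sketch, where `poll_until_done`, `MAX_POLLS`, and `POLL_INTERVAL` are made-up names, and `failed` is an assumed error status (check the API reference for the exact values).

```shell
# poll_until_done CMD...: run CMD (which must print the crawl status) until it
# prints "completed" or "failed", or MAX_POLLS attempts are exhausted.
# Returns 0 only for "completed".
poll_until_done() {
  local max="${MAX_POLLS:-60}" i=0 status
  while [ "$i" -lt "$max" ]; do
    status="$("$@")"
    echo "Status: $status"
    [ "$status" = "completed" ] && return 0
    [ "$status" = "failed" ] && return 1
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}

# Usage (replace <job-id> with the actual job ID):
# poll_until_done bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" | jq -r .status'
```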


## Response Format

### Scrape Response

```json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "...",
      "url": "https://..."
    }
  }
}
```
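The `success` field in this response should be checked before reading `.data`; `jq -e` can do that, since it sets its exit status from the last value it outputs. A sketch: `check_success` is a made-up helper, and `jq` must be installed.

```shell
# check_success: read a Firecrawl response on stdin; exit 0 only when
# .success is true (jq -e maps the output value to an exit status).
check_success() {
  jq -e '.success == true' > /dev/null
}

# Usage:
# bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | check_success || echo "request failed" >&2
```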

### Crawl Status Response

```json
{
  "success": true,
  "status": "completed",
  "completed": 50,
  "total": 50,
  "data": [...]
}
```


## Guidelines

1. **Rate limits:** Add delays between requests to avoid 429 errors
2. **Crawl limits:** Set reasonable `limit` values to control API usage
3. **Main content:** Use `onlyMainContent: true` for cleaner output
4. **Async crawls:** Large crawls are async; poll `/crawl/{id}` for status
5. **Extract prompts:** Be specific for better AI extraction results
6. **Check success:** Always check the `success` field in responses
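The first guideline can be automated with a small wrapper that retries a failing command with exponential backoff. A sketch: `retry_with_backoff`, `BASE_DELAY`, and `MAX_ATTEMPTS` are made-up names; note that curl needs `--fail` to exit non-zero on an HTTP 429 so the wrapper can see the failure.

```shell
# retry_with_backoff CMD...: run CMD; on failure retry with doubling delays
# (BASE_DELAY, 2x, 4x, ...) up to MAX_ATTEMPTS total tries.
retry_with_backoff() {
  local attempt=1 delay="${BASE_DELAY:-1}" max="${MAX_ATTEMPTS:-5}"
  while true; do
    "$@" && return 0
    [ "$attempt" -ge "$max" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage:
# retry_with_backoff bash -c 'curl -s --fail -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```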