# Firecrawl

Use the Firecrawl API via direct `curl` calls to scrape websites and extract data for AI.
Official docs: https://docs.firecrawl.dev/


## When to Use

Use this skill when you need to:

- Scrape a webpage and convert to markdown/HTML
- Crawl an entire website and extract all pages
- Discover all URLs on a website
- Search the web and get full page content
- Extract structured data using AI


## Prerequisites

1. Sign up at https://www.firecrawl.dev/
2. Get your API key from the dashboard and export it:

```bash
export FIRECRAWL_API_KEY="fc-your-api-key"
```

**Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
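Every example below fails with an auth error if the variable is missing, so a quick sanity check before starting can save a round trip. This is a sketch, not part of the official docs; `check_firecrawl_key` is a made-up helper name, and the `fc-` prefix is taken from the placeholder key above.

```shell
# check_firecrawl_key: warn if FIRECRAWL_API_KEY is unset or oddly formed.
# Only inspects the variable; makes no network call.
check_firecrawl_key() {
  if [ -z "${FIRECRAWL_API_KEY:-}" ]; then
    echo "FIRECRAWL_API_KEY is not set"
    return 1
  fi
  case "$FIRECRAWL_API_KEY" in
    fc-*) echo "API key looks OK" ;;
    *)    echo "warning: key does not start with fc-"; return 1 ;;
  esac
}
```

Run `check_firecrawl_key` once right after the `export` above.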

## How to Use

All examples below assume you have `FIRECRAWL_API_KEY` set.
Base URL: `https://api.firecrawl.dev/v1`


## 1. Scrape - Single Page

Extract content from a single webpage.

### Basic Scrape

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

### Scrape with Options

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.example.com/api",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "timeout": 30000
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.markdown'
```

### Get HTML Instead

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["html"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.html'
```

### Get Screenshot

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["screenshot"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.screenshot'
```

**Scrape Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | URL to scrape (required) |
| `formats` | array | `markdown`, `html`, `rawHtml`, `screenshot`, `links` |
| `onlyMainContent` | boolean | Skip headers/footers |
| `timeout` | number | Timeout in milliseconds |


## 2. Crawl - Entire Website

Crawl all pages of a website (async operation).

### Start a Crawl

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "limit": 50,
  "maxDepth": 2
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

Response:

```json
{
  "success": true,
  "id": "crawl-job-id-here"
}
```
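For scripting the follow-up status calls, the job id can be pulled out of that response with `jq` instead of being copied by hand. A sketch: `parse_crawl_id` is a made-up helper name, and `jq` must be installed.

```shell
# parse_crawl_id: read the crawl-start response on stdin, print the job id.
parse_crawl_id() {
  jq -r '.id'
}

# Capture the id for the status calls below (network call, key required):
# JOB_ID="$(bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | parse_crawl_id)"
```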

### Check Crawl Status

Replace `<job-id>` with the actual job ID returned from the crawl request:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '{status, completed, total}'
```

### Get Crawl Results

Replace `<job-id>` with the actual job ID:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '.data[] | {url: .metadata.url, title: .metadata.title}'
```

### Crawl with Path Filters

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "limit": 20,
  "maxDepth": 3,
  "includePaths": ["/posts/*"],
  "excludePaths": ["/admin/*", "/login"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

**Crawl Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | Starting URL (required) |
| `limit` | number | Max pages to crawl (default: 100) |
| `maxDepth` | number | Max crawl depth (default: 3) |
| `includePaths` | array | Paths to include (e.g., `/blog/*`) |
| `excludePaths` | array | Paths to exclude |


## 3. Map - URL Discovery

Get all URLs from a website quickly.

### Basic Map

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links[:10]'
```

### Map with Search Filter

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://shop.example.com",
  "search": "product",
  "limit": 500
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links'
```

**Map Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | Website URL (required) |
| `search` | string | Filter URLs containing keyword |
| `limit` | number | Max URLs to return (default: 1000) |
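Since `limit` defaults to 1000, it is often worth counting the returned links before printing them all. A sketch: `count_links` is a made-up helper, and `jq` must be installed.

```shell
# count_links: read a map response on stdin and print how many URLs it holds.
count_links() {
  jq '.links | length'
}

# Usage:
# bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | count_links
```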


## 4. Search - Web Search

Search the web and get full page content.

### Basic Search

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "AI news 2024",
  "limit": 5
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Search with Full Content

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "machine learning tutorials",
  "limit": 3,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, content: .markdown[:500]}'
```

**Search Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `query` | string | Search query (required) |
| `limit` | number | Number of results (default: 10) |
| `scrapeOptions` | object | Options for scraping results |


## 5. Extract - AI Data Extraction

Extract structured data from pages using AI.

### Basic Extract

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and description"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Extract with Schema

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract product information",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price": {"type": "number"},
      "currency": {"type": "string"},
      "inStock": {"type": "boolean"}
    }
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Extract from Multiple URLs

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": [
    "https://example.com/product/1",
    "https://example.com/product/2"
  ],
  "prompt": "Extract product name and price"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

**Extract Parameters:**

| Parameter | Type | Description |
| --- | --- | --- |
| `urls` | array | URLs to extract from (required) |
| `prompt` | string | Description of data to extract (required) |
| `schema` | object | JSON schema for structured output |


## Practical Examples

### Scrape Documentation

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.python.org/3/tutorial/",
  "formats": ["markdown"],
  "onlyMainContent": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.data.markdown' > python-tutorial.md
```

### Find All Blog Posts

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "search": "post"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.links[]'
```

### Research a Topic

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "best practices REST API design 2024",
  "limit": 5,
  "scrapeOptions": {"formats": ["markdown"]}
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Extract Pricing Data

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/pricing"],
  "prompt": "Extract all pricing tiers with name, price, and features"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Poll Crawl Until Complete

Replace `<job-id>` with the actual job ID:

```bash
while true; do
  STATUS="$(bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq -r '.status')"
  echo "Status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```
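The loop above never exits if the job errors out. A variant that also stops on an error status or after a maximum number of polls: a sketch, where `poll_until_done`, `MAX_POLLS`, and `POLL_INTERVAL` are made-up names, and `failed` is an assumed error status (check the API reference for the exact values).

```shell
# poll_until_done CMD...: run CMD (which must print the crawl status) until it
# prints "completed" or "failed", or MAX_POLLS attempts are exhausted.
# Returns 0 only for "completed".
poll_until_done() {
  local max="${MAX_POLLS:-60}" i=0 status
  while [ "$i" -lt "$max" ]; do
    status="$("$@")"
    echo "Status: $status"
    [ "$status" = "completed" ] && return 0
    [ "$status" = "failed" ] && return 1
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}

# Usage (replace <job-id> with the actual job ID):
# poll_until_done bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" | jq -r .status'
```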


## Response Format

### Scrape Response

```json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "...",
      "url": "https://..."
    }
  }
}
```
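The `success` field in this response should be checked before reading `.data`; `jq -e` can do that, since it sets its exit status from the last value it outputs. A sketch: `check_success` is a made-up helper, and `jq` must be installed.

```shell
# check_success: read a Firecrawl response on stdin; exit 0 only when
# .success is true (jq -e maps the output value to an exit status).
check_success() {
  jq -e '.success == true' > /dev/null
}

# Usage:
# bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | check_success || echo "request failed" >&2
```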

### Crawl Status Response

```json
{
  "success": true,
  "status": "completed",
  "completed": 50,
  "total": 50,
  "data": [...]
}
```


## Guidelines

1. **Rate limits:** Add delays between requests to avoid 429 errors
2. **Crawl limits:** Set reasonable `limit` values to control API usage
3. **Main content:** Use `onlyMainContent: true` for cleaner output
4. **Async crawls:** Large crawls are async; poll `/crawl/{id}` for status
5. **Extract prompts:** Be specific for better AI extraction results
6. **Check success:** Always check the `success` field in responses
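The first guideline can be automated with a small wrapper that retries a failing command with exponential backoff. A sketch: `retry_with_backoff`, `BASE_DELAY`, and `MAX_ATTEMPTS` are made-up names; note that curl needs `--fail` to exit non-zero on an HTTP 429 so the wrapper can see the failure.

```shell
# retry_with_backoff CMD...: run CMD; on failure retry with doubling delays
# (BASE_DELAY, 2x, 4x, ...) up to MAX_ATTEMPTS total tries.
retry_with_backoff() {
  local attempt=1 delay="${BASE_DELAY:-1}" max="${MAX_ATTEMPTS:-5}"
  while true; do
    "$@" && return 0
    [ "$attempt" -ge "$max" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage:
# retry_with_backoff bash -c 'curl -s --fail -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```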