Firecrawl
Use the Firecrawl API via direct `curl` calls to scrape websites and extract data for AI.

Official docs: https://docs.firecrawl.dev/
When to Use
Use this skill when you need to:
- Scrape a webpage and convert to markdown/HTML
- Crawl an entire website and extract all pages
- Discover all URLs on a website
- Search the web and get full page content
- Extract structured data using AI
Prerequisites
- Sign up at https://www.firecrawl.dev/
- Get your API key from the dashboard
```bash
export FIRECRAWL_API_KEY="fc-your-api-key"
```

Important: when using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
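The pitfall above can be demonstrated without any network call. `DEMO_KEY` is a placeholder standing in for `FIRECRAWL_API_KEY`:

```bash
# Placeholder variable; no API request is made here.
export DEMO_KEY="fc-demo"
# The variable expands inside the single-quoted bash -c command,
# and the pipe is applied outside it, so the value survives.
bash -c 'printf "Bearer %s\n" "$DEMO_KEY"' | tr '[:lower:]' '[:upper:]'
```

This prints `BEARER FC-DEMO`; with the pipe inside an unwrapped command, the variable may expand to an empty string instead.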
How to Use
All examples below assume you have set `FIRECRAWL_API_KEY`.

Base URL: `https://api.firecrawl.dev/v1`

1. Scrape - Single Page

Extract content from a single webpage.

Basic Scrape
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

Scrape with Options
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.example.com/api",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "timeout": 30000
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.markdown'
```

Get HTML Instead
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["html"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.html'
```

Get Screenshot
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["screenshot"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.screenshot'
```

Scrape Parameters:

| Parameter | Type | Description |
|---|---|---|
| `url` | string | URL to scrape (required) |
| `formats` | array | Output formats: `markdown`, `html`, `screenshot` |
| `onlyMainContent` | boolean | Skip headers/footers |
| `timeout` | number | Timeout in milliseconds |
2. Crawl - Entire Website
Crawl all pages of a website (async operation).

Start a Crawl
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "limit": 50,
  "maxDepth": 2
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

Response:

```json
{
  "success": true,
  "id": "crawl-job-id-here"
}
```

Check Crawl Status
Replace `<job-id>` with the actual job ID returned from the crawl request:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '{status, completed, total}'
```

Get Crawl Results
Replace `<job-id>` with the actual job ID:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '.data[] | {url: .metadata.url, title: .metadata.title}'
```

Crawl with Path Filters
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "limit": 20,
  "maxDepth": 3,
  "includePaths": ["/posts/*"],
  "excludePaths": ["/admin/*", "/login"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

Crawl Parameters:

| Parameter | Type | Description |
|---|---|---|
| `url` | string | Starting URL (required) |
| `limit` | number | Max pages to crawl (default: 100) |
| `maxDepth` | number | Max crawl depth (default: 3) |
| `includePaths` | array | Paths to include (e.g., `/posts/*`) |
| `excludePaths` | array | Paths to exclude |
3. Map - URL Discovery
Get all URLs from a website quickly.

Basic Map
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links[:10]'
```

Map with Search Filter
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://shop.example.com",
  "search": "product",
  "limit": 500
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links'
```

Map Parameters:

| Parameter | Type | Description |
|---|---|---|
| `url` | string | Website URL (required) |
| `search` | string | Filter URLs containing keyword |
| `limit` | number | Max URLs to return (default: 1000) |
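A link list from `/map` can be narrowed locally with `jq` instead of re-querying. The response below is canned in the shape `/map` returns:

```bash
# Canned /v1/map-style response; in real use this comes from the curl call above.
LINKS='{"links": ["https://blog.example.com/posts/hello", "https://blog.example.com/about", "https://blog.example.com/posts/world"]}'
# select(test("...")) keeps only URLs matching the regex.
printf '%s' "$LINKS" | jq -r '.links[] | select(test("/posts/"))'
```

This prints only the two `/posts/` URLs, one per line.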
4. Search - Web Search
Search the web and get full page content.

Basic Search
Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "AI news 2024",
  "limit": 5
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

Search with Full Content
Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "machine learning tutorials",
  "limit": 3,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, content: .markdown[:500]}'
```

Search Parameters:

| Parameter | Type | Description |
|---|---|---|
| `query` | string | Search query (required) |
| `limit` | number | Number of results (default: 10) |
| `scrapeOptions` | object | Options for scraping results |
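Search results are easy to reshape into a markdown reading list; the response here is canned with the same structure the examples above return:

```bash
# Canned /v1/search-style response with two results.
RESULTS='{"data": [{"url": "https://a.example", "metadata": {"title": "A"}}, {"url": "https://b.example", "metadata": {"title": "B"}}]}'
# \(expr) interpolates inside a jq string: one "- [title](url)" bullet per result.
printf '%s' "$RESULTS" | jq -r '.data[] | "- [\(.metadata.title)](\(.url))"'
```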
5. Extract - AI Data Extraction
Extract structured data from pages using AI.

Basic Extract
Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and description"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

Extract with Schema
Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract product information",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price": {"type": "number"},
      "currency": {"type": "string"},
      "inStock": {"type": "boolean"}
    }
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

Extract from Multiple URLs
Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": [
    "https://example.com/product/1",
    "https://example.com/product/2"
  ],
  "prompt": "Extract product name and price"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

Extract Parameters:

| Parameter | Type | Description |
|---|---|---|
| `urls` | array | URLs to extract from (required) |
| `prompt` | string | Description of data to extract (required) |
| `schema` | object | JSON schema for structured output |
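Even with a schema it is worth sanity-checking extracted types before downstream use; `jq -e` turns the boolean into an exit status. The object below is a canned result matching the schema fields above:

```bash
# Canned extract result; in real use this is the .data from /v1/extract.
DATA='{"name": "Widget", "price": 9.99, "currency": "USD", "inStock": true}'
# jq -e exits 0 only when the expression evaluates to true.
if printf '%s' "$DATA" | jq -e '(.price | type == "number") and (.inStock | type == "boolean")' >/dev/null; then
  echo "shape ok"
fi
```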
Practical Examples
Scrape Documentation
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.python.org/3/tutorial/",
  "formats": ["markdown"],
  "onlyMainContent": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.data.markdown' > python-tutorial.md
```

Find All Blog Posts
Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "search": "post"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.links[]'
```

Research a Topic
Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "best practices REST API design 2024",
  "limit": 5,
  "scrapeOptions": {"formats": ["markdown"]}
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

Extract Pricing Data
Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/pricing"],
  "prompt": "Extract all pricing tiers with name, price, and features"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

Poll Crawl Until Complete
Replace `<job-id>` with the actual job ID:

```bash
while true; do
  STATUS="$(bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq -r '.status')"
  echo "Status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```
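The loop above polls forever. A bounded variant with a pluggable fetch command is easier to reuse and can be exercised offline; `poll_status` and `fake_fetch` are illustrative names, not part of the Firecrawl API:

```bash
# poll_status FETCH_CMD [MAX]: run FETCH_CMD until .status is "completed"
# or MAX attempts are used. In real use FETCH_CMD would be the authenticated
# curl to https://api.firecrawl.dev/v1/crawl/<job-id>.
poll_status() {
  local fetch_cmd="$1" max="${2:-60}" i status
  for i in $(seq 1 "$max"); do
    status="$($fetch_cmd | jq -r '.status')"
    echo "Status: $status"
    [ "$status" = "completed" ] && return 0
    sleep 5
  done
  echo "gave up after $max attempts" >&2
  return 1
}

# Offline stand-in for the crawl-status endpoint.
fake_fetch() { printf '%s' '{"status": "completed"}'; }
poll_status fake_fetch 3
```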
Response Format

Scrape Response

```json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "...",
      "url": "https://..."
    }
  }
}
```

Crawl Status Response

```json
{
  "success": true,
  "status": "completed",
  "completed": 50,
  "total": 50,
  "data": [...]
}
```
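These response shapes can be processed offline with `jq`. For instance, pulling the markdown body out of a saved scrape response (canned here for illustration):

```bash
# Canned /v1/scrape response saved to a file, as if captured with curl.
cat > /tmp/firecrawl_sample.json <<'EOF'
{"success": true, "data": {"markdown": "# Page Title", "metadata": {"title": "Page Title", "url": "https://example.com"}}}
EOF
# -r emits the raw string rather than a JSON-quoted one.
jq -r '.data.markdown' /tmp/firecrawl_sample.json
```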
{
"success": true,
"status": "completed",
"completed": 50,
"total": 50,
"data": [...]
}Guidelines
- Rate limits: Add delays between requests to avoid 429 errors
- Crawl limits: Set a reasonable `limit` to control API usage
- Main content: Use `onlyMainContent: true` for cleaner output
- Async crawls: Large crawls are async; poll `/crawl/{id}` for status
- Extract prompts: Be specific for better AI extraction results
- Check success: Always check the `success` field in responses
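The last point can be folded into a small helper. `check_response` is an illustrative name, and the `.error` field is an assumption about the failure shape; adjust it to whatever the API actually returns:

```bash
# check_response BODY: succeed only when .success is true; otherwise report
# the error (assumed to be in .error) on stderr and fail.
check_response() {
  local body="$1"
  if [ "$(printf '%s' "$body" | jq -r '.success')" = "true" ]; then
    return 0
  fi
  echo "Request failed: $(printf '%s' "$body" | jq -r '.error // "unknown error"')" >&2
  return 1
}

# Typical use: BODY="$(bash -c 'curl -s ...')"; check_response "$BODY" || exit 1
check_response '{"success": true}' && echo "ok"
```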