firecrawl-scraper

# Firecrawl Web Scraper Skill

**Status**: Production Ready | **Last Updated**: 2026-01-20 | **Official Docs**: https://docs.firecrawl.dev | **API Version**: v2 | **SDK Versions**: firecrawl-py 4.13.0+, @mendable/firecrawl-js 4.11.1+

## What is Firecrawl?

Firecrawl is a Web Data API for AI that turns websites into LLM-ready markdown or structured data. It handles:
- **JavaScript rendering** - Executes client-side JavaScript to capture dynamic content
- **Anti-bot bypass** - Gets past CAPTCHA and bot detection systems
- **Format conversion** - Outputs as markdown, HTML, JSON, screenshots, summaries
- **Document parsing** - Processes PDFs, DOCX files, and images
- **Autonomous agents** - AI-powered web data gathering without URLs
- **Change tracking** - Monitors content changes over time
- **Branding extraction** - Extracts color schemes, typography, logos


## API Endpoints Overview

| Endpoint | Purpose | Use Case |
|---|---|---|
| `/scrape` | Single page | Extract article, product page |
| `/crawl` | Full site | Index docs, archive sites |
| `/map` | URL discovery | Find all pages, plan strategy |
| `/search` | Web search + scrape | Research with live data |
| `/extract` | Structured data | Product prices, contacts |
| `/agent` | Autonomous gathering | No URLs needed, AI navigates |
| `/batch-scrape` | Multiple URLs | Bulk processing |

## 1. Scrape Endpoint (`/v2/scrape`)

Scrapes a single webpage and returns clean, structured content.

### Basic Usage

```python
from firecrawl import Firecrawl
import os

app = Firecrawl(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Basic scrape
doc = app.scrape(
    url="https://example.com/article",
    formats=["markdown", "html"],
    only_main_content=True
)
print(doc.markdown)
print(doc.metadata)
```

```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// v2 method name (scrapeUrl was renamed in v2.0.0 - see Issue #2 below)
const result = await app.scrape('https://example.com/article', {
  formats: ['markdown', 'html'],
  onlyMainContent: true
});

console.log(result.markdown);
```

### Output Formats

| Format | Description |
|---|---|
| `markdown` | LLM-optimized content |
| `html` | Full HTML |
| `rawHtml` | Unprocessed HTML |
| `screenshot` | Page capture (with viewport options) |
| `links` | All URLs on page |
| `json` | Structured data extraction |
| `summary` | AI-generated summary |
| `branding` | Design system data |
| `changeTracking` | Content change detection |

### Advanced Options

```python
doc = app.scrape(
    url="https://example.com",
    formats=["markdown", "screenshot"],
    only_main_content=True,
    remove_base64_images=True,
    wait_for=5000,  # Wait 5s for JS
    timeout=30000,
    # Location & language
    location={"country": "AU", "languages": ["en-AU"]},
    # Cache control
    max_age=0,  # Fresh content (no cache)
    store_in_cache=True,
    # Stealth mode for complex sites
    stealth=True,
    # Custom headers
    headers={"User-Agent": "Custom Bot 1.0"}
)
```

### Browser Actions

Perform interactions before scraping:

```python
doc = app.scrape(
    url="https://example.com",
    actions=[
        {"type": "click", "selector": "button.load-more"},
        {"type": "wait", "milliseconds": 2000},
        {"type": "scroll", "direction": "down"},
        {"type": "write", "selector": "input#search", "text": "query"},
        {"type": "press", "key": "Enter"},
        {"type": "screenshot"}  # Capture state mid-action
    ]
)
```

### JSON Mode (Structured Extraction)

```python
# With schema
doc = app.scrape(
    url="https://example.com/product",
    formats=["json"],
    json_options={
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"}
            }
        }
    }
)

# Without schema (prompt-only)
doc = app.scrape(
    url="https://example.com/product",
    formats=["json"],
    json_options={"prompt": "Extract the product name, price, and availability"}
)
```

### Branding Extraction

Extract design system and brand identity:

```python
doc = app.scrape(
    url="https://example.com",
    formats=["branding"]
)
```

Returns:

- Color schemes and palettes
- Typography (fonts, sizes, weights)
- Spacing and layout metrics
- UI component styles
- Logo and imagery URLs
- Brand personality traits

---

## 2. Crawl Endpoint (`/v2/crawl`)

Crawls all accessible pages from a starting URL.

```python
result = app.crawl(
    url="https://docs.example.com",
    limit=100,
    max_discovery_depth=3,  # v2 name (maxDepth was renamed - see Issue #4 below)
    allowed_domains=["docs.example.com"],
    exclude_paths=["/api/*", "/admin/*"],
    scrape_options={
        "formats": ["markdown"],
        "only_main_content": True
    }
)

for page in result.data:
    print(f"Scraped: {page.metadata.source_url}")
    print(f"Content: {page.markdown[:200]}...")
```

### Async Crawl with Webhooks

```python
# Start crawl (returns immediately)
job = app.start_crawl(
    url="https://docs.example.com",
    limit=1000,
    webhook="https://your-domain.com/webhook"
)
print(f"Job ID: {job.id}")

# Or poll for status (v2 method name - see Issue #2 below)
status = app.get_crawl_status(job.id)
```

---

## 3. Map Endpoint (`/v2/map`)

Rapidly discover all URLs on a website without scraping content.

```python
urls = app.map(url="https://example.com")

print(f"Found {len(urls)} pages")
for url in urls[:10]:
    print(url)
```

Use for: sitemap discovery, crawl planning, website audits.
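For the crawl-planning use case, the discovered URLs can be filtered before anything is scraped. A minimal sketch — the filtering itself is plain Python, and `app.map` is assumed to return a list of URL strings as in the example above:

```python
from urllib.parse import urlparse

def plan_crawl(urls, include_prefixes, limit=50):
    """Keep at most `limit` URLs whose path starts with one of the prefixes."""
    selected = []
    for url in urls:
        path = urlparse(url).path
        if any(path.startswith(prefix) for prefix in include_prefixes):
            selected.append(url)
        if len(selected) >= limit:
            break
    return selected

# With live data this would start from: urls = app.map(url="https://example.com")
discovered = [
    "https://example.com/docs/intro",
    "https://example.com/blog/post-1",
    "https://example.com/docs/api",
]
print(plan_crawl(discovered, ["/docs"]))
# ['https://example.com/docs/intro', 'https://example.com/docs/api']
```

The filtered list can then be handed to `batch_scrape` rather than crawling the whole site.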

## 4. Search Endpoint (`/search`) - NEW

Perform web searches and optionally scrape the results in one operation.

```python
# Basic search
results = app.search(
    query="best practices for React server components",
    limit=10
)
for result in results:
    print(f"{result.title}: {result.url}")

# Search + scrape results
results = app.search(
    query="React server components tutorial",
    limit=5,
    scrape_options={
        "formats": ["markdown"],
        "only_main_content": True
    }
)
for result in results:
    print(f"{result.title}")
    print(result.markdown[:500])
```

### Search Options

```python
results = app.search(
    query="machine learning papers",
    limit=20,
    # Filter by source type
    sources=["web", "news", "images"],
    # Filter by category
    categories=["github", "research", "pdf"],
    # Location
    location={"country": "US"},
    # Time filter
    tbs="qdr:m",  # Past month (qdr:h=hour, qdr:d=day, qdr:w=week, qdr:y=year)
    timeout=30000
)
```

**Cost**: 2 credits per 10 results, plus scraping costs if enabled.

## 5. Extract Endpoint (`/v2/extract`)

AI-powered structured data extraction from single pages, multiple pages, or entire domains.

### Single Page

```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    description: str
    in_stock: bool

result = app.extract(
    urls=["https://example.com/product"],
    schema=Product,
    system_prompt="Extract product information"
)

print(result.data)
```

### Multi-Page / Domain Extraction

```python
# Extract from entire domain using wildcard
result = app.extract(
    urls=["example.com/*"],  # All pages on domain
    schema=Product,
    system_prompt="Extract all products"
)

# Enable web search for additional context
result = app.extract(
    urls=["example.com/products"],
    schema=Product,
    enable_web_search=True  # Follow external links
)
```

### Prompt-Only Extraction (No Schema)

```python
result = app.extract(
    urls=["https://example.com/about"],
    prompt="Extract the company name, founding year, and key executives"
)
# LLM determines output structure
```

---

## 6. Agent Endpoint (`/agent`) - NEW

Autonomous web data gathering without requiring specific URLs. The agent searches, navigates, and gathers data using natural language prompts.

```python
# Basic agent usage
result = app.agent(
    prompt="Find the pricing plans for the top 3 headless CMS platforms and compare their features"
)
print(result.data)

# With schema for structured output
from pydantic import BaseModel
from typing import List

class CMSPricing(BaseModel):
    name: str
    free_tier: bool
    starter_price: float
    features: List[str]

result = app.agent(
    prompt="Find pricing for Contentful, Sanity, and Strapi",
    schema=CMSPricing
)

# Optional: focus on specific URLs
result = app.agent(
    prompt="Extract the enterprise pricing details",
    urls=["https://contentful.com/pricing", "https://sanity.io/pricing"]
)
```

### Agent Models

| Model | Best For | Cost |
|---|---|---|
| `spark-1-mini` (default) | Simple extractions, high volume | Standard |
| `spark-1-pro` | Complex analysis, ambiguous data | 60% more |

```python
result = app.agent(
    prompt="Analyze competitive positioning...",
    model="spark-1-pro"  # For complex tasks
)
```

### Async Agent

```python
# Start agent (returns immediately)
job = app.start_agent(
    prompt="Research market trends..."
)

# Poll for results
status = app.check_agent_status(job.id)
if status.status == "completed":
    print(status.data)
```

**Note**: Agent is in Research Preview. 5 free daily requests, then credit-based billing.

---

## 7. Batch Scrape - NEW

Process multiple URLs efficiently in a single operation.

### Synchronous (waits for completion)

```python
results = app.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ],
    formats=["markdown"],
    only_main_content=True
)

for page in results.data:
    print(f"{page.metadata.source_url}: {len(page.markdown)} chars")
```

### Asynchronous (with webhooks)

```python
job = app.start_batch_scrape(
    urls=url_list,
    formats=["markdown"],
    webhook="https://your-domain.com/webhook"
)
```

Webhook receives events: `started`, `page`, `completed`, `failed`

```typescript
const job = await app.startBatchScrape(urls, {
  formats: ['markdown'],
  webhook: 'https://your-domain.com/webhook'
});

// Poll for status
const status = await app.checkBatchScrapeStatus(job.id);
```
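On the receiving side, the event stream can be folded into a running job state with a small reducer. A sketch, assuming only the event names listed above — the payload field names `type` and `data` are illustrative, not the documented webhook schema:

```python
def handle_event(event, state):
    """Fold one webhook event into a running job state."""
    etype = event.get("type")
    if etype == "started":
        state.update(status="running", pages=[])
    elif etype == "page":
        state.setdefault("pages", []).append(event.get("data"))
    elif etype in ("completed", "failed"):
        state["status"] = etype
    return state

# Simulated event sequence, in the order a batch job would emit them
state = {}
events = [
    {"type": "started"},
    {"type": "page", "data": "# Page 1"},
    {"type": "page", "data": "# Page 2"},
    {"type": "completed"},
]
for event in events:
    handle_event(event, state)
print(state["status"], len(state["pages"]))
# completed 2
```

In a real deployment `handle_event` would sit behind the webhook endpoint registered above, with `state` persisted per job ID.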

## 8. Change Tracking - NEW

Monitor content changes over time by comparing scrapes.

```python
# Enable change tracking
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"]
)

# Response includes:
print(doc.change_tracking.status)  # new, same, changed, removed
print(doc.change_tracking.previous_scrape_at)
print(doc.change_tracking.visibility)  # visible, hidden
```

### Comparison Modes

```python
# Git-diff mode (default)
doc = app.scrape(
    url="https://example.com/docs",
    formats=["markdown", "changeTracking"],
    change_tracking_options={"mode": "diff"}
)
print(doc.change_tracking.diff)  # Line-by-line changes

# JSON mode (structured comparison)
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "mode": "json",
        "schema": {"type": "object", "properties": {"price": {"type": "number"}}}
    }
)
```

JSON mode costs 5 credits per page.


**Change States**:
- `new` - Page not seen before
- `same` - No changes since last scrape
- `changed` - Content modified
- `removed` - Page no longer accessible
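In a monitoring loop these four states typically map onto index-maintenance actions. A minimal sketch — the action names are placeholders, not part of the API:

```python
def route_change(status):
    """Map a changeTracking status to a follow-up action."""
    actions = {
        "new": "index",        # page not seen before: add it
        "same": "skip",        # unchanged: nothing to do
        "changed": "reindex",  # content modified: refresh the stored copy
        "removed": "purge",    # no longer accessible: drop the stored copy
    }
    return actions.get(status, "skip")

print(route_change("changed"))
# reindex
```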

---


## Authentication

Store the key in the environment:

```bash
FIRECRAWL_API_KEY=fc-your-api-key-here
```

**Never hardcode API keys!**

---
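A small loader can enforce both rules at once — read from the environment, never hardcode — and catch the malformed-key case from the troubleshooting table later in this document (valid keys start with `fc-`). A sketch:

```python
import os

def load_api_key():
    """Read the Firecrawl key from the environment and sanity-check its prefix."""
    key = os.environ.get("FIRECRAWL_API_KEY", "")
    if not key.startswith("fc-"):
        raise RuntimeError(
            "FIRECRAWL_API_KEY is missing or malformed (must start with 'fc-')"
        )
    return key

# Then: app = Firecrawl(api_key=load_api_key())
```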

## Cloudflare Workers Integration

The Firecrawl SDK cannot run in Cloudflare Workers (it requires Node.js). Use the REST API directly:

```typescript
interface Env {
  FIRECRAWL_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { url } = await request.json<{ url: string }>();

    const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        formats: ['markdown'],
        onlyMainContent: true
      })
    });

    const result = await response.json();
    return Response.json(result);
  }
};
```

## Rate Limits & Pricing

### Warning: Stealth Mode Pricing Change (May 2025)

Stealth mode now costs 5 credits per request when actively used. The default "auto" mode only charges stealth credits if the basic attempt fails.

Recommended pattern:

```python
# Use auto mode (default) - only charges 5 credits if stealth is needed
doc = app.scrape(url, formats=["markdown"])

# Or conditionally enable stealth for specific errors
if error_status_code in [401, 403, 500]:
    doc = app.scrape(url, formats=["markdown"], proxy="stealth")
```

### Unified Billing (November 2025)

Credits and tokens are merged into a single system. The Extract endpoint now uses credits (15 tokens = 1 credit).
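The stated conversion can be expressed directly. A sketch — rounding partial credits up is an assumption here; the billing docs define the exact rounding:

```python
import math

def tokens_to_credits(tokens):
    """Convert extract tokens to credits at 15 tokens = 1 credit."""
    return math.ceil(tokens / 15)

print(tokens_to_credits(30), tokens_to_credits(40))
# 2 3
```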

### Pricing Tiers

| Tier | Credits/Month | Notes |
|---|---|---|
| Free | 500 | Good for testing |
| Hobby | 3,000 | $19/month |
| Standard | 100,000 | $99/month |
| Growth | 500,000 | $399/month |

**Credit Costs**:
- Scrape: 1 credit (basic), 5 credits (stealth)
- Crawl: 1 credit per page
- Search: 2 credits per 10 results
- Extract: 5 credits per page (changed from tokens in v2.6.0)
- Agent: Dynamic (complexity-based)
- Change Tracking JSON mode: +5 credits
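The per-endpoint costs above can be combined into a rough pre-flight estimate. A sketch — Agent's dynamic pricing is excluded, and rounding search results up to the next block of 10 is an assumption:

```python
def estimate_credits(scrapes=0, stealth_scrapes=0, crawl_pages=0,
                     search_results=0, extract_pages=0,
                     change_tracking_json_pages=0):
    """Estimate total credits from the cost table above."""
    credits = scrapes * 1 + stealth_scrapes * 5 + crawl_pages * 1
    credits += -(-search_results // 10) * 2  # 2 credits per 10 results, rounded up
    credits += extract_pages * 5
    credits += change_tracking_json_pages * 5
    return credits

print(estimate_credits(scrapes=100, search_results=25, extract_pages=10))
# 156
```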

## Common Issues & Solutions

| Issue | Cause | Solution |
|---|---|---|
| Empty content | JS not loaded | Add `wait_for: 5000` or use `actions` |
| Rate limit exceeded | Over quota | Check dashboard, upgrade plan |
| Timeout error | Slow page | Increase `timeout`, use `stealth: true` |
| Bot detection | Anti-scraping | Use `stealth: true`, add `location` |
| Invalid API key | Wrong format | Must start with `fc-` |
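The fixes in the table can be chained into a single defensive helper. A sketch under two assumptions: that `app.scrape` accepts the keyword options shown earlier in this document, and that the escalation order (plain → wait for JS → stealth) suits the target site:

```python
# Escalation ladder built from the troubleshooting table above.
ESCALATION = [
    {},                                    # plain attempt
    {"wait_for": 5000},                    # empty content: give JS time to load
    {"wait_for": 5000, "timeout": 60000,   # slow page / bot detection
     "stealth": True},
]

def scrape_with_escalation(app, url):
    """Try progressively heavier options until non-empty markdown comes back."""
    last_error = None
    for options in ESCALATION:
        try:
            doc = app.scrape(url=url, formats=["markdown"], **options)
            if doc.markdown.strip():
                return doc
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"All attempts failed for {url}") from last_error
```

Note that each rung costs credits even when it fails (see Issue #8 below), so the ladder trades credits for reliability.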

## Known Issues Prevention

This skill prevents 10 documented issues:

### Issue #1: Stealth Mode Pricing Change (May 2025)

**Error**: Unexpected credit costs when using stealth mode
**Source**: Stealth Mode Docs | Changelog
**Why It Happens**: Starting May 8th, 2025, Stealth Mode proxy requests cost 5 credits per request (previously included in standard pricing). This is a significant billing change.
**Prevention**: Use auto mode (the default), which only charges stealth credits if the basic attempt fails.

```python
# RECOMMENDED: Use auto mode (default)
doc = app.scrape(url, formats=['markdown'])
# Auto retries with stealth (5 credits) only if basic fails

# Or conditionally enable based on error status
try:
    doc = app.scrape(url, formats=['markdown'], proxy='basic')
except Exception as e:
    if e.status_code in [401, 403, 500]:
        doc = app.scrape(url, formats=['markdown'], proxy='stealth')
```

**Stealth Mode Options**:
- `auto` (default): Charges 5 credits only if stealth succeeds after basic fails
- `basic`: Standard proxies, 1 credit cost
- `stealth`: 5 credits per request when actively used

---

### Issue #2: v2.0.0 Breaking Changes - Method Renames

**Error**: `AttributeError: 'FirecrawlApp' object has no attribute 'scrape_url'`
**Source**: v2.0.0 Release | Migration Guide
**Why It Happens**: v2.0.0 (August 2025) renamed SDK methods across all languages.
**Prevention**: Use the new method names.

JavaScript/TypeScript:
- `scrapeUrl()` → `scrape()`
- `crawlUrl()` → `crawl()` or `startCrawl()`
- `asyncCrawlUrl()` → `startCrawl()`
- `checkCrawlStatus()` → `getCrawlStatus()`

Python:
- `scrape_url()` → `scrape()`
- `crawl_url()` → `crawl()` or `start_crawl()`

```python
# OLD (v1)
doc = app.scrape_url("https://example.com")

# NEW (v2)
doc = app.scrape("https://example.com")
```

---

### Issue #3: v2.0.0 Breaking Changes - Format Changes

**Error**: `'extract' is not a valid format`
**Source**: v2.0.0 Release
**Why It Happens**: The old `"extract"` format was renamed to `"json"` in v2.0.0.
**Prevention**: Use the new object format for JSON extraction.

```python
# OLD (v1)
doc = app.scrape_url(
    url="https://example.com",
    params={
        "formats": ["extract"],
        "extract": {"prompt": "Extract title"}
    }
)

# NEW (v2)
doc = app.scrape(
    url="https://example.com",
    formats=[{"type": "json", "prompt": "Extract title"}]
)

# With schema
doc = app.scrape(
    url="https://example.com",
    formats=[{
        "type": "json",
        "prompt": "Extract product info",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"}
            }
        }
    }]
)
```

**Screenshot format also changed**:

```python
# NEW: Screenshot as object
formats=[{
    "type": "screenshot",
    "fullPage": True,
    "quality": 80,
    "viewport": {"width": 1920, "height": 1080}
}]
```

---

### Issue #4: v2.0.0 Breaking Changes - Crawl Options

**Error**: `'allowBackwardCrawling' is not a valid parameter`
**Source**: v2.0.0 Release
**Why It Happens**: Several crawl parameters were renamed or removed in v2.0.0.
**Prevention**: Use the new parameter names.

Parameter changes:
- `allowBackwardCrawling` → use `crawlEntireDomain` instead
- `maxDepth` → use `maxDiscoveryDepth` instead
- `ignoreSitemap` (bool) → `sitemap` ("only", "skip", "include")

```python
# OLD (v1)
app.crawl_url(
    url="https://docs.example.com",
    params={
        "allowBackwardCrawling": True,
        "maxDepth": 3,
        "ignoreSitemap": False
    }
)

# NEW (v2)
app.crawl(
    url="https://docs.example.com",
    crawl_entire_domain=True,
    max_discovery_depth=3,
    sitemap="include"  # "only", "skip", or "include"
)
```

---

### Issue #5: v2.0.0 Default Behavior Changes

**Error**: Stale cached content returned unexpectedly
**Source**: v2.0.0 Release
**Why It Happens**: v2.0.0 changed several defaults.
**Prevention**: Be aware of the new defaults.

Default changes:
- `maxAge` now defaults to 2 days (cached by default)
- `blockAds`, `skipTlsVerification`, `removeBase64Images` are enabled by default

```python
# Force fresh data if needed
doc = app.scrape(url, formats=['markdown'], max_age=0)

# Disable cache entirely
doc = app.scrape(url, formats=['markdown'], store_in_cache=False)
```

---

### Issue #6: Job Status Race Condition

**Error**: `"Job not found"` when checking crawl status immediately after creation
**Source**: GitHub Issue #2662
**Why It Happens**: Database replication delay between job creation and status endpoint availability.
**Prevention**: Wait 1-3 seconds before the first status check, or implement retry logic.

```python
import time

# Start crawl
job = app.start_crawl(url="https://docs.example.com")
print(f"Job ID: {job.id}")

# REQUIRED: Wait before first status check
time.sleep(2)  # 1-3 seconds recommended

# Now status check succeeds
status = app.get_crawl_status(job.id)

# Or implement retry logic
def get_status_with_retry(job_id, max_retries=3, delay=1):
    for attempt in range(max_retries):
        try:
            return app.get_crawl_status(job_id)
        except Exception as e:
            if "Job not found" in str(e) and attempt < max_retries - 1:
                time.sleep(delay)
                continue
            raise

status = get_status_with_retry(job.id)
```

---

### Issue #7: DNS Errors Return HTTP 200

**Error**: DNS resolution failures return `success: false` with HTTP 200 status instead of 4xx
**Source**: GitHub Issue #2402 | Fixed in v2.7.0
**Why It Happens**: Changed in v2.7.0 for consistent error handling.
**Prevention**: Check the `success` and `code` fields; don't rely on the HTTP status alone.

```typescript
const result = await app.scrape('https://nonexistent-domain-xyz.com');

// DON'T rely on HTTP status code
// Response: HTTP 200 with { success: false, code: "SCRAPE_DNS_RESOLUTION_ERROR" }

// DO check success field
if (!result.success) {
    if (result.code === 'SCRAPE_DNS_RESOLUTION_ERROR') {
        console.error('DNS resolution failed');
    }
    throw new Error(result.error);
}
```

**Note**: DNS resolution errors still charge 1 credit despite the failure.

---

Issue #8: Bot Detection Still Charges Credits


**Error**: Cloudflare error page returned as a "successful" scrape, and credits were charged
**Source**: GitHub Issue #2413
**Why It Happens**: The Fire-1 engine charges credits even when bot detection prevents access
**Prevention**: Validate that the content isn't an error page before processing; use stealth mode for protected sites

```python
url = "https://protected-site.com"

# First attempt without stealth
doc = app.scrape(url=url, formats=["markdown"])

# Validate the content isn't an error page
if "cloudflare" in doc.markdown.lower() or "access denied" in doc.markdown.lower():
    # Retry with stealth (costs 5 credits if successful)
    doc = app.scrape(url, formats=["markdown"], stealth=True)
```

**Cost Impact**: A basic scrape charges 1 credit even on failure; a stealth retry charges an additional 5 credits.
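Given those numbers, the worst-case credit spend for a batch is easy to budget ahead of time. A back-of-the-envelope sketch using the 1-credit base and 5-credit stealth figures from this section (the retry fraction is an assumption you estimate for your target sites):

```python
def worst_case_credits(n_urls: int, stealth_retry_fraction: float) -> int:
    """Budget: 1 credit per basic attempt, plus 5 per stealth retry."""
    base = n_urls * 1
    stealth = round(n_urls * stealth_retry_fraction) * 5
    return base + stealth

# e.g. 100 URLs where ~20% need a stealth retry:
# worst_case_credits(100, 0.2) -> 100 + 20*5 = 200
```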

---

Issue #9: Self-Hosted Anti-Bot Fingerprinting Weakness


**Error**: `"All scraping engines failed!"` (SCRAPE_ALL_ENGINES_FAILED) on sites with anti-bot measures
**Source**: GitHub Issue #2257
**Why It Happens**: Self-hosted Firecrawl lacks the advanced anti-fingerprinting techniques present in the cloud service
**Prevention**: Use the Firecrawl cloud service for sites with strong anti-bot measures, or configure a proxy

```bash
# Self-hosted fails on Cloudflare-protected sites
curl -X POST 'http://localhost:3002/v2/scrape' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{ "url": "https://www.example.com/", "pageOptions": { "engine": "playwright" } }'
# Error: "All scraping engines failed!"

# Workaround: use the cloud service instead;
# it has better anti-fingerprinting
```

**Note**: This affects self-hosted v2.3.0+ with the default docker-compose setup. Warning present: "⚠️ WARNING: No proxy server provided. Your IP address may be blocked."
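If you stay self-hosted, the warning above points at the mitigation: wire a proxy into the environment. A hypothetical `.env` fragment for the docker-compose setup (the variable names and values are assumptions; verify them against your deployment's own `.env.example` before use):

```shell
# Hypothetical proxy settings for self-hosted Firecrawl --
# confirm the exact variable names in your .env.example
PROXY_SERVER=http://your-proxy-host:8080
PROXY_USERNAME=your-username
PROXY_PASSWORD=your-password
```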

---


Issue #10: Cache Performance Best Practices (Community-sourced)


**Suboptimal**: Not leveraging the cache can make requests up to 500% slower
**Source**: Fast Scraping Docs | Blog Post
**Why It Matters**: The default `maxAge` is 2 days in v2+, but many use cases need different strategies
**Prevention**: Use an appropriate cache strategy for your content type

```python
# Fresh data (real-time pricing, stock prices)
doc = app.scrape(url, formats=["markdown"], max_age=0)

# 10-minute cache (news, blogs)
doc = app.scrape(url, formats=["markdown"], max_age=600000)  # milliseconds

# Use default cache (2 days) for static content
doc = app.scrape(url, formats=["markdown"])  # maxAge defaults to 172800000

# Don't store in cache (one-time scrape)
doc = app.scrape(url, formats=["markdown"], store_in_cache=False)

# Require minimum age before re-scraping (v2.7.0+)
doc = app.scrape(url, formats=["markdown"], min_age=3600000)  # 1 hour minimum
```

**Performance Impact**:
- Cached response: milliseconds
- Fresh scrape: seconds
- Speed difference: **up to 500%**
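The strategies above can be collected into a small lookup so call sites don't hard-code millisecond values. A sketch; the category names are illustrative, not part of any Firecrawl API:

```python
# max_age values in milliseconds, mirroring the examples above
CACHE_MAX_AGE_MS = {
    "realtime": 0,                       # pricing, stock data: always fresh
    "news": 10 * 60 * 1000,              # 10-minute cache
    "static": 2 * 24 * 60 * 60 * 1000,   # 2 days, the documented v2 default
}

def max_age_for(category: str) -> int:
    """Look up the cache window for a content category."""
    return CACHE_MAX_AGE_MS[category]

# Usage sketch: app.scrape(url, formats=["markdown"], max_age=max_age_for("news"))
```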

---

Package Versions


| Package | Version | Last Checked |
|---------|---------|--------------|
| firecrawl-py | 4.13.0+ | 2026-01-20 |
| @mendable/firecrawl-js | 4.11.1+ | 2026-01-20 |
| API Version | v2 | Current |
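To confirm a deployment meets these minimums, a simple tuple comparison works for plain `MAJOR.MINOR.PATCH` strings. A sketch; for full PEP 440 handling (pre-releases, post-releases) use `packaging.version` instead:

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted versions numerically, e.g. '4.13.0' vs '4.13.0'."""
    def parse(v: str):
        return tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# e.g. meets_minimum("4.13.0", "4.13.0") -> True
#      meets_minimum("4.9.2", "4.13.0") -> False
```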


Official Documentation



Token Savings: ~65% vs manual integration
Error Prevention: 10 documented issues (v2 migration, stealth pricing, job status race, DNS errors, bot detection billing, self-hosted limitations, cache optimization)
Production Ready: Yes
Last verified: 2026-01-21 | Skill version: 2.0.0 | Changes: Added Known Issues Prevention section with 10 documented errors from TIER 1-2 research findings; added v2 migration guidance; documented stealth mode pricing change and unified billing model
