firecrawl-scraper


Firecrawl Web Scraper Skill


**Status**: Production Ready ✅
**Last Updated**: 2025-10-24
**Official Docs**: https://docs.firecrawl.dev
**API Version**: v2


What is Firecrawl?


Firecrawl is a Web Data API for AI that turns entire websites into LLM-ready markdown or structured data. It handles:
  • JavaScript rendering - Executes client-side JavaScript to capture dynamic content
  • Anti-bot bypass - Gets past CAPTCHA and bot detection systems
  • Format conversion - Outputs as markdown, JSON, or structured data
  • Screenshot capture - Saves visual representations of pages
  • Browser automation - Full headless browser capabilities


API Endpoints


1. `/v2/scrape` - Single Page Scraping

Scrapes a single webpage and returns clean, structured content.
Use Cases:
  • Extract article content
  • Get product details
  • Scrape specific pages
  • Convert HTML to markdown
Key Options:
  • `formats`: ["markdown", "html", "screenshot"]
  • `onlyMainContent`: true/false (removes nav, footer, ads)
  • `waitFor`: milliseconds to wait before scraping
  • `actions`: browser automation actions (click, scroll, etc.)
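These options map directly onto the JSON body of a `POST https://api.firecrawl.dev/v2/scrape` request. A minimal stdlib-only sketch of that mapping is below; the payload builder mirrors the option names above, but treat the exact response shape as an assumption and prefer the official SDK in real code:

```python
import json
import os
import urllib.request

def build_scrape_payload(url, formats=("markdown",), only_main_content=True,
                         wait_for=0, actions=None):
    """Assemble a /v2/scrape request body from the options listed above."""
    payload = {"url": url, "formats": list(formats), "onlyMainContent": only_main_content}
    if wait_for:
        payload["waitFor"] = wait_for       # milliseconds to wait before scraping
    if actions:
        payload["actions"] = list(actions)  # browser automation steps
    return payload

def scrape(url, **options):
    """POST the payload to the scrape endpoint with a bearer-token header."""
    req = urllib.request.Request(
        "https://api.firecrawl.dev/v2/scrape",
        data=json.dumps(build_scrape_payload(url, **options)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```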

2. `/v2/crawl` - Full Site Crawling

Crawls all accessible pages from a starting URL.
Use Cases:
  • Index entire documentation sites
  • Archive website content
  • Build knowledge bases
  • Scrape multi-page content
Key Options:
  • `limit`: max pages to crawl
  • `maxDepth`: how many links deep to follow
  • `allowedDomains`: restrict to specific domains
  • `excludePaths`: skip certain URL patterns
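Crawl jobs run asynchronously (hence the SDK's `poll_interval` parameter shown later): you start a job, then check its status until it finishes. A generic polling helper you could pair with a status request is sketched below; the `status` field values are assumptions based on common job APIs, so check the official docs for the exact names:

```python
import time

def poll_until_done(check_status, interval=5.0, max_wait=600.0):
    """Call check_status() every `interval` seconds until the job reports a
    terminal state or `max_wait` seconds have elapsed."""
    waited = 0.0
    while True:
        status = check_status()  # e.g. a GET on the crawl job's status URL
        if status.get("status") in ("completed", "failed"):
            return status
        if waited >= max_wait:
            raise TimeoutError(f"crawl still running after {max_wait}s")
        time.sleep(interval)
        waited += interval
```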

3. `/v2/map` - URL Discovery

Maps all URLs on a website without scraping content.
Use Cases:
  • Find sitemap
  • Discover all pages
  • Plan crawling strategy
  • Audit website structure
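"Plan crawling strategy" in practice means filtering the mapped URL list down to what you actually want, then feeding the survivors (or their count) into your crawl `limit` and `excludePaths`. A small helper written against a plain list of URLs (the function name and parameters are illustrative, not part of the Firecrawl API):

```python
def plan_crawl(urls, allowed_prefix, exclude_substrings=()):
    """Keep only URLs under allowed_prefix that match none of the excluded patterns."""
    return [
        url for url in urls
        if url.startswith(allowed_prefix)
        and not any(pattern in url for pattern in exclude_substrings)
    ]
```

For example, `plan_crawl(mapped_urls, "https://docs.example.com/", ("/v1/", "/archive/"))` would drop legacy and archived pages before you spend crawl credits on them.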

4. `/v2/extract` - Structured Data Extraction

Uses AI to extract specific data fields from pages.
Use Cases:
  • Extract product prices and names
  • Parse contact information
  • Build structured datasets
  • Custom data schemas
Key Options:
  • `schema`: Zod or JSON schema defining the desired structure
  • `systemPrompt`: guide AI extraction behavior


Authentication


Firecrawl requires an API key for all requests.

Get API Key


  1. Sign up at https://www.firecrawl.dev
  2. Go to dashboard → API Keys
  3. Copy your API key (starts with `fc-`)

Store Securely


NEVER hardcode API keys in code!

```bash
# .env file
FIRECRAWL_API_KEY=fc-your-api-key-here

# .env.local (for local development)
FIRECRAWL_API_KEY=fc-your-api-key-here
```
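To consume the key in Python without hardcoding it, the common route is the third-party `python-dotenv` package (`load_dotenv()`), but a minimal stdlib-only loader is easy to sketch; this hand-rolled parser handles only simple `KEY=VALUE` lines, which is all the file above contains:

```python
import os

def load_env_file(path=".env"):
    """Populate os.environ from simple KEY=VALUE lines, skipping blanks and
    comments. Existing environment variables win over file values."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

After `load_env_file()`, the SDK examples below can read the key via `os.environ.get("FIRECRAWL_API_KEY")` as written.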

---

Python SDK Usage


Installation


```bash
pip install firecrawl-py
```

**Latest Version**: `firecrawl-py v4.5.0+`

Basic Scrape


```python
import os
from firecrawl import FirecrawlApp

# Initialize client
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Scrape a single page
result = app.scrape_url(
    url="https://example.com/article",
    params={
        "formats": ["markdown", "html"],
        "onlyMainContent": True
    }
)

# Access markdown content
markdown = result.get("markdown")
print(markdown)
```

Crawl Multiple Pages


```python
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Start crawl
crawl_result = app.crawl_url(
    url="https://docs.example.com",
    params={
        "limit": 100,
        "scrapeOptions": {
            "formats": ["markdown"]
        }
    },
    poll_interval=5  # Check status every 5 seconds
)

# Process results
for page in crawl_result.get("data", []):
    url = page.get("url")
    markdown = page.get("markdown")
    print(f"Scraped: {url}")
```

Extract Structured Data


```python
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Define schema
schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "product_price": {"type": "number"},
        "availability": {"type": "string"}
    },
    "required": ["company_name", "product_price"]
}

# Extract data
result = app.extract(
    urls=["https://example.com/product"],
    params={
        "schema": schema,
        "systemPrompt": "Extract product information from the page"
    }
)
print(result)
```

---

TypeScript/Node.js SDK Usage


Installation


```bash
npm install @mendable/firecrawl-js
# or
pnpm add @mendable/firecrawl-js
# or use the unscoped package:
npm install firecrawl
```

**Latest Version**: `@mendable/firecrawl-js v4.4.1+` (or `firecrawl v4.4.1+`)

Basic Scrape


```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

// Initialize client
const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

// Scrape a single page
const result = await app.scrapeUrl('https://example.com/article', {
  formats: ['markdown', 'html'],
  onlyMainContent: true
});

// Access markdown content
const markdown = result.markdown;
console.log(markdown);
```

Crawl Multiple Pages


```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

// Start crawl
const crawlResult = await app.crawlUrl('https://docs.example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown']
  }
});

// Process results
for (const page of crawlResult.data) {
  console.log(`Scraped: ${page.url}`);
  console.log(page.markdown);
}
```

Extract Structured Data with Zod


```typescript
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';

const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

// Define schema with Zod
const schema = z.object({
  company_name: z.string(),
  product_price: z.number(),
  availability: z.string()
});

// Extract data
const result = await app.extract({
  urls: ['https://example.com/product'],
  schema: schema,
  systemPrompt: 'Extract product information from the page'
});

console.log(result);
```

Common Use Cases


1. Documentation Scraping


**Scenario**: Convert an entire documentation site to markdown for a RAG/chatbot

```python
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

docs = app.crawl_url(
    url="https://docs.myapi.com",
    params={
        "limit": 500,
        "scrapeOptions": {
            "formats": ["markdown"],
            "onlyMainContent": True
        },
        "allowedDomains": ["docs.myapi.com"]
    }
)
```

```python
# Save to files
for page in docs.get("data", []):
    filename = page["url"].replace("https://", "").replace("/", "_") + ".md"
    with open(f"docs/{filename}", "w") as f:
        f.write(page["markdown"])
```

2. Product Data Extraction


**Scenario**: Extract structured product data for e-commerce

```typescript
const schema = z.object({
  title: z.string(),
  price: z.number(),
  description: z.string(),
  images: z.array(z.string()),
  in_stock: z.boolean()
});

const products = await app.extract({
  urls: productUrls,
  schema: schema,
  systemPrompt: 'Extract all product details including price and availability'
});
```

3. News Article Scraping


**Scenario**: Extract clean article content without ads/navigation

```python
article = app.scrape_url(
    url="https://news.com/article",
    params={
        "formats": ["markdown"],
        "onlyMainContent": True,
        "removeBase64Images": True
    }
)

# Get clean markdown
content = article.get("markdown")
```

---

Error Handling


Python


```python
import os
from firecrawl import FirecrawlApp
from firecrawl.exceptions import FirecrawlException

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

try:
    result = app.scrape_url("https://example.com")
except FirecrawlException as e:
    print(f"Firecrawl error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

TypeScript


```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

try {
  const result = await app.scrapeUrl('https://example.com');
} catch (error) {
  if (error.response) {
    // API error
    console.error('API Error:', error.response.data);
  } else {
    // Network or other error
    console.error('Error:', error.message);
  }
}
```

Rate Limits & Best Practices


Rate Limits


  • Free tier: 500 credits/month
  • Paid tiers: Higher limits based on plan
  • Credits consumed vary by endpoint and options

Best Practices


  1. Use `onlyMainContent: true` to reduce credits and get cleaner data
  2. Set reasonable limits on crawls to avoid excessive costs
  3. Handle retries with exponential backoff for transient errors
  4. Cache results locally to avoid re-scraping the same content
  5. Use the `map` endpoint first to plan your crawling strategy
  6. Batch extract calls when processing multiple URLs
  7. Monitor credit usage in the dashboard
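The retry practice above can be sketched generically so any scrape callable can be wrapped; the delay values here are illustrative defaults, not Firecrawl recommendations:

```python
import random
import time

def scrape_with_retry(scrape_fn, url, max_retries=4, base_delay=1.0):
    """Retry scrape_fn(url) on errors, with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return scrape_fn(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Delays of base, 2x, 4x, ... plus up to base/2 jitter
            # to avoid synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))
```

Usage would look like `scrape_with_retry(lambda u: app.scrape_url(u), "https://example.com")`. In production, catch only errors the SDK marks as transient (rate limits, timeouts) rather than bare `Exception`.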

Cloudflare Workers Integration


⚠️ Important: SDK Compatibility


The Firecrawl SDK cannot run in Cloudflare Workers due to Node.js dependencies (specifically `axios`, which uses the Node.js `http` module). Workers require Web Standard APIs.

✅ **Use the direct REST API with `fetch` instead** (see example below).

**Alternative**: Self-host with workers-firecrawl, a Workers-native implementation (requires the Workers Paid Plan and only implements the `/search` endpoint).

Workers Example: Direct REST API


This example uses the `fetch` API to call Firecrawl directly and works perfectly in Cloudflare Workers:

```typescript
interface Env {
  FIRECRAWL_API_KEY: string;
  SCRAPED_CACHE?: KVNamespace; // Optional: for caching results
}

interface FirecrawlScrapeResponse {
  success: boolean;
  data: {
    markdown?: string;
    html?: string;
    metadata: {
      title?: string;
      description?: string;
      language?: string;
      sourceURL: string;
    };
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return Response.json({ error: 'Method not allowed' }, { status: 405 });
    }

    try {
      const { url } = await request.json<{ url: string }>();

      if (!url) {
        return Response.json({ error: 'URL is required' }, { status: 400 });
      }

      // Check cache (optional)
      if (env.SCRAPED_CACHE) {
        const cached = await env.SCRAPED_CACHE.get(url, 'json');
        if (cached) {
          return Response.json({ cached: true, data: cached });
        }
      }

      // Call Firecrawl API directly using fetch
      const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url: url,
          formats: ['markdown'],
          onlyMainContent: true,
          removeBase64Images: true
        })
      });

      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`Firecrawl API error (${response.status}): ${errorText}`);
      }

      const result = await response.json<FirecrawlScrapeResponse>();

      // Cache for 1 hour (optional)
      if (env.SCRAPED_CACHE && result.success) {
        await env.SCRAPED_CACHE.put(
          url,
          JSON.stringify(result.data),
          { expirationTtl: 3600 }
        );
      }

      return Response.json({
        cached: false,
        data: result.data
      });

    } catch (error) {
      console.error('Scraping error:', error);
      return Response.json(
        { error: error instanceof Error ? error.message : 'Unknown error' },
        { status: 500 }
      );
    }
  }
};
```

**Environment Setup**: Add `FIRECRAWL_API_KEY` via Wrangler secrets:

```bash
npx wrangler secret put FIRECRAWL_API_KEY
```

**Optional KV Binding** (for caching; add to `wrangler.jsonc`):

```jsonc
{
  "kv_namespaces": [
    {
      "binding": "SCRAPED_CACHE",
      "id": "your-kv-namespace-id"
    }
  ]
}
```

See `templates/firecrawl-worker-fetch.ts` for a complete production-ready example.


When to Use This Skill


**Use Firecrawl when:**
  • Scraping modern websites with JavaScript
  • You need clean markdown output for LLMs
  • Building RAG systems from web content
  • Extracting structured data at scale
  • Dealing with bot protection
  • You need reliable, production-ready scraping

**Don't use Firecrawl when:**
  • Scraping simple static HTML (use cheerio/beautifulsoup)
  • You have an existing Puppeteer/Playwright setup working well
  • Working with APIs (use direct API calls instead)
  • Budget is constrained (free tier has limits)


Common Issues & Solutions


Issue: "Invalid API Key"


**Cause**: API key not set or incorrect
**Fix**:

```bash
# Check env variable is set
echo $FIRECRAWL_API_KEY

# Verify key format (should start with fc-)
```
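The same format check can run in code at startup so a bad key fails fast with a clear message; the `fc-` prefix is documented above, while the helper itself is illustrative:

```python
import os

def check_firecrawl_key(env_var="FIRECRAWL_API_KEY"):
    """Fail fast instead of hitting a late 'Invalid API Key' error."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith("fc-"):
        raise RuntimeError(f"{env_var} does not look like a Firecrawl key (expected fc- prefix)")
    return key
```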

Issue: "Rate limit exceeded"


**Cause**: Exceeded monthly credits
**Fix**:
  • Check usage in the dashboard
  • Upgrade your plan or wait for the reset
  • Use `onlyMainContent: true` to reduce credits

Issue: "Timeout error"


**Cause**: Page takes too long to load
**Fix**:

```python
result = app.scrape_url(url, params={"waitFor": 10000})  # Wait 10s
```

Issue: "Content is empty"


**Cause**: Content loaded via JavaScript after the initial render
**Fix**:

```python
result = app.scrape_url(url, params={
    "waitFor": 5000,
    "actions": [{"type": "wait", "milliseconds": 3000}]
})
```

Advanced Features


Browser Actions


Perform interactions before scraping:

```python
result = app.scrape_url(
    url="https://example.com",
    params={
        "actions": [
            {"type": "click", "selector": "button.load-more"},
            {"type": "wait", "milliseconds": 2000},
            {"type": "scroll", "direction": "down"}
        ]
    }
)
```

Custom Headers


```python
result = app.scrape_url(
    url="https://example.com",
    params={
        "headers": {
            "User-Agent": "Custom Bot 1.0",
            "Accept-Language": "en-US"
        }
    }
)
```

Webhooks for Long Crawls


Instead of polling, receive results via webhook:

```python
crawl = app.crawl_url(
    url="https://docs.example.com",
    params={
        "limit": 1000,
        "webhook": "https://your-domain.com/webhook"
    }
)
```
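On the receiving side you need an HTTP endpoint that accepts POSTed JSON events. The sketch below uses only the standard library and separates parsing from the server so it can be tested; the event field names (`type`, `id`, `data`) are assumptions for illustration, so check the official webhook docs for the real payload shape:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_webhook_event(body):
    """Extract the fields we care about from a raw webhook body (bytes)."""
    event = json.loads(body or b"{}")
    return {
        "type": event.get("type"),      # hypothetical: e.g. a page-scraped event
        "id": event.get("id"),          # hypothetical: the crawl job id
        "data": event.get("data", []),  # hypothetical: scraped page payloads
    }

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = parse_webhook_event(self.rfile.read(length))
        print("crawl event:", event["type"], event["id"])
        self.send_response(200)  # acknowledge quickly; do heavy work elsewhere
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), WebhookHandler).serve_forever()
```

Responding 200 immediately and deferring processing matters because webhook senders typically retry on slow or failed deliveries.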

Package Versions


| Package | Version | Last Checked |
| --- | --- | --- |
| firecrawl-py | 4.5.0+ | 2025-10-20 |
| @mendable/firecrawl-js (or firecrawl) | 4.4.1+ | 2025-10-24 |
| API Version | v2 | Current |

**Note**: The Node.js SDK requires Node.js >=22.0.0 and cannot run in Cloudflare Workers. Use direct REST API calls in Workers (see the Cloudflare Workers Integration section).

Official Documentation

https://docs.firecrawl.dev


Next Steps After Using This Skill


  1. Store scraped data: Use Cloudflare D1, R2, or KV to persist results
  2. Build RAG system: Combine with Vectorize for semantic search
  3. Add scheduling: Use Cloudflare Queues for recurring scrapes
  4. Process content: Use Workers AI to analyze scraped data

**Token Savings**: ~60% vs manual integration
**Error Prevention**: API authentication, rate limiting, format handling
**Production Ready**: ✅