
# Firecrawl Web Scraping & Data Extraction

## Installation

```bash
pip install firecrawl-py
```

## Environment Setup

Set your Firecrawl API key:

```bash
export FIRECRAWL_API_KEY="your-api-key-here"
```

## Scripts

Note: Set `SKILL_ROOT` to this skill's base directory. Reference bundled scripts as `python3 "$SKILL_ROOT/scripts/<script>.py" ...` (not relative paths from the current working directory).
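The same convention applies when driving these scripts from code; a minimal sketch of building such an invocation in Python (the `build_command` helper is illustrative, not part of the bundled scripts):

```python
import os

def build_command(skill_root: str, script: str, *args: str) -> list[str]:
    """Build an absolute-path invocation for a bundled script
    (e.g. for subprocess.run), instead of a working-directory-relative path."""
    return ["python3", os.path.join(skill_root, "scripts", f"{script}.py"), *args]

# Usage: build_command(os.environ["SKILL_ROOT"], "scrape", "https://example.com")
```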

### scrape.py - Single Page Scraping

The most powerful and reliable scraper. Use when you know exactly which page contains the information.

```bash
# Basic scrape (returns markdown)
python3 "$SKILL_ROOT/scripts/scrape.py" "https://example.com"

# Get HTML format
python3 "$SKILL_ROOT/scripts/scrape.py" "https://example.com" --format html

# Extract only main content (removes headers, footers, etc.)
python3 "$SKILL_ROOT/scripts/scrape.py" "https://example.com" --only-main

# Combine options
python3 "$SKILL_ROOT/scripts/scrape.py" "https://docs.example.com/api" --format markdown --only-main
```

### search.py - Web Search

Search the web when you don't know which website has the information.

```bash
# Basic search
python3 "$SKILL_ROOT/scripts/search.py" "latest AI research papers 2024"

# Limit results
python3 "$SKILL_ROOT/scripts/search.py" "Python web scraping tutorials" --limit 5

# Search with scraping (get full content)
python3 "$SKILL_ROOT/scripts/search.py" "firecrawl documentation" --limit 3
```

### map.py - URL Discovery

Discover all URLs on a website. Use before deciding what to scrape.

```bash
# Map a website
python3 "$SKILL_ROOT/scripts/map.py" "https://docs.example.com"

# Limit number of URLs
python3 "$SKILL_ROOT/scripts/map.py" "https://example.com" --limit 100

# Search within mapped URLs
python3 "$SKILL_ROOT/scripts/map.py" "https://docs.example.com" --search "authentication"
```

### crawl.py - Multi-Page Crawling

Extract content from multiple related pages. Warning: can be slow and return large results.

```bash
# Basic crawl
python3 "$SKILL_ROOT/scripts/crawl.py" "https://docs.example.com"

# Limit pages
python3 "$SKILL_ROOT/scripts/crawl.py" "https://docs.example.com" --limit 20

# Control crawl depth
python3 "$SKILL_ROOT/scripts/crawl.py" "https://docs.example.com" --limit 10 --depth 2
```

### extract.py - Structured Data Extraction

Extract specific structured data using LLM capabilities.

```bash
# Extract with prompt
python3 "$SKILL_ROOT/scripts/extract.py" "https://example.com/pricing" \
  --prompt "Extract all pricing tiers with their features and prices"

# Extract with JSON schema
python3 "$SKILL_ROOT/scripts/extract.py" "https://example.com/team" \
  --prompt "Extract team member information" \
  --schema '{"type":"object","properties":{"members":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"role":{"type":"string"},"bio":{"type":"string"}}}}}}'

# Extract from multiple URLs
python3 "$SKILL_ROOT/scripts/extract.py" "https://example.com/page1" "https://example.com/page2" \
  --prompt "Extract product information"
```
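Writing the single-quoted `--schema` argument by hand is error-prone; one way to keep the quoting straight is to build the schema as a Python dict and serialize it (a sketch, not part of the bundled scripts):

```python
import json

# Describe the expected output as a plain dict, then serialize it for
# the --schema flag; json.dumps guarantees valid JSON quoting.
schema = {
    "type": "object",
    "properties": {
        "members": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"},
                    "bio": {"type": "string"},
                },
            },
        },
    },
}

schema_arg = json.dumps(schema)  # pass this string as the --schema value
```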

### agent.py - Autonomous Data Gathering

Autonomous agent that searches, navigates, and extracts data from anywhere on the web.

```bash
# Simple research task
python3 "$SKILL_ROOT/scripts/agent.py" --prompt "Find the founders of Firecrawl and their backgrounds"

# Complex data gathering
python3 "$SKILL_ROOT/scripts/agent.py" --prompt "Find the top 5 AI startups founded in 2024 and their funding amounts"

# Focus on specific URLs
python3 "$SKILL_ROOT/scripts/agent.py" \
  --prompt "Compare the features and pricing" \
  --urls "https://example1.com,https://example2.com"

# With output schema
python3 "$SKILL_ROOT/scripts/agent.py" \
  --prompt "Find recent tech layoffs" \
  --schema '{"type":"object","properties":{"layoffs":{"type":"array","items":{"type":"object","properties":{"company":{"type":"string"},"count":{"type":"number"},"date":{"type":"string"}}}}}}'
```

## Output Format

All scripts output JSON to stdout. Errors are written to stderr.

### Success Response

```json
{
  "success": true,
  "data": { ... }
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error message"
}
```

## Tips

1. Performance: Use `scrape` for single pages - it's 500% faster with caching
2. Discovery: Use `map` first to find URLs, then `scrape` specific pages
3. Large sites: Prefer `map` + `scrape` over `crawl` for better control
4. Structured data: Use `extract` with a JSON schema for consistent output
5. Research: Use `agent` when you don't know where to find the data
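Tips 2-3 (map first, then scrape only the pages that matter) can be sketched as a small driver. Everything here is illustrative: `run_script` is a hypothetical wrapper around the bundled CLIs, the `links` key is an assumption about map.py's output shape, and the URL filter is just an example:

```python
import json
import os
import subprocess

def run_script(script: str, *args: str) -> dict:
    """Run a bundled script via SKILL_ROOT and parse its JSON stdout."""
    root = os.environ["SKILL_ROOT"]
    out = subprocess.run(
        ["python3", os.path.join(root, "scripts", f"{script}.py"), *args],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

def pick_auth_pages(urls: list[str]) -> list[str]:
    """Keep only URLs that look authentication-related (example filter)."""
    return [u for u in urls if "auth" in u.lower()]

# Workflow sketch: map the site, filter, then scrape only selected pages.
# result = run_script("map", "https://docs.example.com", "--limit", "100")
# for url in pick_auth_pages(result["data"].get("links", [])):
#     page = run_script("scrape", url, "--only-main")
```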