firecrawl-scrape

firecrawl scrape


Scrape one or more URLs. Returns clean, LLM-optimized markdown. Multiple URLs are scraped concurrently.

When to use

  • You have a specific URL and want its content
  • The page is static or JS-rendered (SPA)
  • Step 2 in the workflow escalation pattern: search → scrape → map → crawl → browser

Quick start


Basic markdown extraction

firecrawl scrape "<url>" -o .firecrawl/page.md

Main content only, no nav/footer

firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

Wait for JS to render, then scrape

firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

Multiple URLs (each saved to .firecrawl/)

Get markdown and links together

firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json
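
For the multiple-URL case, a minimal sketch, assuming the URLs are passed to firecrawl scrape as separate quoted arguments. The wrapper name scrape_many and the example.com URLs are hypothetical and not part of the CLI.

```shell
# Hypothetical wrapper (not part of firecrawl): scrape several URLs at once.
# Assumption: "firecrawl scrape" accepts multiple URLs as separate arguments.
scrape_many() {
  # "$@" keeps each URL as its own quoted argument, so ? and & stay literal.
  firecrawl scrape "$@"
}

# Usage (placeholder URLs):
#   scrape_many "https://example.com" "https://example.com/docs?page=2"
```

Passing "$@" rather than $* avoids word splitting and globbing on URLs that contain ? or &.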

Options

Option                  Description
-f, --format <formats>  Output formats: markdown, html, rawHtml, links, screenshot, json
-H                      Include HTTP headers in output
--only-main-content     Strip nav, footer, sidebar; main content only
--wait-for <ms>         Wait for JS rendering before scraping
--include-tags <tags>   Only include these HTML tags
--exclude-tags <tags>   Exclude these HTML tags
-o, --output <path>     Output file path
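
The options above can be combined in one call. A minimal sketch, assuming --only-main-content, --wait-for, and -o may be used together; the wrapper name scrape_spa and its 3000 ms default are hypothetical choices that mirror the quick-start example.

```shell
# Hypothetical wrapper (not part of firecrawl): scrape a JS-rendered page,
# keep only the main content, and write the result to a file.
# Assumption: the flags listed in the options table can be combined.
scrape_spa() {
  url=$1
  out=$2
  wait_ms=${3:-3000}   # milliseconds to wait for JS rendering
  firecrawl scrape "$url" --only-main-content --wait-for "$wait_ms" -o "$out"
}

# Usage (placeholder URL):
#   scrape_spa "https://example.com/app?tab=1" .firecrawl/example.com-app.md 5000
```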

Tips

  • Try scrape before browser. Scrape handles static pages and JS-rendered SPAs. Only escalate to browser when you need interaction (clicks, form fills, pagination).
  • Multiple URLs are scraped concurrently; check firecrawl --status for your concurrency limit.
  • A single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
  • Always quote URLs: the shell interprets ? and & as special characters.
  • Naming convention: .firecrawl/{site}-{path}.md
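
The naming convention and the single-versus-multiple format rule can be sketched as one shell helper. This is an illustration, not part of the firecrawl CLI: out_path is a hypothetical name, flattening "/" in the path to "-" is an assumption about how {path} is meant to be built, and the .md extension assumes the single format is markdown.

```shell
# Hypothetical helper (not part of firecrawl): build an output path of the
# form .firecrawl/{site}-{path}.<ext> from a URL and an optional --format value.
# The body runs in a subshell, so its variables do not leak to the caller.
out_path() (
  url=${1#*://}          # drop the scheme, e.g. https://
  url=${url%%[?#]*}      # drop the query string and fragment
  site=${url%%/*}        # host part
  path=${url#"$site"}    # everything after the host
  path=${path#/}         # trim one leading slash
  path=${path%/}         # trim one trailing slash
  path=$(printf '%s' "$path" | tr '/' '-')   # assumption: "/" becomes "-"
  # One format -> raw content (.md assumes markdown); several -> JSON.
  case ${2:-markdown} in
    *,*) ext=json ;;
    *)   ext=md ;;
  esac
  if [ -n "$path" ]; then
    printf '.firecrawl/%s-%s.%s\n' "$site" "$path" "$ext"
  else
    printf '.firecrawl/%s.%s\n' "$site" "$ext"
  fi
)

# Usage (placeholder URL), with the URL quoted so ? and & stay literal:
#   url="https://example.com/docs/intro?x=1&y=2"
#   firecrawl scrape "$url" -o "$(out_path "$url")"
```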

See also

  • firecrawl-search — find pages when you don't have a URL
  • firecrawl-browser — when scrape can't get the content (interaction needed)
  • firecrawl-download — bulk download an entire site to local files