firecrawl-parse

firecrawl parse


Turn a local document into clean markdown on disk. Supports PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, HTML/HTM/XHTML.

When to use


  • You have a file on disk (not a URL) and want its text as markdown
  • User drops a PDF/DOCX and asks what it says, or to summarize it
  • Use `scrape` instead when the source is a URL
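
The rule above can be sketched as a small shell helper. This is only an illustration: `pick_firecrawl_command` is a hypothetical name, and it just prints which subcommand applies rather than running anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which firecrawl subcommand fits the given source.
pick_firecrawl_command() {
  local src="$1"
  case "$src" in
    http://*|https://*)
      echo "scrape"            # remote source: use scrape
      ;;
    *)
      if [ -f "$src" ]; then
        echo "parse"           # local file on disk: use parse
      else
        echo "unknown: $src is neither a URL nor an existing file" >&2
        return 1
      fi
      ;;
  esac
}
```

Calling `pick_firecrawl_command ./paper.pdf` prints `parse` when the file exists, while an `https://` argument prints `scrape`.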

Quick start


Always save to `.firecrawl/` with `-o`. Parsed docs can be hundreds of KB and will blow up context if streamed to stdout. Add `.firecrawl/` to `.gitignore`.
bash
mkdir -p .firecrawl
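
The setup step can be made repeatable with a short helper, sketched here under the assumption that it runs from the project root. The function name `ensure_firecrawl_dir` is hypothetical; it creates the output directory and adds the ignore entry only if it is not already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create .firecrawl/ and ignore it in git, idempotently.
ensure_firecrawl_dir() {
  mkdir -p .firecrawl
  # -x matches whole lines, -F treats the pattern as a literal string.
  if grep -qxF '.firecrawl/' .gitignore 2>/dev/null; then
    return 0
  fi
  # If .gitignore exists and lacks a trailing newline, add one first.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    echo >> .gitignore
  fi
  echo '.firecrawl/' >> .gitignore
}
```

Running `ensure_firecrawl_dir` twice leaves a single `.firecrawl/` line in `.gitignore`.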

File → markdown


firecrawl parse ./paper.pdf -o .firecrawl/paper.md

AI summary


firecrawl parse ./paper.pdf -S -o .firecrawl/paper-summary.md

Ask a question about the doc


firecrawl parse ./paper.pdf -Q "What are the main conclusions?" -o .firecrawl/paper-qa.md

Then inspect the output with `head`, `grep`, `rg`, etc., or read the file incrementally. Don't load the whole thing at once.
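
One way to look at a parsed file without loading all of it is sketched below. `peek_doc` is a hypothetical helper name, and the heading search assumes the parsed markdown uses `#`-style headings, which the source does not guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show size, headings, and the first lines of a parsed doc.
peek_doc() {
  local file="$1" lines="${2:-40}"
  # tr strips the padding some wc implementations add around the count.
  echo "bytes: $(wc -c < "$file" | tr -d ' ')"
  echo "--- headings ---"
  grep -n '^#' "$file" | head -n 20
  echo "--- first $lines lines ---"
  head -n "$lines" "$file"
}
```

For example, `peek_doc .firecrawl/paper.md 20` prints the byte count, up to 20 numbered heading lines, and the first 20 lines of the file.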

Options


| Option | Description |
|---|---|
| `-S, --summary` | AI-generated summary |
| `-Q, --query <prompt>` | Ask a question about the parsed content |
| `-o, --output <path>` | Output file path (always use this) |
| `-f, --format <fmt>` | `markdown` (default), `html`, `summary` |
| `--timeout <ms>` | Timeout for the parse job |
| `--timing` | Show request duration |
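
As a sketch of how several of these options might be combined, the wrapper below uses only flags listed in the table. The name `parse_to_html`, the file paths, and the 120000 ms default are illustrative assumptions, and the source does not confirm how the flags interact when used together.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: parse a document to HTML with a timeout and timing output.
parse_to_html() {
  # 120000 ms is an arbitrary example default, not a documented value.
  local src="$1" out="$2" timeout_ms="${3:-120000}"
  mkdir -p "$(dirname "$out")"
  firecrawl parse "$src" -f html --timeout "$timeout_ms" --timing -o "$out"
}
```

A call such as `parse_to_html ./report.docx .firecrawl/report.html` would then write HTML output instead of the default markdown.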

Tips


  • Quote paths with spaces: `firecrawl parse "./My Doc.pdf" -o .firecrawl/mydoc.md`.
  • Max upload size: 50 MB per file.
  • Credits: about 1 per PDF page; an HTML file costs a flat 1 credit.
  • Check `.firecrawl/` before re-parsing the same file.
  • To check your credit balance (recommended for batch processing and similar workflows), use the `firecrawl credit-usage` command.
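
The size limit and the "check before re-parsing" tip can be combined in one guard, sketched below. `parse_cached` and `MAX_BYTES` are hypothetical names, "50 MB" is read here as 50 MiB (an assumption), and the output naming scheme is just one possible convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a file only if no up-to-date output exists.
MAX_BYTES=$((50 * 1024 * 1024))   # assumption: "50 MB" read as 50 MiB

parse_cached() {
  local src="$1" name size out
  name="$(basename "$src")"
  out=".firecrawl/${name%.*}.md"
  size="$(wc -c < "$src" | tr -d ' ')"

  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "skip: $src exceeds the upload limit" >&2
    return 1
  fi
  # Reuse the existing output when it is newer than the source file.
  if [ -f "$out" ] && [ "$out" -nt "$src" ]; then
    echo "cached: $out"
    return 0
  fi
  mkdir -p .firecrawl
  firecrawl parse "$src" -o "$out" && echo "parsed: $out"
}
```

For a batch, running `firecrawl credit-usage` once and then `for f in ./docs/*.pdf; do parse_cached "$f"; done` (with `./docs` as a placeholder directory) would skip files whose output already exists.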

See also


  • firecrawl-scrape — same idea for URLs