# crawl4ai

High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.

## Commands

### crawl_url (alias: webCrawl)
Crawl a web page with a LangGraph workflow and LLM-based intelligent chunking.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | - | Target URL to crawl (required) |
| `action` | `str` | `"smart"` | Action mode: `"smart"`, `"skeleton"`, `"crawl"` |
| `fit_markdown` | `bool` | `true` | Clean and simplify markdown output |
| `max_depth` | `int` | `0` | Maximum crawling depth (0 = single page) |
| `return_skeleton` | `bool` | `false` | Also return the document skeleton (TOC) |
| `chunk_indices` | `list[int]` | - | List of section indices to extract |

**Action Modes:**

| Mode | Description | Use Case |
|---|---|---|
| `smart` (default) | LLM generates a chunk plan, then extracts relevant sections | Large docs where you need specific info |
| `skeleton` | Extract a lightweight TOC without full content | Quick overview; decide what to read |
| `crawl` | Return full markdown content | Small pages where complete content is needed |

**Examples:**

```python
# Smart crawl with LLM chunking (default)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com"})

# Skeleton only - get TOC quickly
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "skeleton"})

# Full content crawl
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "crawl"})

# Extract specific sections
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "chunk_indices": [0, 1, 2]})

# Deep crawl (follow links up to depth N)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "max_depth": 2})

# Get skeleton with full content
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "return_skeleton": true})
```
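The two-step pattern the examples suggest - fetch the skeleton first, then extract only the sections you need - can be sketched in plain Python. Here `call_omni` is a hypothetical stand-in (stubbed below) for whatever client dispatches `@omni` tool calls in your environment; the stubbed responses are for illustration only.

```python
# Hypothetical dispatcher; the stubbed responses only illustrate the shapes
# a skeleton call and a chunk-extraction call might return.
def call_omni(tool: str, params: dict) -> dict:
    if params.get("action") == "skeleton":
        return {"skeleton": [
            {"index": 0, "title": "Introduction"},
            {"index": 1, "title": "Installation"},
            {"index": 2, "title": "API Reference"},
        ]}
    return {"chunks": [f"...content of section {i}..."
                       for i in params["chunk_indices"]]}

# Step 1: fetch the lightweight TOC.
toc = call_omni("crawl4ai.CrawlUrl",
                {"url": "https://example.com", "action": "skeleton"})["skeleton"]

# Step 2: pick the sections you care about and extract only those.
wanted = [s["index"] for s in toc if "API" in s["title"]]
result = call_omni("crawl4ai.CrawlUrl",
                   {"url": "https://example.com", "chunk_indices": wanted})
print(result["chunks"])
```

This keeps the full page content out of the model's context entirely: only the TOC and the chosen sections are ever returned.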

## Core Concepts

| Topic | Description | Reference |
|---|---|---|
| Skeleton Planning | LLM sees the TOC (~500 tokens), not full content (~10k+ tokens) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with a BFS strategy | deep-crawl.md |
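The token savings behind skeleton planning come from reducing a document to its headings. A minimal sketch (not crawl4ai's actual implementation) of building such a skeleton from markdown:

```python
import re

def extract_skeleton(markdown: str) -> list[dict]:
    """Build a lightweight TOC (skeleton) from markdown headings.

    Each entry records a section index, heading level, and title -
    enough for an LLM to plan which chunks to extract, at a tiny
    fraction of the full document's token count.
    """
    skeleton = []
    for i, m in enumerate(re.finditer(r"^(#{1,6})\s+(.+)$", markdown, re.M)):
        skeleton.append({
            "index": i,
            "level": len(m.group(1)),   # number of '#' characters
            "title": m.group(2).strip(),
        })
    return skeleton

doc = "# Intro\nSome text.\n## Install\nSteps here.\n## Usage\nExamples here."
toc = extract_skeleton(doc)
print(toc)
```

The indices in the skeleton are what a `chunk_indices` request would later refer back to.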

## Best Practices

- Use `skeleton` mode first on large documents to understand their structure
- Use `chunk_indices` to extract specific sections instead of full content
- Set `max_depth` > 0 with care; the depth limit caps how many pages are crawled and prevents runaway crawls
- Keep `fit_markdown=true` for cleaner output; set it to `false` for raw content
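To see why a depth cap matters, here is a generic sketch of breadth-first crawling (the strategy deep-crawl.md names), assuming a `fetch_links` callback that returns the outgoing links of a page; it is not crawl4ai's internal code.

```python
from collections import deque

def bfs_crawl(start_url, fetch_links, max_depth):
    """Visit pages level by level, never following links past max_depth hops."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth budget exhausted: do not expand this page's links
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

# Toy link graph standing in for real pages.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(bfs_crawl("a", lambda u: graph.get(u, []), max_depth=1))
```

Without the depth check, every reachable link would be enqueued, which is exactly the runaway behavior the best practice warns about.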

## Advanced

- Batch multiple URLs with separate calls
- Combine with knowledge tools for RAG pipelines
- Use skeleton + LLM to auto-generate chunk plans for custom extraction
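Batching via separate calls can be wrapped in a small helper. `crawl_many` and the `call_omni` dispatcher it takes are hypothetical names for this sketch, not part of crawl4ai's API:

```python
def crawl_many(urls, call_omni, action="smart"):
    """Crawl several URLs, issuing one CrawlUrl call per URL."""
    return {url: call_omni("crawl4ai.CrawlUrl", {"url": url, "action": action})
            for url in urls}

# Fake dispatcher for illustration; substitute your environment's real one.
fake = lambda tool, params: {"url": params["url"], "markdown": "..."}
results = crawl_many(["https://a.example", "https://b.example"], fake)
```

Keeping one call per URL preserves per-page error handling and lets each page use its own action mode if needed.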