web-crawler

Rust Web Crawler (rcrawler)

High-performance web crawler built in pure Rust with production-grade features for fast, reliable site crawling.

When to Use This Skill

Use this skill when the user requests:
  • Web crawling or site mapping
  • Sitemap discovery and analysis
  • Link extraction and validation
  • Site structure visualization
  • robots.txt compliance checking
  • Performance-critical web scraping
  • Generating interactive web reports with graph visualization

Core Capabilities

🚀 Performance

  • 60+ pages/sec throughput with async Tokio runtime
  • <50ms startup time - Near-instant initialization
  • ~50MB memory usage - Efficient resource consumption
  • 5.4 MB binary - Single executable, no dependencies

🤖 Intelligence

  • Sitemap discovery: Automatically finds and parses sitemap.xml (3 standard locations)
  • robots.txt compliance: Respects crawling rules with per-domain caching
  • Smart filtering: Auto-excludes images, CSS, JS, PDFs by default
  • Domain auto-detection: Extracts and restricts to base domain automatically
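
Sitemap discovery can be sanity-checked by hand. The three standard locations are not enumerated in this document, so the candidate list below is an assumption based on common convention (sitemap.xml, sitemap_index.xml, and the Sitemap: directive inside robots.txt):

```shell
# Hypothetical helper: list conventional sitemap locations for a base URL.
# The exact three locations rcrawler probes are an assumption here.
sitemap_candidates() {
  base="${1%/}"                       # drop a trailing slash, if any
  printf '%s\n' \
    "$base/sitemap.xml" \
    "$base/sitemap_index.xml" \
    "$base/robots.txt"                # robots.txt may carry a Sitemap: directive
}

sitemap_candidates https://example.com
```

Piping each candidate through curl -sI shows which ones actually exist before starting a full crawl.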

🔒 Safety

  • Rate limiting: Token bucket algorithm (default 2 req/s)
  • Configurable timeout: 30 second default
  • Memory safe: Rust's ownership system prevents crashes
  • Graceful shutdown: 2-second grace period for pending requests
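
The token bucket itself lives inside the binary (via the governor crate listed under Key Dependencies). As a rough feel for what a 2 req/s budget means, here is a much simpler fixed-interval stand-in, not the actual algorithm:

```shell
# Fixed-interval pacing: at 2 req/s, successive requests are ~0.5 s apart.
# A real token bucket also allows short bursts; this sketch does not.
rate=2
interval=$(awk -v r="$rate" 'BEGIN { print 1 / r }')
for path in /a /b /c; do
  echo "GET https://example.com$path"   # placeholder for the real request
  sleep "$interval"
done
```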

📊 Output

  • Multiple formats: JSON, Markdown, HTML, CSV, Links, Text
  • LLM-ready Markdown: Clean content with YAML frontmatter
  • Interactive HTML report: Dashboard with graph visualization
  • Stealth mode: User-agent rotation and realistic headers
  • Content filtering: Remove nav, ads, scripts for clean data
  • Real-time progress: Updates every 5 seconds during crawl

📝 Monitoring

  • Structured logging: tracing with timestamps and log levels
  • Progress tracking:
    [Progress] Pages: X/Y | Active jobs: Z | Errors: N
  • Detailed statistics: Pages found, crawled, external links, errors, duration

Installation & Setup

Binary Location

bash
~/.claude/skills/web-crawler/bin/rcrawler

Build from Source

bash
# Clone the repository
git clone https://github.com/leobrival/rcrawler
cd rcrawler

# Build release binary
cargo build --release

# Copy to skill directory
cp target/release/rcrawler ~/.claude/skills/web-crawler/bin/

Build time: ~2 minutes
Binary size: 5.4 MB

Command Line Interface

Basic Syntax

bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> [OPTIONS]

Options

Core Options:
  • -w, --workers <N>
    : Number of concurrent workers (default: 20, range: 1-50)
  • -d, --depth <N>
    : Maximum crawl depth (default: 2)
  • -r, --rate <N>
    : Rate limit in requests/second (default: 2.0)
Configuration:
  • -p, --profile <NAME>
    : Use predefined profile (fast/deep/gentle)
  • --domain <DOMAIN>
    : Restrict to specific domain (auto-detected from URL)
  • -o, --output <PATH>
    : Custom output directory (default: ./output)
Features:
  • -s, --sitemap
    : Enable/disable sitemap discovery (default: true)
  • --stealth
    : Enable stealth mode with user-agent rotation
  • --markdown
    : Convert HTML to LLM-ready Markdown with frontmatter
  • --filter-content
    : Enable content filtering (remove nav, ads, scripts)
  • --debug
    : Enable debug logging with detailed trace information
  • --resume
    : Resume from checkpoint if available
Output:
  • -f, --formats <LIST>
    : Output formats (json,markdown,html,csv,links,text)

Profiles

Fast Profile (Quick Mapping)

bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> -p fast
  • Workers: 50
  • Depth: 3
  • Rate: 10 req/s
  • Use case: Quick site structure overview

Deep Profile (Comprehensive Crawl)

bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> -p deep
  • Workers: 20
  • Depth: 10
  • Rate: 3 req/s
  • Use case: Complete site analysis

Gentle Profile (Server-Friendly)

bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> -p gentle
  • Workers: 5
  • Depth: 5
  • Rate: 1 req/s
  • Use case: Respecting server resources

Usage Examples

Example 1: Basic Crawl

bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com
Output:
console
[2026-01-10T01:17:27Z] INFO Starting crawl of: https://example.com
[2026-01-10T01:17:27Z] INFO Config: 20 workers, depth 2
Fetching sitemap URLs...
[Progress] Pages: 50/120 | Active jobs: 15 | Errors: 0
[Progress] Pages: 100/180 | Active jobs: 8 | Errors: 0

Crawl complete!
Pages crawled: 150
Duration: 8542ms
Results saved to: ./output/results.json
HTML report: ./output/index.html

Example 2: Stealth Mode with Markdown Export

bash
~/.claude/skills/web-crawler/bin/rcrawler https://docs.example.com \
  --stealth --markdown -f markdown -d 3
Use case: Content extraction for LLM/RAG pipelines
Expected: Clean Markdown with frontmatter, anti-detection headers

Example 3: Fast Scan

bash
~/.claude/skills/web-crawler/bin/rcrawler https://blog.example.com -p fast
Use case: Quick blog mapping
Expected: 50 workers, depth 3, ~3-5 seconds for 100 pages

Example 4: Multi-Format Export

bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com \
  -f json,markdown,csv,links -o ./export
Use case: Export data in multiple formats simultaneously
Expected: Generates results.json, results.md, results.csv, results.txt

Example 5: Debug Mode

bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com --debug
Output: Detailed trace logs for troubleshooting

Output Format

Directory Structure

text
./output/
├── results.json       # Structured crawl data
├── results.md         # LLM-ready Markdown (with --markdown)
├── results.html       # Interactive report
├── results.csv        # Spreadsheet format (with -f csv)
├── results.txt        # URL list (with -f links)
└── checkpoint.json    # Auto-saved state (every 30s)

JSON Structure (results.json)

json
{
  "stats": {
    "pages_found": 450,
    "pages_crawled": 450,
    "external_links": 23,
    "excluded_links": 89,
    "errors": 0,
    "start_time": "2026-01-10T01:00:00Z",
    "end_time": "2026-01-10T01:00:07Z",
    "duration": 7512
  },
  "results": [
    {
      "url": "https://example.com",
      "title": "Example Domain",
      "status_code": 200,
      "depth": 0,
      "links": ["https://example.com/page1", "..."],
      "crawled_at": "2026-01-10T01:00:01Z",
      "content_type": "text/html"
    }
  ]
}
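
Given the schema above, headline numbers can be pulled out of results.json with jq. These helpers are a sketch, assuming jq is installed; the field names match the sample document:

```shell
# Summarize a crawl from its results.json (field names as shown above)
crawl_summary() {
  jq -r '"\(.stats.pages_crawled) pages in \(.stats.duration) ms, \(.stats.errors) errors"' "$1"
}

# One line per crawled page: HTTP status, then URL
crawl_statuses() {
  jq -r '.results[] | "\(.status_code)\t\(.url)"' "$1"
}

# Example: crawl_summary ./output/results.json
```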

HTML Report Features

  • Interactive dashboard with key statistics
  • Graph visualization using force-graph library
  • Node sizing based on link count (logarithmic scale)
  • Status color coding: Green (success), red (errors)
  • Hover tooltips: In-degree and out-degree information
  • Click to navigate: Opens page URL in new tab
  • Light/dark mode: Auto-detection via CSS
  • Collapsible sections: Reduces scroll for large crawls
  • Mobile responsive: Works on all devices

Implementation Workflow

When a user requests a crawl, follow these steps:

1. Parse Request

Extract from user message:
  • URL (required): Target website
  • Workers (optional): Number of concurrent workers
  • Depth (optional): Maximum crawl depth
  • Rate (optional): Requests per second
  • Profile (optional): fast/deep/gentle

2. Validate Input

  • Check URL format (add https:// if missing)
  • Validate workers range (1-50)
  • Validate depth (1-10 recommended)
  • Validate rate (0.1-20.0 recommended)
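
The first two checks can be sketched directly in shell (function names here are illustrative, not part of rcrawler):

```shell
# Prepend https:// when the user omitted a scheme
normalize_url() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *)                  printf 'https://%s\n' "$1" ;;
  esac
}

# Enforce the documented worker range (1-50); exit status signals validity
valid_workers() { [ "$1" -ge 1 ] && [ "$1" -le 50 ]; }

normalize_url example.com        # prints https://example.com
```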

3. Build Command

bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> \
  -w <workers> \
  -d <depth> \
  -r <rate> \
  [--debug] \
  [-o <output>]

4. Execute Crawl

Use Bash tool to run the command:
bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com -w 20 -d 2

5. Monitor Progress

Watch for progress updates in output:
  • [Progress] Pages: X/Y | Active jobs: Z | Errors: N
  • Updates appear every 5 seconds
  • Shows real-time crawl status

6. Report Results

When crawl completes, inform user:
  • Number of pages crawled
  • Duration in seconds or minutes
  • Path to results:
    ./output/results.json
  • Path to HTML report:
    ./output/index.html
  • Offer to open HTML report:
    open ./output/index.html

Natural Language Parsing

Example User Requests

Request: "Crawl docs.example.com"
Parse: URL = https://docs.example.com, use defaults
Command: rcrawler https://docs.example.com

Request: "Quick scan of blog.example.com"
Parse: URL = blog.example.com, profile = fast
Command: rcrawler https://blog.example.com -p fast

Request: "Deep crawl of api-docs.example.com with 40 workers"
Parse: URL = api-docs.example.com, workers = 40, depth = 5
Command: rcrawler https://api-docs.example.com -w 40 -d 5

Request: "Crawl example.com carefully, don't overload their server"
Parse: URL = example.com, profile = gentle
Command: rcrawler https://example.com -p gentle

Request: "Map the structure of help.example.com"
Parse: URL = help.example.com, depth = 3 (moderate)
Command: rcrawler https://help.example.com -d 3
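
The keyword-to-profile mapping above is performed by the assistant, not the binary, but it can be sketched as a case match (the keyword list is illustrative only, mirroring the examples above):

```shell
# Illustrative request-keyword → profile mapping (hypothetical helper)
pick_profile() {
  req=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$req" in
    *quick*|*fast*)                echo fast ;;
    *deep*|*comprehensive*)        echo deep ;;
    *careful*|*gentle*|*overload*) echo gentle ;;
    *)                             echo default ;;
  esac
}

pick_profile "Quick scan of blog.example.com"   # prints fast
```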

Error Handling

Binary Not Found

bash
# Check if binary exists
ls ~/.claude/skills/web-crawler/bin/rcrawler

# If missing, build it
cd ~/.claude/skills/web-crawler/scripts && cargo build --release

Crawl Failures

Network errors:
  • Verify URL is accessible:
    curl -I <URL>
  • Check if site is down or blocking crawlers
  • Try with lower rate:
    -r 1
robots.txt blocking:
  • Crawler respects robots.txt by default
  • Check rules:
    curl <URL>/robots.txt
  • Inform user of restrictions
Timeout errors:
  • Increase timeout in code (default 30s)
  • Reduce workers:
    -w 10
  • Lower rate limit:
    -r 1
Too many errors:
  • Enable debug mode:
    --debug
  • Check specific failing URLs
  • May need to exclude certain patterns

Performance Benchmarks

Test: adonisjs.com

  • Pages: 450
  • Duration: 7.5 seconds
  • Throughput: 60 pages/sec
  • Workers: 20
  • Depth: 2
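
The throughput figure is consistent with the page count and duration:

```shell
# 450 pages / 7.5 s = 60 pages/sec
awk 'BEGIN { printf "%.0f\n", 450 / 7.5 }'   # prints 60
```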

Test: rust-lang.org

  • Pages: 16
  • Duration: 3.9 seconds
  • Workers: 10
  • Depth: 1

Test: example.com

  • Pages: 2
  • Duration: 2.7 seconds
  • Workers: 5
  • Depth: 1

Technical Architecture

Core Components

  1. CrawlEngine (src/crawler/engine.rs)
    • Worker pool management
    • Job queue coordination
    • Shutdown signaling
    • Statistics tracking
  2. RobotsChecker (src/crawler/robots.rs)
    • Per-domain caching
    • Rule validation
    • Fallback on errors
  3. RateLimiter (src/crawler/rate_limiter.rs)
    • Token bucket algorithm
    • Configurable rate
    • Shared across workers
  4. UrlFilter (src/utils/filters.rs)
    • Regex-based filtering
    • Include/exclude patterns
    • Default exclusions
  5. HtmlParser (src/parser/html.rs)
    • CSS selector queries
    • Title extraction
    • Link discovery
  6. SitemapParser (src/parser/sitemap.rs)
    • XML parsing
    • Index traversal
    • URL extraction

Key Dependencies

  • tokio: Async runtime (multi-threaded)
  • reqwest: HTTP client (connection pooling)
  • scraper: HTML parsing (CSS selectors)
  • quick-xml: Sitemap parsing
  • governor: Rate limiting (token bucket)
  • tracing: Structured logging
  • dashmap: Concurrent HashMap
  • robotstxt: robots.txt compliance
  • clap: CLI argument parsing
  • serde/serde_json: Serialization

Tips & Best Practices

1. Start with Default Settings

First crawl should use defaults to understand site structure.

2. Use Profiles for Common Scenarios

  • fast: Quick overviews
  • deep: Comprehensive analysis
  • gentle: Respectful crawling

3. Monitor Progress

Watch the [Progress] lines to ensure the crawl is progressing.

4. Check HTML Report

Interactive visualization helps understand site structure better than JSON.

5. Respect Rate Limits

Default 2 req/s is safe for most sites. Increase cautiously.

6. Enable Debug for Issues

The --debug flag provides detailed logs for troubleshooting.

7. Review robots.txt

Check <URL>/robots.txt to understand crawling restrictions.

8. Use Custom Output for Multiple Crawls

Use the -o flag to give each crawl its own output directory and avoid overwriting previous results.

Future Enhancements (V2.0)

  • Checkpoint resume: Full integration of checkpoint system
  • Per-domain rate limiting: Different rates for different domains
  • JavaScript rendering: chromiumoxide for dynamic sites
  • Distributed crawling: Redis-based job queue
  • Advanced analytics: SEO analysis, link quality scoring

Support & Resources

  • GitHub Repository: leobrival/rcrawler
  • Binary:
    ~/.claude/skills/web-crawler/bin/rcrawler
  • Skill Documentation: This file (SKILL.md)
  • Quick Start: README.md (this repository)
  • Development Guide: DEVELOPMENT.md

Version: 1.0.0
Status: Production Ready