# Rust Web Crawler (rcrawler)
High-performance web crawler built in pure Rust with production-grade features for fast, reliable site crawling.
## When to Use This Skill
Use this skill when the user requests:
- Web crawling or site mapping
- Sitemap discovery and analysis
- Link extraction and validation
- Site structure visualization
- robots.txt compliance checking
- Performance-critical web scraping
- Generating interactive web reports with graph visualization
## Core Capabilities

### 🚀 Performance
- 60+ pages/sec throughput with async Tokio runtime
- <50ms startup time - Near-instant initialization
- ~50MB memory usage - Efficient resource consumption
- 5.4 MB binary - Single executable, no dependencies
### 🤖 Intelligence
- Sitemap discovery: Automatically finds and parses sitemap.xml (3 standard locations)
- robots.txt compliance: Respects crawling rules with per-domain caching
- Smart filtering: Auto-excludes images, CSS, JS, PDFs by default
- Domain auto-detection: Extracts and restricts to base domain automatically
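As a sketch of the default smart filtering described above, an extension-based exclusion check might look like the following. This is illustrative only: the function name and the exact extension list are assumptions, not rcrawler's actual implementation.

```rust
/// Hypothetical sketch of the default exclusion filter: skip images,
/// CSS, JS, and PDFs by file extension (extension list is an assumption).
fn is_excluded(url: &str) -> bool {
    const EXCLUDED: [&str; 7] = [".png", ".jpg", ".gif", ".svg", ".css", ".js", ".pdf"];
    // Compare against the path only, ignoring any query string,
    // and lowercase so `.JS` and `.js` match the same rule.
    let path = url.split('?').next().unwrap_or(url).to_ascii_lowercase();
    EXCLUDED.iter().any(|ext| path.ends_with(ext))
}

fn main() {
    assert!(is_excluded("https://example.com/logo.png"));
    assert!(is_excluded("https://example.com/app.JS?v=2"));
    assert!(!is_excluded("https://example.com/docs/page"));
    println!("filter ok");
}
```

A real filter would also honor the include/exclude regex patterns mentioned under UrlFilter below; this sketch covers only the default extension exclusions.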
### 🔒 Safety
- Rate limiting: Token bucket algorithm (default 2 req/s)
- Configurable timeout: 30 second default
- Memory safe: Rust's ownership system prevents crashes
- Graceful shutdown: 2-second grace period for pending requests
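The token-bucket behavior behind the rate limiter can be sketched as below. Note this is only a conceptual model: rcrawler uses the `governor` crate, and the burst capacity chosen here is an assumption.

```rust
use std::time::Instant;

/// Minimal token-bucket sketch (illustrative; the real limiter is `governor`).
struct TokenBucket {
    capacity: f64, // maximum stored tokens (burst size)
    tokens: f64,   // currently available tokens
    rate: f64,     // tokens refilled per second
    last: Instant, // last refill timestamp
}

impl TokenBucket {
    fn new(rate: f64, capacity: f64) -> Self {
        Self { capacity, tokens: capacity, rate, last: Instant::now() }
    }

    /// Try to consume one token; returns false when the caller must wait.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        // Refill proportionally to elapsed time, capped at capacity.
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Default profile: 2 req/s; a burst capacity of 2 is assumed here.
    let mut bucket = TokenBucket::new(2.0, 2.0);
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    // A third immediate request is throttled until tokens refill.
    assert!(!bucket.try_acquire());
    println!("rate limiter ok");
}
```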
### 📊 Output
- Multiple formats: JSON, Markdown, HTML, CSV, Links, Text
- LLM-ready Markdown: Clean content with YAML frontmatter
- Interactive HTML report: Dashboard with graph visualization
- Stealth mode: User-agent rotation and realistic headers
- Content filtering: Remove nav, ads, scripts for clean data
- Real-time progress: Updates every 5 seconds during crawl
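For orientation, an LLM-ready Markdown page with YAML frontmatter might look roughly like this. The exact field names are illustrative assumptions, not rcrawler's documented schema:

```markdown
---
url: https://example.com/page
title: Example Page
crawled_at: 2026-01-10T01:00:01Z
---

# Example Page

Page content converted to clean Markdown, with navigation,
ads, and scripts stripped by content filtering...
```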
### 📝 Monitoring
- Structured logging: tracing with timestamps and log levels
- Progress tracking: `[Progress] Pages: X/Y | Active jobs: Z | Errors: N`
- Detailed statistics: Pages found, crawled, external links, errors, duration
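If you need the progress numbers programmatically (for example, to surface them in a status update), the status line above can be parsed with plain string operations. This helper is a sketch written for this document, not part of rcrawler:

```rust
/// Parse "[Progress] Pages: X/Y | Active jobs: Z | Errors: N"
/// into (pages, total, active_jobs, errors). Returns None on any mismatch.
fn parse_progress(line: &str) -> Option<(u32, u32, u32, u32)> {
    let rest = line.strip_prefix("[Progress] Pages: ")?;
    let mut parts = rest.split(" | ");
    let (x, y) = parts.next()?.split_once('/')?;
    let active = parts.next()?.strip_prefix("Active jobs: ")?;
    let errors = parts.next()?.strip_prefix("Errors: ")?;
    Some((x.parse().ok()?, y.parse().ok()?, active.parse().ok()?, errors.parse().ok()?))
}

fn main() {
    let line = "[Progress] Pages: 50/120 | Active jobs: 15 | Errors: 0";
    assert_eq!(parse_progress(line), Some((50, 120, 15, 0)));
    // A non-progress log line simply yields None.
    assert_eq!(parse_progress("Crawl complete!"), None);
    println!("progress parsed");
}
```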
## Installation & Setup

### Binary Location

```bash
~/.claude/skills/web-crawler/bin/rcrawler
```

### Build from Source

```bash
# Clone the repository
git clone https://github.com/leobrival/rcrawler.git
cd rcrawler

# Build release binary
cargo build --release

# Copy to skill directory
cp target/release/rcrawler ~/.claude/skills/web-crawler/bin/
```

Build time: ~2 minutes
Binary size: 5.4 MB

## Command Line Interface
### Basic Syntax

```bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> [OPTIONS]
```

### Options
Core Options:
- `-w, --workers <N>`: Number of concurrent workers (default: 20, range: 1-50)
- `-d, --depth <N>`: Maximum crawl depth (default: 2)
- `-r, --rate <N>`: Rate limit in requests/second (default: 2.0)

Configuration:
- `-p, --profile <NAME>`: Use predefined profile (fast/deep/gentle)
- `--domain <DOMAIN>`: Restrict to specific domain (auto-detected from URL)
- `-o, --output <PATH>`: Custom output directory (default: ./output)

Features:
- `-s, --sitemap`: Enable/disable sitemap discovery (default: true)
- `--stealth`: Enable stealth mode with user-agent rotation
- `--markdown`: Convert HTML to LLM-ready Markdown with frontmatter
- `--filter-content`: Enable content filtering (remove nav, ads, scripts)
- `--debug`: Enable debug logging with detailed trace information
- `--resume`: Resume from checkpoint if available

Output:
- `-f, --formats <LIST>`: Output formats (json,markdown,html,csv,links,text)
## Profiles

### Fast Profile (Quick Mapping)
```bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> -p fast
```

- Workers: 50
- Depth: 3
- Rate: 10 req/s
- Use case: Quick site structure overview
### Deep Profile (Comprehensive Crawl)
```bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> -p deep
```

- Workers: 20
- Depth: 10
- Rate: 3 req/s
- Use case: Complete site analysis
### Gentle Profile (Server-Friendly)
```bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> -p gentle
```

- Workers: 5
- Depth: 5
- Rate: 1 req/s
- Use case: Respecting server resources
## Usage Examples

### Example 1: Basic Crawl
```bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com
```

Output:

```console
[2026-01-10T01:17:27Z] INFO Starting crawl of: https://example.com
[2026-01-10T01:17:27Z] INFO Config: 20 workers, depth 2
Fetching sitemap URLs...
[Progress] Pages: 50/120 | Active jobs: 15 | Errors: 0
[Progress] Pages: 100/180 | Active jobs: 8 | Errors: 0
Crawl complete!
Pages crawled: 150
Duration: 8542ms
Results saved to: ./output/results.json
HTML report: ./output/index.html
```
### Example 2: Stealth Mode with Markdown Export
```bash
~/.claude/skills/web-crawler/bin/rcrawler https://docs.example.com \
  --stealth --markdown -f markdown -d 3
```

Use case: Content extraction for LLM/RAG pipelines
Expected: Clean Markdown with frontmatter, anti-detection headers
### Example 3: Fast Scan
```bash
~/.claude/skills/web-crawler/bin/rcrawler https://blog.example.com -p fast
```

Use case: Quick blog mapping
Expected: 50 workers, depth 3, ~3-5 seconds for 100 pages
### Example 4: Multi-Format Export
```bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com \
  -f json,markdown,csv,links -o ./export
```

Use case: Export data in multiple formats simultaneously
Expected: Generates results.json, results.md, results.csv, results.txt
### Example 5: Debug Mode
```bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com --debug
```

Output: Detailed trace logs for troubleshooting
## Output Format

### Directory Structure
```text
./output/
├── results.json      # Structured crawl data
├── results.md        # LLM-ready Markdown (with --markdown)
├── results.html      # Interactive report
├── results.csv       # Spreadsheet format (with -f csv)
├── results.txt       # URL list (with -f links)
└── checkpoint.json   # Auto-saved state (every 30s)
```

### JSON Structure (results.json)
```json
{
  "stats": {
    "pages_found": 450,
    "pages_crawled": 450,
    "external_links": 23,
    "excluded_links": 89,
    "errors": 0,
    "start_time": "2026-01-10T01:00:00Z",
    "end_time": "2026-01-10T01:00:07Z",
    "duration": 7512
  },
  "results": [
    {
      "url": "https://example.com",
      "title": "Example Domain",
      "status_code": 200,
      "depth": 0,
      "links": ["https://example.com/page1", "..."],
      "crawled_at": "2026-01-10T01:00:01Z",
      "content_type": "text/html"
    }
  ]
}
```
### HTML Report Features
- Interactive dashboard with key statistics
- Graph visualization using force-graph library
- Node sizing based on link count (logarithmic scale)
- Status color coding: Green (success), red (errors)
- Hover tooltips: In-degree and out-degree information
- Click to navigate: Opens page URL in new tab
- Light/dark mode: Auto-detection via CSS
- Collapsible sections: Reduces scroll for large crawls
- Mobile responsive: Works on all devices
## Implementation Workflow

When a user requests a crawl, follow these steps:
### 1. Parse Request
Extract from user message:
- URL (required): Target website
- Workers (optional): Number of concurrent workers
- Depth (optional): Maximum crawl depth
- Rate (optional): Requests per second
- Profile (optional): fast/deep/gentle
### 2. Validate Input
- Check URL format (add https:// if missing)
- Validate workers range (1-50)
- Validate depth (1-10 recommended)
- Validate rate (0.1-20.0 recommended)
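The validation step above can be sketched in a few lines. These helpers are illustrative (written for this document, not taken from rcrawler's source):

```rust
/// Default the scheme to https when the user omits it.
fn normalize_url(input: &str) -> String {
    if input.starts_with("http://") || input.starts_with("https://") {
        input.to_string()
    } else {
        format!("https://{}", input)
    }
}

/// Clamp the worker count into the documented 1-50 range.
fn clamp_workers(n: u32) -> u32 {
    n.clamp(1, 50)
}

fn main() {
    assert_eq!(normalize_url("example.com"), "https://example.com");
    assert_eq!(normalize_url("https://example.com"), "https://example.com");
    assert_eq!(clamp_workers(100), 50);
    assert_eq!(clamp_workers(0), 1);
    println!("validation ok");
}
```

Depth and rate could be clamped the same way into the recommended 1-10 and 0.1-20.0 ranges.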
### 3. Build Command
```bash
~/.claude/skills/web-crawler/bin/rcrawler <URL> \
  -w <workers> \
  -d <depth> \
  -r <rate> \
  [--debug] \
  [-o <output>]
```

### 4. Execute Crawl
Use the Bash tool to run the command:

```bash
~/.claude/skills/web-crawler/bin/rcrawler https://example.com -w 20 -d 2
```

### 5. Monitor Progress
Watch for progress updates in the output: `[Progress] Pages: X/Y | Active jobs: Z | Errors: N`

- Updates appear every 5 seconds
- Shows real-time crawl status
### 6. Report Results
When the crawl completes, inform the user of:

- Number of pages crawled
- Duration in seconds or minutes
- Path to results: `./output/results.json`
- Path to HTML report: `./output/index.html`
- Offer to open the HTML report: `open ./output/index.html`
## Natural Language Parsing

### Example User Requests
Request: "Crawl docs.example.com"
Parse: URL = https://docs.example.com, use defaults
Command: `rcrawler https://docs.example.com`

Request: "Quick scan of blog.example.com"
Parse: URL = blog.example.com, profile = fast
Command: `rcrawler https://blog.example.com -p fast`

Request: "Deep crawl of api-docs.example.com with 40 workers"
Parse: URL = api-docs.example.com, workers = 40, depth = deep
Command: `rcrawler https://api-docs.example.com -w 40 -d 5`

Request: "Crawl example.com carefully, don't overload their server"
Parse: URL = example.com, profile = gentle
Command: `rcrawler https://example.com -p gentle`

Request: "Map the structure of help.example.com"
Parse: URL = help.example.com, depth = moderate
Command: `rcrawler https://help.example.com -d 3`
## Error Handling
### Binary Not Found
```bash
# Check if binary exists
ls ~/.claude/skills/web-crawler/bin/rcrawler

# If missing, build it
cd ~/.claude/skills/web-crawler/scripts && cargo build --release
```

### Crawl Failures
Network errors:
- Verify URL is accessible: `curl -I <URL>`
- Check if site is down or blocking crawlers
- Try with a lower rate: `-r 1`

robots.txt blocking:
- Crawler respects robots.txt by default
- Check rules: `curl <URL>/robots.txt`
- Inform user of restrictions

Timeout errors:
- Increase timeout in code (default 30s)
- Reduce workers: `-w 10`
- Lower rate limit: `-r 1`

Too many errors:
- Enable debug mode: `--debug`
- Check specific failing URLs
- May need to exclude certain patterns
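To see why a URL is blocked, it helps to understand the basic Disallow matching. The simplified checker below illustrates the idea; the real crawler uses the `robotstxt` crate, and actual robots.txt semantics (wildcards, Allow precedence, per-agent groups) are richer than this:

```rust
/// Very simplified robots.txt check: honor `Disallow:` prefix rules
/// in the `User-agent: *` group only. Illustrative, not spec-complete.
fn is_allowed(robots_txt: &str, path: &str) -> bool {
    let mut in_star_group = false;
    for line in robots_txt.lines() {
        let line = line.trim();
        if let Some(agent) = line.strip_prefix("User-agent:") {
            in_star_group = agent.trim() == "*";
        } else if in_star_group {
            if let Some(rule) = line.strip_prefix("Disallow:") {
                let rule = rule.trim();
                // An empty Disallow means "allow everything".
                if !rule.is_empty() && path.starts_with(rule) {
                    return false;
                }
            }
        }
    }
    true
}

fn main() {
    let robots = "User-agent: *\nDisallow: /admin\nDisallow: /tmp/";
    assert!(!is_allowed(robots, "/admin/settings"));
    assert!(is_allowed(robots, "/blog/post"));
    println!("robots check ok");
}
```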
## Performance Benchmarks

### Test: adonisjs.com
- Pages: 450
- Duration: 7.5 seconds
- Throughput: 60 pages/sec
- Workers: 20
- Depth: 2
### Test: rust-lang.org
- Pages: 16
- Duration: 3.9 seconds
- Workers: 10
- Depth: 1
### Test: example.com
- Pages: 2
- Duration: 2.7 seconds
- Workers: 5
- Depth: 1
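The quoted throughput follows directly from the adonisjs.com numbers: 450 pages in 7.5 seconds is 60 pages/sec.

```rust
fn main() {
    // adonisjs.com benchmark: 450 pages crawled in 7.5 seconds.
    let pages = 450.0_f64;
    let seconds = 7.5_f64;
    let throughput = pages / seconds;
    assert_eq!(throughput, 60.0); // matches the quoted 60 pages/sec
    println!("{} pages/sec", throughput);
}
```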
## Technical Architecture

### Core Components
- CrawlEngine (`src/crawler/engine.rs`)
  - Worker pool management
  - Job queue coordination
  - Shutdown signaling
  - Statistics tracking
- RobotsChecker (`src/crawler/robots.rs`)
  - Per-domain caching
  - Rule validation
  - Fallback on errors
- RateLimiter (`src/crawler/rate_limiter.rs`)
  - Token bucket algorithm
  - Configurable rate
  - Shared across workers
- UrlFilter (`src/utils/filters.rs`)
  - Regex-based filtering
  - Include/exclude patterns
  - Default exclusions
- HtmlParser (`src/parser/html.rs`)
  - CSS selector queries
  - Title extraction
  - Link discovery
- SitemapParser (`src/parser/sitemap.rs`)
  - XML parsing
  - Index traversal
  - URL extraction
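The core of sitemap URL extraction is pulling every `<loc>` value out of the XML. The real SitemapParser uses `quick-xml`; the string-scanning sketch below only illustrates the idea and would not handle CDATA, entities, or malformed markup:

```rust
/// Simplified sketch of sitemap <loc> extraction (illustrative only;
/// the real parser uses `quick-xml`).
fn extract_locs(xml: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after = &rest[start + 5..]; // skip past "<loc>"
        if let Some(end) = after.find("</loc>") {
            urls.push(after[..end].trim().to_string());
            rest = &after[end + 6..]; // continue after "</loc>"
        } else {
            break; // unterminated tag: stop scanning
        }
    }
    urls
}

fn main() {
    let xml = "<urlset><url><loc>https://example.com/</loc></url>\
               <url><loc>https://example.com/page1</loc></url></urlset>";
    let urls = extract_locs(xml);
    assert_eq!(urls.len(), 2);
    assert_eq!(urls[1], "https://example.com/page1");
    println!("{:?}", urls);
}
```

The same routine works for sitemap index files, whose `<loc>` entries point at child sitemaps to traverse.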
### Key Dependencies
- tokio: Async runtime (multi-threaded)
- reqwest: HTTP client (connection pooling)
- scraper: HTML parsing (CSS selectors)
- quick-xml: Sitemap parsing
- governor: Rate limiting (token bucket)
- tracing: Structured logging
- dashmap: Concurrent HashMap
- robotstxt: robots.txt compliance
- clap: CLI argument parsing
- serde/serde_json: Serialization
## Tips & Best Practices
### 1. Start with Default Settings

First crawl should use defaults to understand site structure.
### 2. Use Profiles for Common Scenarios

- fast: Quick overviews
- deep: Comprehensive analysis
- gentle: Respectful crawling
### 3. Monitor Progress

Watch the `[Progress]` lines to ensure the crawl is progressing.

### 4. Check HTML Report
Interactive visualization helps understand site structure better than JSON.
### 5. Respect Rate Limits

Default 2 req/s is safe for most sites. Increase cautiously.
### 6. Enable Debug for Issues

Use `--debug` to get detailed trace logs.

### 7. Review robots.txt

Check `<URL>/robots.txt` to understand crawling restrictions.

### 8. Use Custom Output for Multiple Crawls
Avoid overwriting results by using the `-o` flag.

## Future Enhancements (V2.0)
- Checkpoint resume: Full integration of checkpoint system
- Per-domain rate limiting: Different rates for different domains
- JavaScript rendering: chromiumoxide for dynamic sites
- Distributed crawling: Redis-based job queue
- Advanced analytics: SEO analysis, link quality scoring
## Support & Resources
- GitHub Repository: leobrival/rcrawler
- Binary: `~/.claude/skills/web-crawler/bin/rcrawler`
- Skill Documentation: This file (SKILL.md)
- Quick Start: README.md (this repository)
- Development Guide: DEVELOPMENT.md

Version: 1.0.0
Status: Production Ready