web-fetch
# Web Content Fetching
Fetch web content using `curl | html2markdown` with CSS selectors for clean, complete markdown output.
## Quick Usage (Known Sites)
Use site-specific selectors for best results:

```bash
# Anthropic docs
curl -s "<url>" | html2markdown --include-selector "#content-container"

# MDN Web Docs
curl -s "<url>" | html2markdown --include-selector "article"

# GitHub docs
curl -s "<url>" | html2markdown --include-selector "article" --exclude-selector "nav,.sidebar"

# Generic article pages
curl -s "<url>" | html2markdown --include-selector "article,main,[role=main]" --exclude-selector "nav,header,footer"
```
## Site Patterns
| Site | Include Selector | Exclude Selector |
|---|---|---|
| platform.claude.com | `#content-container` | - |
| docs.anthropic.com | `#content-container` | - |
| developer.mozilla.org | `article` | - |
| github.com (docs) | `article` | `nav,.sidebar` |
| Generic | `article,main,[role=main]` | `nav,header,footer` |
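
The site patterns can also be kept in a small lookup so the right flags are chosen from the URL. The following is a minimal sketch, not part of the skill itself: the function names `host_of`, `include_for`, `exclude_for`, and `fetch_known` are hypothetical, the selectors are the ones used in the Quick Usage commands, and treating `platform.claude.com` and `docs.github.com` like the other Anthropic and GitHub docs hosts is an assumption.

```bash
# Extract the host from a URL using parameter expansion only.
host_of() {
  h=${1#*://}   # drop the scheme, if any
  h=${h%%/*}    # drop the path
  printf '%s\n' "$h"
}

# Print the include selector for a URL (generic pattern as the default).
include_for() {
  case "$(host_of "$1")" in
    platform.claude.com|docs.anthropic.com) printf '%s\n' '#content-container' ;;
    developer.mozilla.org|github.com|docs.github.com) printf '%s\n' 'article' ;;
    *) printf '%s\n' 'article,main,[role=main]' ;;
  esac
}

# Print the exclude selector for a URL (empty when none is needed).
exclude_for() {
  case "$(host_of "$1")" in
    platform.claude.com|docs.anthropic.com|developer.mozilla.org) printf '\n' ;;
    github.com|docs.github.com) printf '%s\n' 'nav,.sidebar' ;;
    *) printf '%s\n' 'nav,header,footer' ;;
  esac
}

# Fetch a page with the selectors chosen above.
fetch_known() {
  inc=$(include_for "$1")
  exc=$(exclude_for "$1")
  if [ -n "$exc" ]; then
    curl -s "$1" | html2markdown --include-selector "$inc" --exclude-selector "$exc"
  else
    curl -s "$1" | html2markdown --include-selector "$inc"
  fi
}
```

With these defined, `fetch_known "<url>"` runs the same pipeline as the Quick Usage commands without having to remember the selector for each site.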
## Universal Fallback (Unknown Sites)
For sites without known patterns, use the Bun script, which auto-detects the main content:

```bash
bun ~/.claude/skills/web-fetch/fetch.ts "<url>"
```
### Setup (one-time)
```bash
cd ~/.claude/skills/web-fetch && bun install
```
## Finding the Right Selector
When a site isn't in the patterns list:

```bash
# Check what content containers exist
curl -s "<url>" | grep -oE '<article[^>]*>|<main[^>]*>|id="[^"]*content[^"]*"' | head -10

# Test a selector
curl -s "<url>" | html2markdown --include-selector "<selector>" | head -30

# Check line count
curl -s "<url>" | html2markdown --include-selector "<selector>" | wc -l
```
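
The three checks above can be wrapped into helpers that download the page only once. This is a sketch under stated assumptions: `list_containers` and `compare_selectors` are hypothetical names, and the line count is only a rough proxy for how much content a selector captures.

```bash
# Read HTML on stdin and print each distinct candidate container once.
list_containers() {
  grep -oE '<article[^>]*>|<main[^>]*>|id="[^"]*content[^"]*"' | LC_ALL=C sort -u
}

# Download the page once, then print the markdown line count per selector.
# Usage: compare_selectors "<url>" "article" "main" "#content"
compare_selectors() {
  url=$1
  shift
  html=$(curl -s "$url")
  for sel in "$@"; do
    n=$(printf '%s\n' "$html" | html2markdown --include-selector "$sel" | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$n" "$sel"
  done
}
```

For example, `curl -s "<url>" | list_containers` shows which containers exist, and `compare_selectors "<url>" "article" "main"` shows how many lines each one yields. A count far above the others usually means navigation or sidebar noise; a count of zero means the selector did not match.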
## Options Reference
```bash
--include-selector "CSS"   # Only include matching elements
--exclude-selector "CSS"   # Remove matching elements
--domain "https://..."     # Convert relative links to absolute
```
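
As an illustration of `--domain`, the value can be derived from the URL being fetched so that relative links resolve against the same site. This is a sketch, not part of the skill: `origin_of` and `fetch_absolute` are hypothetical helpers, and it assumes `--domain` expects the scheme plus host, as the `"https://..."` placeholder suggests.

```bash
# Print the scheme and host of a URL, e.g. https://example.com
origin_of() {
  scheme=${1%%://*}   # text before "://"
  rest=${1#*://}      # text after "://"
  printf '%s://%s\n' "$scheme" "${rest%%/*}"
}

# Fetch a page and rewrite relative links against its own origin.
# Usage: fetch_absolute "<url>" "<selector>"
fetch_absolute() {
  curl -s "$1" | html2markdown --include-selector "$2" --domain "$(origin_of "$1")"
}
```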
## Comparison
| Method | Anthropic Docs | Code Blocks | Complexity |
|---|---|---|---|
| Full page | 602 lines | Yes | Noisy |
| `--include-selector` | 385 lines | Yes | Clean |
| Bun script (universal) | 383 lines | Yes | Clean |
## Troubleshooting
**Wrong content selected**: The site may have multiple `<article>` elements. Inspect the HTML:

```bash
curl -s "<url>" | grep -o '<article[^>]*>'
```

**Empty output**: The selector doesn't match anything. Try a broader selector such as `main` or `body`.

**Missing code blocks**: Check whether the site uses non-standard code formatting.

**Client-rendered content**: If the HTML contains only "Loading..." placeholders, the content is rendered by JavaScript. Neither curl nor the Bun script can extract it; use browser-based tools.
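
A rough way to tell the empty-output and client-rendered cases apart is to convert the whole `body` and count the words that come out. The sketch below is a heuristic only: `looks_unrendered` and `check_page` are hypothetical names, and the default threshold of 20 words is an arbitrary assumption to tune per site.

```bash
# Read converted markdown on stdin; succeed (exit 0) when it has fewer
# words than the threshold given as $1 (default 20, an arbitrary choice).
looks_unrendered() {
  words=$(wc -w | tr -d ' ')
  [ "$words" -lt "${1:-20}" ]
}

# Convert the entire body so a selector mismatch is ruled out, then warn
# when almost no text survives.
check_page() {
  if curl -s "$1" | html2markdown --include-selector "body" | looks_unrendered; then
    echo "Almost no text extracted; the page may be client-rendered." >&2
    return 1
  fi
}
```

If `check_page "<url>"` fails even with `body` as the selector, narrowing or broadening the selector will not help, and a browser-based tool is the next step.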