web-fetch
Web Fetch
All web content retrieval uses `curl` (Bash) or the built-in `WebFetch` tool. No MCP
server needed: Claude Code's native tools cover every Fetch MCP operation with more
control.
Quick Reference
| Fetch MCP Tool | Replacement | When to Use |
|---|---|---|
| | `curl` | Raw HTML needed for parsing |
| | `curl` with `jq` | API responses, structured data |
| | `WebFetch` | Readable page content (default output is markdown) |
| | `curl` with `python3`, or `WebFetch` | Plain text extraction |
Default choice: Use `WebFetch` for general page content. Use `curl` when you need
headers, authentication, POST bodies, or raw format control.
WebFetch (Built-in Tool)
The `WebFetch` tool fetches a URL and returns clean markdown content. It handles
JavaScript-rendered pages, strips navigation and boilerplate, and returns readable text.

Best for: documentation pages, articles, blog posts, README files, and any other content where
you want readable text rather than raw HTML.

Limitations: no custom headers, no POST bodies, no cookie management. Use `curl` for
those.
curl Patterns
Fetch HTML
```bash
curl -sL "https://example.com/page"
```

| Flag | Purpose |
|---|---|
| `-s` | Silent mode: suppress progress meter |
| `-L` | Follow redirects (3xx) |
| `-o` | Save to file instead of stdout |
| `-I` | Headers only (HEAD request) |
| `-i` | Include response headers in output |
Fetch and extract specific elements with `xmllint` or `python3`:

```bash
curl -sL "https://example.com" | python3 -c "
from html.parser import HTMLParser
import sys

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        self.in_title = tag == 'title'
    def handle_data(self, data):
        if self.in_title:
            self.title += data
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

p = TitleParser()
p.feed(sys.stdin.read())
print(p.title)
"
```

Fetch JSON
```bash
curl -s "https://api.example.com/v1/data" \
  -H "Accept: application/json" | jq '.'
```

Filter and reshape JSON responses:

```bash
# Extract specific fields
curl -s "https://api.example.com/users" | jq '.[] | {name, email}'

# Filter by condition
curl -s "https://api.example.com/items" | jq '[.[] | select(.status == "active")]'

# Count results
curl -s "https://api.example.com/items" | jq 'length'

# Get nested value
curl -s "https://api.example.com/config" | jq '.database.host'
```
Fetch Plain Text
```bash
# Strip HTML tags for plain text
curl -sL "https://example.com/page" | python3 -c "
import html.parser, sys

class Stripper(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, d):
        self.text.append(d)
    def get_text(self):
        return ''.join(self.text)

s = Stripper()
s.feed(sys.stdin.read())
print(s.get_text())
"
```

Or use `WebFetch`, which returns clean markdown that is close enough to plain text for most
purposes.

---
Authenticated Requests
Bearer Token
```bash
curl -s "https://api.example.com/data" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json"
```

API Key in Header
```bash
curl -s "https://api.example.com/data" \
  -H "X-API-Key: $API_KEY"
```

API Key in Query Parameter
```bash
curl -s "https://api.example.com/data?api_key=$API_KEY"
```

Basic Auth
```bash
curl -s -u "username:$PASSWORD" "https://api.example.com/data"
```

Store credentials in environment variables. Never hardcode tokens or passwords in
commands.
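One way to surface a missing credential immediately is to check the variable before calling `curl`. The sketch below is illustrative only: `require_env` and `fetch_with_token` are hypothetical helper names, not part of curl or Claude Code, and the helper assumes it is only ever given a trusted variable name, since it uses `eval`.

```bash
# Hypothetical helper: fail fast when a named environment variable is unset or empty.
require_env() {
  # $1 is the variable NAME (e.g. API_TOKEN), not its value.
  eval "_require_env_value=\${$1:-}"
  if [ -z "$_require_env_value" ]; then
    printf 'error: %s is not set\n' "$1" >&2
    return 1
  fi
}

# Hypothetical wrapper: curl is only called once the token is known to be present.
fetch_with_token() {
  require_env API_TOKEN || return 1
  curl -s "$1" -H "Authorization: Bearer $API_TOKEN"
}

# Usage (not run here): fetch_with_token "https://api.example.com/data" | jq '.'
```

With `API_TOKEN` unset, `fetch_with_token` prints the error to stderr and returns non-zero without sending a request.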
POST, PUT, PATCH, DELETE
POST with JSON Body
```bash
curl -s -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "name": "item-name",
    "value": 42
  }' | jq '.'
```

POST with Form Data
```bash
curl -s -X POST "https://api.example.com/upload" \
  -F "file=@./document.pdf" \
  -F "description=Uploaded via curl"
```

PUT (Full Update)
```bash
curl -s -X PUT "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"name": "updated-name", "value": 99}' | jq '.'
```

PATCH (Partial Update)
```bash
curl -s -X PATCH "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"value": 100}' | jq '.'
```

DELETE
```bash
curl -s -X DELETE "https://api.example.com/items/123" \
  -H "Authorization: Bearer $API_TOKEN"
```

Advanced Patterns
Pagination
```bash
PAGE=1
while true; do
  RESPONSE=$(curl -s "https://api.example.com/items?page=$PAGE&per_page=50" \
    -H "Authorization: Bearer $API_TOKEN")
  COUNT=$(echo "$RESPONSE" | jq 'length')
  echo "$RESPONSE" | jq '.[]'
  [ "$COUNT" -lt 50 ] && break
  PAGE=$((PAGE + 1))
done
```

Timeout and Retry
```bash
curl -s --connect-timeout 10 --max-time 30 \
  --retry 3 --retry-delay 2 \
  "https://api.example.com/data"
```

Response Headers Inspection
```bash
curl -sI "https://example.com" | grep -i "content-type"
```

Save Response with Status Code
```bash
HTTP_CODE=$(curl -s -o /tmp/response.json -w "%{http_code}" "https://api.example.com/data")
echo "Status: $HTTP_CODE"
jq '.' /tmp/response.json
```

Cookie Handling
```bash
# Save cookies
curl -s -c /tmp/cookies.txt "https://example.com/login" \
  -d "user=admin&pass=$PASSWORD"

# Reuse cookies
curl -s -b /tmp/cookies.txt "https://example.com/dashboard"
```

---
Error Handling
| HTTP Status | Meaning | Resolution |
|---|---|---|
| 301/302 | Redirect | Add `-L` |
| 401 | Unauthorized | Check token/credentials; verify env var is set |
| 403 | Forbidden | Insufficient permissions or IP restriction |
| 404 | Not Found | Verify URL path; resource may be deleted |
| 429 | Rate Limited | Respect `Retry-After` |
| 500 | Server Error | Retry once; if persistent, report upstream |
| SSL error | Certificate issue | Do not use `-k` |
| Timeout | Network/server slow | Increase `--max-time` |
Verify a URL is reachable before complex operations:
```bash
curl -sI -o /dev/null -w "%{http_code}" "https://example.com"
```
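The resolutions in the table can be turned into a small dispatch on the status code captured with `-w "%{http_code}"`. This is a sketch under assumptions: `status_action` and its action labels are hypothetical names, every 5xx code is treated like 500, and `000` is the value curl writes when no HTTP response was received.

```bash
# Hypothetical helper: map an HTTP status code to a suggested next step.
status_action() {
  case "$1" in
    2??)     echo "ok" ;;
    301|302) echo "follow-redirects" ;;   # add -L
    401)     echo "check-credentials" ;;
    403)     echo "check-permissions" ;;
    404)     echo "check-url" ;;
    429)     echo "back-off" ;;
    5??)     echo "retry-once" ;;
    000)     echo "no-response" ;;        # connection failed or timed out
    *)       echo "inspect-manually" ;;
  esac
}

# Usage (not run here):
#   status_action "$(curl -sI -o /dev/null -w '%{http_code}' "https://example.com")"
```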
Limitations
- WebFetch does not support custom headers, POST bodies, or cookies. Use `curl` for authenticated or stateful requests.
- curl does not render JavaScript. For JS-heavy SPAs, prefer `WebFetch`, which handles rendered content.
- Large responses may exceed context limits. Pipe through `jq`, `head`, or `python3` to extract only needed data before loading into context.
- Binary content (images, PDFs, archives) should be saved to disk with `-o`, not piped to stdout.
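To keep a large response from flooding the context, the output can be capped before it is read. A minimal sketch, assuming a hypothetical helper name `clip_bytes`, an arbitrary default of 2000 bytes, and a `head` that supports `-c` (the GNU and BSD versions do):

```bash
# Hypothetical helper: pass through at most $1 bytes of stdin (default 2000).
clip_bytes() {
  head -c "${1:-2000}"
}

# Usage (not run here): curl -sL "https://example.com/page" | clip_bytes 4000
```

For JSON, filtering with `jq` first is usually the better choice, because cutting by bytes can truncate a document mid-structure.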
Calibration Rules
- Default to WebFetch for reading web pages. It returns clean markdown, handles JS rendering, and requires no flags. Switch to curl only when you need headers, auth, POST, or raw format control.
- Always pipe JSON through jq. Raw JSON in context wastes tokens. Filter to only the fields needed.
- Never hardcode credentials. Use `$ENV_VAR` references. If the variable is not set, surface the error immediately.
- Follow redirects by default. Always use `-L` with curl unless you specifically need to inspect the redirect chain.
- Prefer `-s` (silent) on every curl call. Progress meters add noise to output.
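The `-s` and `-L` defaults can be baked into a wrapper so they are not forgotten. This is a sketch only: `fetch` and the `CURL_BIN` override are hypothetical names introduced here (the override exists so the wrapper can be exercised without network access), and the timeout values are borrowed from the Timeout and Retry example.

```bash
# Hypothetical wrapper: silent mode, follow redirects, and bounded timeouts on every call.
# CURL_BIN is an assumed override for testing; it defaults to the real curl.
fetch() {
  "${CURL_BIN:-curl}" -sL --connect-timeout 10 --max-time 30 "$@"
}

# Usage (not run here): fetch -H "Accept: application/json" "https://api.example.com/v1/data" | jq '.'
```

When the redirect chain itself needs inspecting, call `curl` directly instead of the wrapper.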