web-fetch

Web Fetch

All web content retrieval uses `curl` (Bash) or the built-in `WebFetch` tool. No MCP server is needed: Claude Code's native tools cover every Fetch MCP operation with more control.

Quick Reference

| Fetch MCP Tool | Replacement | When to Use |
|---|---|---|
| `fetch_html` | `curl -s URL` | Raw HTML needed for parsing |
| `fetch_json` | `curl -s URL \| jq '.'` | API responses, structured data |
| `fetch_markdown` | `WebFetch` | Readable page content (default output is markdown) |
| `fetch_txt` | `curl -s URL` or `WebFetch` | Plain text extraction |

Default choice: Use `WebFetch` for general page content. Use `curl` when you need headers, authentication, POST bodies, or raw format control.

WebFetch (Built-in Tool)

The `WebFetch` tool fetches a URL and returns clean markdown content. It handles JavaScript-rendered pages, strips navigation and boilerplate, and returns readable text.

Best for: documentation pages, articles, blog posts, README files — any content where you want readable text rather than raw HTML.

Limitations: no custom headers, no POST bodies, no cookie management. Use `curl` for those.

curl Patterns

Fetch HTML

bash

curl -sL "https://example.com/page"

| Flag | Purpose |
|---|---|
| `-s` | Silent mode: suppress progress meter |
| `-L` | Follow redirects (3xx) |
| `-o file.html` | Save to file instead of stdout |
| `-I` | Headers only (HEAD request) |
| `-i` | Include response headers in output |

Fetch and extract specific elements with `xmllint` or `python3`:

bash

curl -sL "https://example.com" | python3 -c "
from html.parser import HTMLParser
import sys

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        self.in_title = tag == 'title'
    def handle_data(self, data):
        if self.in_title:
            self.title += data
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

p = TitleParser()
p.feed(sys.stdin.read())
print(p.title)
"

Fetch JSON

bash

curl -s "https://api.example.com/v1/data" \
  -H "Accept: application/json" | jq '.'

Filter and reshape JSON responses:

bash

# Extract specific fields
curl -s "https://api.example.com/users" | jq '.[] | {name, email}'

# Filter by condition
curl -s "https://api.example.com/items" | jq '[.[] | select(.status == "active")]'

# Count results
curl -s "https://api.example.com/items" | jq 'length'

# Get nested value
curl -s "https://api.example.com/config" | jq '.database.host'

Fetch Plain Text

bash

# Strip HTML tags for plain text
curl -sL "https://example.com/page" | python3 -c "
import html.parser, sys

class Stripper(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, d):
        self.text.append(d)
    def get_text(self):
        return ''.join(self.text)

s = Stripper()
s.feed(sys.stdin.read())
print(s.get_text())
"


Or use `WebFetch`, which returns clean markdown that is close enough to plain text for most purposes.

---

Authenticated Requests

Bearer Token

bash

curl -s "https://api.example.com/data" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json"

API Key in Header

bash

curl -s "https://api.example.com/data" \
  -H "X-API-Key: $API_KEY"

API Key in Query Parameter

bash

curl -s "https://api.example.com/data?api_key=$API_KEY"

Basic Auth

bash

curl -s -u "username:$PASSWORD" "https://api.example.com/data"

Store credentials in environment variables. Never hardcode tokens or passwords in commands.
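
A missing credential is easiest to diagnose when the script stops before any request is sent. The sketch below shows one way to do that with standard shell parameter expansion; `require_api_token` is a hypothetical helper name, and `API_TOKEN` and the URL are the placeholders already used in this section.

```shell
# Sketch: stop before the request if the credential is missing.
# "${VAR:?message}" is standard POSIX parameter expansion: when VAR is
# unset or empty, the shell prints the message to stderr and the expansion fails.
require_api_token() {
  # The subshell keeps the failure local, so the caller receives a
  # non-zero status instead of the whole script exiting.
  ( : "${API_TOKEN:?API_TOKEN is not set}" )
}

# Usage (hypothetical endpoint):
# require_api_token && curl -s "https://api.example.com/data" \
#   -H "Authorization: Bearer $API_TOKEN"
```

Chaining the guard with `&&` means curl never runs with an empty `Authorization` header, which would otherwise surface later as a less obvious 401.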

POST, PUT, PATCH, DELETE

POST with JSON Body

bash

curl -s -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "name": "item-name",
    "value": 42
  }' | jq '.'

POST with Form Data

bash

curl -s -X POST "https://api.example.com/upload" \
  -F "file=@./document.pdf" \
  -F "description=Uploaded via curl"

PUT (Full Update)

bash

curl -s -X PUT "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"name": "updated-name", "value": 99}' | jq '.'

PATCH (Partial Update)

bash

curl -s -X PATCH "https://api.example.com/items/123" \
  -H "Content-Type: application/json" \
  -d '{"value": 100}' | jq '.'

DELETE

bash

curl -s -X DELETE "https://api.example.com/items/123" \
  -H "Authorization: Bearer $API_TOKEN"

Advanced Patterns

Pagination

bash

PAGE=1
while true; do
  RESPONSE=$(curl -s "https://api.example.com/items?page=$PAGE&per_page=50" \
    -H "Authorization: Bearer $API_TOKEN")
  COUNT=$(echo "$RESPONSE" | jq 'length')
  echo "$RESPONSE" | jq '.[]'
  [ "$COUNT" -lt 50 ] && break
  PAGE=$((PAGE + 1))
done

Timeout and Retry

bash

curl -s --connect-timeout 10 --max-time 30 \
  --retry 3 --retry-delay 2 \
  "https://api.example.com/data"

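curl's `--retry` applies to a single request and only to failures curl treats as transient. When a whole command or pipeline needs retrying, a small wrapper can do it in the script instead. This is a minimal sketch under that assumption: `retry_with_backoff` is a hypothetical helper, not a curl feature, and the doubling delay is an illustrative choice.

```shell
# Sketch: retry any command, doubling the delay between attempts.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    "$@" && return 0                      # success: stop retrying
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))                  # exponential backoff
    attempt=$((attempt + 1))
  done
}

# Example: --fail makes curl exit non-zero on HTTP 4xx/5xx responses,
# so the wrapper can detect them.
# retry_with_backoff 3 2 curl -s --fail --max-time 30 "https://api.example.com/data"
```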

Response Headers Inspection

bash

curl -sI "https://example.com" | grep -i "content-type"

Save Response with Status Code

bash

HTTP_CODE=$(curl -s -o /tmp/response.json -w "%{http_code}" "https://api.example.com/data")
echo "Status: $HTTP_CODE"
cat /tmp/response.json | jq '.'

Cookie Handling

bash

# Save cookies
curl -s -c /tmp/cookies.txt "https://example.com/login" \
  -d "user=admin&pass=$PASSWORD"

# Reuse cookies
curl -s -b /tmp/cookies.txt "https://example.com/dashboard"
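
A fixed path such as `/tmp/cookies.txt` can be read or overwritten by other processes and stays on disk after the session. The sketch below is one possible alternative, assuming `mktemp` is available: `make_cookie_jar` and `COOKIE_JAR` are hypothetical names introduced for illustration.

```shell
# Sketch: use a private, throwaway cookie jar instead of a fixed /tmp path.
make_cookie_jar() {
  # mktemp creates a uniquely named file readable only by its owner.
  mktemp "${TMPDIR:-/tmp}/cookies.XXXXXX"
}

COOKIE_JAR=$(make_cookie_jar)
# Delete the jar when the script exits so session cookies do not linger on disk.
trap 'rm -f "$COOKIE_JAR"' EXIT

# The jar then stands in for /tmp/cookies.txt:
# curl -s -c "$COOKIE_JAR" "https://example.com/login" -d "user=admin&pass=$PASSWORD"
# curl -s -b "$COOKIE_JAR" "https://example.com/dashboard"
```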

---

Error Handling

| HTTP Status | Meaning | Resolution |
|---|---|---|
| 301/302 | Redirect | Add `-L` flag to follow |
| 401 | Unauthorized | Check token/credentials; verify env var is set |
| 403 | Forbidden | Insufficient permissions or IP restriction |
| 404 | Not Found | Verify URL path; resource may be deleted |
| 429 | Rate Limited | Respect `Retry-After` header; add delay between requests |
| 500 | Server Error | Retry once; if persistent, report upstream |
| SSL error | Certificate issue | Do not use `-k` (insecure); fix the root cause |
| Timeout | Network/server slow | Increase `--max-time`; check connectivity |

Verify a URL is reachable before complex operations:

bash

curl -sI -o /dev/null -w "%{http_code}" "https://example.com"
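
The status code printed by the reachability check can be mapped to the resolutions in the table above. The following is a sketch only: `classify_status` is a hypothetical helper, the label strings are illustrative, and the `000` branch is an addition based on curl reporting `000` for `%{http_code}` when no HTTP response was received.

```shell
# Sketch: translate an HTTP status code into a suggested next step.
classify_status() {
  case "$1" in
    2??)      echo "ok" ;;
    301|302)  echo "redirect: add -L" ;;
    401)      echo "unauthorized: check token and env var" ;;
    403)      echo "forbidden: check permissions or IP restriction" ;;
    404)      echo "not found: verify URL path" ;;
    429)      echo "rate limited: honor Retry-After" ;;
    500)      echo "server error: retry once, then report upstream" ;;
    000)      echo "no response: check connectivity or timeout" ;;
    *)        echo "unexpected status: $1" ;;
  esac
}

# Usage with the reachability check:
# classify_status "$(curl -sI -o /dev/null -w '%{http_code}' 'https://example.com')"
```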

Limitations

- WebFetch does not support custom headers, POST bodies, or cookies. Use `curl` for authenticated or stateful requests.
- curl does not render JavaScript. For JS-heavy SPAs, prefer `WebFetch`, which handles rendered content.
- Large responses may exceed context limits. Pipe through `jq`, `head`, or `python3` to extract only needed data before loading into context.
- Binary content (images, PDFs, archives) should be saved to disk with `-o`, not piped to stdout.
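
For the large-response case, a byte cap is the bluntest option when no structured filter applies. This is a minimal sketch assuming a `head` that supports `-c` (GNU, BSD, and BusyBox do): `limit_output` is a hypothetical helper, and the 4000-byte default is an arbitrary illustrative value.

```shell
# Sketch: cap how many bytes of a response reach the context.
limit_output() {
  local max_bytes="${1:-4000}"
  # head -c passes through at most max_bytes bytes from stdin.
  head -c "$max_bytes"
}

# Usage: keep only the first 4000 bytes of a large page.
# curl -sL "https://example.com/page" | limit_output 4000
```

A byte cap can cut a JSON document mid-object, so for structured data a `jq` filter that selects only the needed fields is usually the better first choice.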

Calibration Rules

- Default to WebFetch for reading web pages. It returns clean markdown, handles JS rendering, and requires no flags. Switch to curl only when you need headers, auth, POST, or raw format control.
- Always pipe JSON through jq. Raw JSON in context wastes tokens. Filter to only the fields needed.
- Never hardcode credentials. Use `$ENV_VAR` references. If the variable is not set, surface the error immediately.
- Follow redirects by default. Always use `-L` with curl unless you specifically need to inspect the redirect chain.
- Prefer `-s` (silent) on every curl call. Progress meters add noise to output.
