web-fetch

Web Fetch Skill

Fetch and parse web content from URLs.

When to Use

✅ USE this skill when:

"Fetch content from URL"
"Download file from..."
"Extract article text from..."
"Get page title and description"
"Scrape data from webpage"

When NOT to Use

❌ DON'T use this skill when:

Interactive browser actions → use browser-tools
Authenticated sessions → use browser-tools with profile
JavaScript-heavy SPAs → use browser-tools

Commands

Fetch Content

bash

{baseDir}/fetch.sh "https://example.com"
{baseDir}/fetch.sh "https://example.com" --markdown
{baseDir}/fetch.sh "https://example.com" --json

Extract Article

bash

{baseDir}/extract.sh "https://example.com/article"
{baseDir}/extract.sh "https://example.com/article" --format markdown

Download File

bash

{baseDir}/download.sh "https://example.com/file.pdf" --out /tmp/file.pdf
{baseDir}/download.sh "https://example.com/archive.zip" --out /tmp/archive.zip

Get Page Metadata

bash

{baseDir}/metadata.sh "https://example.com"
{baseDir}/metadata.sh "https://example.com" --json

Extract Links

bash

{baseDir}/links.sh "https://example.com"
{baseDir}/links.sh "https://example.com" --filter "blog"

Extract Images

bash

{baseDir}/images.sh "https://example.com"
{baseDir}/images.sh "https://example.com" --download --out /tmp/images/

Options

`--markdown`: Output as markdown
`--json`: Output as JSON
`--text`: Plain text output
`--timeout N`: Timeout in seconds (default: 30)
`--user-agent`: Custom user agent
`--out <path>`: Output file path

Output Formats

Plain Text

Extract visible text from HTML, cleaned of scripts and styles.

Markdown

Convert HTML to markdown with proper formatting.

JSON

Structured output with title, content, metadata.

Examples

Get article content:

bash

{baseDir}/extract.sh "https://example.com/blog/post" --markdown

Download all PDFs from page:

bash

{baseDir}/links.sh "https://example.com" --filter ".pdf" | xargs -I {} {baseDir}/download.sh "{}"
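
The one-liner above passes no `--out` path, so where each file ends up depends on the default behavior of download.sh, which is not described here. If an explicit destination is wanted, one option is to derive a file name from each URL. The helper below is a minimal sketch: `out_path_for` is a hypothetical function, not part of the skill, and it assumes the last path segment of the URL is a usable file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): map a URL to a local file path.
# Usage: out_path_for URL [DIR]   (DIR defaults to /tmp)
out_path_for() {
  local url="$1" dir="${2:-/tmp}"
  local clean="${url%%[?#]*}"          # drop any ?query or #fragment
  local name="${clean##*/}"            # keep the last path segment
  [ -n "$name" ] || name="download"    # fallback when the URL ends in "/"
  printf '%s/%s\n' "${dir%/}" "$name"  # trim a trailing "/" from DIR
}

out_path_for "https://example.com/docs/report.pdf?v=2" /tmp/pdfs
# prints /tmp/pdfs/report.pdf
```

Assuming links.sh prints one URL per line (an assumption, not confirmed by this document), the helper could be used in a loop such as `{baseDir}/links.sh "https://example.com" --filter ".pdf" | while IFS= read -r url; do {baseDir}/download.sh "$url" --out "$(out_path_for "$url" /tmp/pdfs)"; done`.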

Get page metadata:

bash

{baseDir}/metadata.sh "https://example.com" --json

Output: {"title": "...", "description": "...", "og:image": "..."}
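
To pick individual fields out of that JSON in a script, a tool such as `jq` could be used. The following is a minimal sketch, assuming the output has the shape shown above. The sample values are invented for illustration; only the key names come from the example. Note that a key containing a colon, such as `og:image`, cannot be written with plain dot syntax in jq.

```shell
#!/usr/bin/env bash
# Sample data standing in for the output of metadata.sh --json.
# The values are made up; only the key names come from the example output.
meta='{"title": "Example Domain", "description": "An example page", "og:image": "https://example.com/cover.png"}'

# Plain keys can use dot syntax.
title=$(printf '%s' "$meta" | jq -r '.title')

# Keys containing ":" need quoting, here via bracket syntax.
image=$(printf '%s' "$meta" | jq -r '.["og:image"]')

printf 'title=%s\nimage=%s\n' "$title" "$image"
```

In real use, the `meta` variable would be filled from the command itself, for example `meta=$({baseDir}/metadata.sh "https://example.com" --json)`.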

Notes

Respects robots.txt by default
Rate limiting: 1 request per second by default
Use `--user-agent` to set a custom user agent
For JavaScript-heavy pages, use browser-tools instead
