extract

Extract Skill

Extract clean content from specific URLs. Ideal when you know which pages you want content from.

Prerequisites

Tavily API Key Required - Get your key at https://tavily.com

Add to ~/.claude/settings.json:

json

{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
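
Before calling the API, it can help to confirm that the key is actually visible to the shell. The following is a minimal sketch; the function name `check_tavily_key` is hypothetical and not part of the skill.

```bash
# Hypothetical helper (not part of the skill): fail early if the key is missing.
check_tavily_key() {
  if [ -z "${TAVILY_API_KEY:-}" ]; then
    echo "TAVILY_API_KEY is not set; add it to ~/.claude/settings.json" >&2
    return 1
  fi
  return 0
}
```

Calling `check_tavily_key || exit 1` at the top of a script stops it before any request is sent without credentials.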

Quick Start

Using the Script

bash

./scripts/extract.sh '<json>'

Examples:

bash

Single URL

./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

Multiple URLs

./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

With query focus and chunks

./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

Advanced extraction for JS pages

./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'

Basic Extraction

bash

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
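
The same request can be wrapped in a small shell function so that only the JSON body changes between calls. This is a sketch under assumptions: `tavily_extract` is a hypothetical name, and it is not the implementation of the skill's `scripts/extract.sh`, which is not shown here.

```bash
# Hypothetical wrapper (an assumption, not the skill's extract.sh):
# POST a JSON request body to the Extract endpoint and print the response.
tavily_extract() {
  local body="$1"
  # --fail makes curl exit non-zero on HTTP errors (the error body is not printed).
  curl --silent --show-error --fail \
    --request POST \
    --url https://api.tavily.com/extract \
    --header "Authorization: Bearer $TAVILY_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$body"
}
```

For example, `tavily_extract '{"urls": ["https://example.com/article"]}'` sends the basic request shown above.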

Multiple URLs with Query Focus

bash

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'

API Reference

Endpoint

POST https://api.tavily.com/extract

Headers

Header	Value
`Authorization`	`Bearer <TAVILY_API_KEY>`
`Content-Type`	`application/json`

Request Body

Field	Type	Default	Description
`urls`	array	Required	URLs to extract (max 20)
`query`	string	null	Reranks chunks by relevance
`chunks_per_source`	integer	3	Chunks per URL (1-5, requires query)
`extract_depth`	string	`"basic"`	`basic` or `advanced` (for JS pages)
`format`	string	`"markdown"`	`markdown` or `text`
`include_images`	boolean	false	Include image URLs
`timeout`	float	varies	Max wait (1-60 seconds)
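
The limits in the table can be checked locally before a request is sent. The sketch below is illustrative only: `validate_extract_request` is a hypothetical helper, it assumes at least one URL is required, and it takes the URL count rather than parsing the JSON body.

```bash
# Hypothetical pre-flight check (not part of the skill) for the documented limits.
# Usage: validate_extract_request URL_COUNT QUERY CHUNKS_PER_SOURCE TIMEOUT
# Pass an empty string for any optional field that is not being sent.
validate_extract_request() {
  local url_count="$1" query="$2" chunks="$3" timeout="$4"
  # urls: required, max 20 (a minimum of 1 is an assumption).
  if [ "$url_count" -lt 1 ] || [ "$url_count" -gt 20 ]; then
    echo "urls: expected 1-20 entries, got $url_count" >&2
    return 1
  fi
  if [ -n "$chunks" ]; then
    # chunks_per_source only applies when a query is given.
    if [ -z "$query" ]; then
      echo "chunks_per_source requires query" >&2
      return 1
    fi
    if [ "$chunks" -lt 1 ] || [ "$chunks" -gt 5 ]; then
      echo "chunks_per_source: expected 1-5, got $chunks" >&2
      return 1
    fi
  fi
  if [ -n "$timeout" ]; then
    # timeout is a float, so compare with awk instead of integer tests.
    if ! awk -v t="$timeout" 'BEGIN { exit !(t + 0 >= 1 && t + 0 <= 60) }'; then
      echo "timeout: expected 1-60 seconds, got $timeout" >&2
      return 1
    fi
  fi
  return 0
}
```

For instance, `validate_extract_request 2 "authentication API" 3 60` succeeds, while `validate_extract_request 1 "" 3 ""` fails because `chunks_per_source` is set without a `query`.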

Response Format

json

{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
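
A response in this shape can be picked apart on the command line. The sketch below assumes `jq` is installed (the skill does not state that it is), and the function names are hypothetical. It only counts the entries in `failed_results`, because their structure is not shown here.

```bash
# Sketch assuming jq is installed; the sample mirrors the response format above.
response='{
  "results": [
    {"url": "https://example.com/article", "raw_content": "# Article Title\n\nContent..."}
  ],
  "failed_results": [],
  "response_time": 2.3
}'

# Print the URL followed by the raw content of each extracted page.
extracted_content() {
  printf '%s' "$1" | jq -r '.results[] | .url, .raw_content'
}

# Count the entries in failed_results.
failed_count() {
  printf '%s' "$1" | jq '.failed_results | length'
}
```

Running `extracted_content "$response"` prints each page's URL and markdown, and a non-zero `failed_count "$response"` indicates that some URLs need another attempt.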

Extract Depth

Depth	When to Use
`basic`	Simple text extraction, faster
`advanced`	Dynamic/JS-rendered pages, tables, structured data

Examples

Single URL Extraction

bash

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'

Targeted Extraction with Query

bash

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'

JavaScript-Heavy Pages

bash

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'

Batch Extraction

bash

curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'

Tips

Max 20 URLs per request - batch larger lists
Use `query` + `chunks_per_source` to get only relevant content
Try `basic` first, fall back to `advanced` if content is missing
Set a longer `timeout` for slow pages (up to 60s)
Check `failed_results` for URLs that couldn't be extracted
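
The first tip, batching lists longer than 20 URLs, can be sketched in plain shell. `batch_urls` is a hypothetical helper, not part of the skill, and it assumes the URLs contain no spaces.

```bash
# Hypothetical helper (not part of the skill): group URLs into batches.
# Usage: batch_urls SIZE URL...
# Prints one batch per line, URLs separated by spaces (assumes URLs contain no spaces).
batch_urls() {
  local size="$1" count=0 line="" url
  shift
  for url in "$@"; do
    line="${line:+$line }$url"
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      printf '%s\n' "$line"
      line=""
      count=0
    fi
  done
  # Emit the final, partially filled batch if there is one.
  if [ -n "$line" ]; then
    printf '%s\n' "$line"
  fi
}
```

With a size of 20, each output line holds at most 20 URLs and can be turned into the `urls` array of its own request.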
