extract

Extract Skill

Extract clean content from specific URLs. Ideal when you know which pages you want content from.

Prerequisites

Tavily API Key Required - Get your key at https://tavily.com

Add to `~/.claude/settings.json`:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```
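
Before running any of the commands in this document, it can help to confirm that the key is actually visible to the shell. The following is a minimal sketch, not part of the skill itself: `require_tavily_key` is a hypothetical helper name, and the only checks it makes are that the variable is non-empty and that it no longer holds the placeholder value shown above.

```shell
# Hypothetical helper (not shipped with the skill): fail early when the key
# is missing or still set to the placeholder from the settings example.
require_tavily_key() {
  if [ -z "${TAVILY_API_KEY:-}" ]; then
    echo "TAVILY_API_KEY is not set; add it to ~/.claude/settings.json" >&2
    return 1
  fi
  if [ "$TAVILY_API_KEY" = "tvly-your-api-key-here" ]; then
    echo "TAVILY_API_KEY still holds the placeholder value" >&2
    return 1
  fi
}
```

Calling `require_tavily_key || exit 1` at the top of a script stops it before any request is sent with an unusable key.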

Quick Start

Using the Script

```bash
./scripts/extract.sh '<json>'
```

Examples:

```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```

Basic Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```

Multiple URLs with Query Focus

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```

API Reference

Endpoint

POST https://api.tavily.com/extract

Headers

| Header | Value |
| --- | --- |
| `Authorization` | `Bearer <TAVILY_API_KEY>` |
| `Content-Type` | `application/json` |

Request Body

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5, requires `query`) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` (for JS pages) |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |
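
The limits listed above can be checked locally before a request is sent. The sketch below is only illustrative: `check_extract_params` is a hypothetical function, its argument order is an assumption, and the lower bound of one URL is inferred from `urls` being required rather than stated in the reference.

```shell
# Hypothetical pre-flight check mirroring the documented request limits.
# usage: check_extract_params <url_count> <query> <chunks_per_source> <timeout>
# Pass an empty string for any optional value that is not being sent.
check_extract_params() {
  local url_count=$1 query=$2 chunks=$3 timeout=$4

  # urls: required, at most 20 per request
  if [ "$url_count" -lt 1 ] || [ "$url_count" -gt 20 ]; then
    echo "urls: expected 1-20 entries, got $url_count" >&2
    return 1
  fi

  # chunks_per_source: 1-5, and only meaningful together with query
  if [ -n "$chunks" ]; then
    if [ -z "$query" ]; then
      echo "chunks_per_source requires query" >&2
      return 1
    fi
    if [ "$chunks" -lt 1 ] || [ "$chunks" -gt 5 ]; then
      echo "chunks_per_source: expected 1-5, got $chunks" >&2
      return 1
    fi
  fi

  # timeout: float in the range 1-60 seconds, so compare with awk
  if [ -n "$timeout" ]; then
    if ! awk -v t="$timeout" 'BEGIN { exit !(t + 0 >= 1 && t + 0 <= 60) }'; then
      echo "timeout: expected 1-60 seconds, got $timeout" >&2
      return 1
    fi
  fi
  return 0
}
```

For example, `check_extract_params 2 "authentication API" 3 60` succeeds, while `check_extract_params 1 "" 3 ""` fails because `chunks_per_source` is given without a `query`.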

Response Format

```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```
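
A response in this shape can be summarized from the command line. The following sketch assumes `jq` is installed (it is not mentioned in the original document) and uses only the fields shown in the sample above; `summarize_extract_response` is a hypothetical helper name.

```shell
# Hypothetical helper: read an extract response on stdin and print one
# tab-separated line per extracted URL, then the number of failed entries.
# Requires jq.
summarize_extract_response() {
  jq -r '
    (.results[] | "ok\t\(.url)"),
    "failed_count\t\(.failed_results | length)"
  '
}
```

Piping the output of any of the `curl` commands in this document into `summarize_extract_response` gives a quick view of which URLs came back and whether `failed_results` needs a closer look.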

Extract Depth

| Depth | When to Use |
| --- | --- |
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |
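
One way to combine the two depths is to try `basic` and retry with `advanced` only when nothing useful comes back. The sketch below is an illustration under stated assumptions: `tavily_extract` and `extract_with_fallback` are hypothetical names, the URL is assumed to need no JSON escaping, and "content is missing" is approximated crudely as "the response contains no `raw_content` field".

```shell
# Hypothetical wrapper: POST one URL to the extract endpoint at a given depth.
# Assumes the URL contains no double quotes or backslashes.
tavily_extract() {
  curl --silent --request POST \
    --url https://api.tavily.com/extract \
    --header "Authorization: Bearer $TAVILY_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "{\"urls\": [\"$1\"], \"extract_depth\": \"$2\"}"
}

# Try basic first; retry with advanced only when no raw_content came back.
# The grep test is a rough heuristic, not a full JSON check.
extract_with_fallback() {
  local url=$1 response
  response=$(tavily_extract "$url" basic)
  if ! printf '%s' "$response" | grep -q '"raw_content"'; then
    response=$(tavily_extract "$url" advanced)
  fi
  printf '%s\n' "$response"
}
```

Usage would look like `extract_with_fallback "https://app.example.com/dashboard"`. A stricter check, for instance one that also treats an empty `raw_content` string as missing, would need a JSON-aware tool.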

Examples

Single URL Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```

Targeted Extraction with Query

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```

JavaScript-Heavy Pages

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```

Batch Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```

Tips

  • Max 20 URLs per request - batch larger lists
  • Use `query` + `chunks_per_source` to get only relevant content
  • Try `basic` first, fall back to `advanced` if content is missing
  • Set longer `timeout` for slow pages (up to 60s)
  • Check `failed_results` for URLs that couldn't be extracted
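
The first tip, batching lists larger than 20 URLs, can be sketched in plain shell. `build_batches` is a hypothetical helper rather than part of the skill. It assumes the URLs contain no double quotes or backslashes, since it does no JSON escaping, and it emits only the `urls` field.

```shell
# Hypothetical helper: print one JSON request body per line,
# each holding at most 20 URLs taken from the arguments in order.
build_batches() {
  local max=20 count=0 body="" url
  for url in "$@"; do
    # Flush the current batch once it is full.
    if [ "$count" -eq "$max" ]; then
      printf '{"urls": [%s]}\n' "$body"
      body=""
      count=0
    fi
    if [ -n "$body" ]; then
      body="$body, "
    fi
    body="$body\"$url\""
    count=$((count + 1))
  done
  # Emit the final, possibly partial, batch.
  if [ "$count" -gt 0 ]; then
    printf '{"urls": [%s]}\n' "$body"
  fi
}

# Possible usage with the script from Quick Start:
#   build_batches "$@" | while IFS= read -r body; do
#     ./scripts/extract.sh "$body"
#   done
```

With 45 URLs this produces three request bodies holding 20, 20, and 5 URLs. Each response should still be checked for `failed_results`.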