Extract Skill
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
Prerequisites
Tavily API key required: get your key at https://tavily.com.

Add it to `~/.claude/settings.json`:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```
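Before calling the script or the API, it can help to fail fast when the key is missing. Below is a minimal Bash sketch. `require_tavily_key` is a hypothetical helper and is not part of the skill. The `tvly-` prefix check is only a heuristic taken from the placeholder value above, so a mismatch triggers a warning rather than an error.

```bash
#!/usr/bin/env bash

# Hypothetical pre-flight check: fail fast if the Tavily key is missing.
require_tavily_key() {
  if [[ -z "${TAVILY_API_KEY:-}" ]]; then
    echo "TAVILY_API_KEY is not set; add it to ~/.claude/settings.json" >&2
    return 1
  fi
  # Heuristic only: the "tvly-" prefix comes from the placeholder value,
  # so a mismatch produces a warning instead of an error.
  if [[ "$TAVILY_API_KEY" != tvly-* ]]; then
    echo "warning: TAVILY_API_KEY does not start with 'tvly-'" >&2
  fi
  return 0
}

# Usage: require_tavily_key || exit 1
```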

Quick Start
Using the Script
```bash
./scripts/extract.sh '<json>'
```

Examples:

```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```
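The script takes the whole request as a single JSON string, and quoting that string by hand gets error-prone as URL lists grow. Below is a minimal Bash sketch. `urls_payload` is a hypothetical helper and is not part of the skill. It produces the same `{"urls": [...]}` shape as the examples above. If a URL contains a raw `"` or `\`, the helper refuses it instead of escaping it, because correctly encoded URLs percent-encode both characters.

```bash
#!/usr/bin/env bash

# Hypothetical helper: build the '{"urls": [...]}' argument for extract.sh.
urls_payload() {
  if (( $# == 0 )); then
    echo "urls_payload: need at least one URL" >&2
    return 1
  fi
  local out='{"urls": [' sep='' url
  for url in "$@"; do
    case $url in
      *'"'* | *'\'*)
        # Encoded URLs never contain a raw quote or backslash; refuse
        # instead of escaping so the JSON stays well-formed.
        echo "urls_payload: unencoded quote or backslash in: $url" >&2
        return 1
        ;;
    esac
    out+="$sep\"$url\""
    sep=', '
  done
  printf '%s]}\n' "$out"
}
```

Usage, assuming the script path shown above: `./scripts/extract.sh "$(urls_payload https://example.com/page1 https://example.com/page2)"`.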

Basic Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```

Multiple URLs with Query Focus
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```

API Reference
Endpoint
```
POST https://api.tavily.com/extract
```

Headers
| Header | Value |
|---|---|
| `Authorization` | `Bearer $TAVILY_API_KEY` |
| `Content-Type` | `application/json` |
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5, requires `query`) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` (for JS-rendered pages) |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |
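You can check the limits in the table locally before spending a request. This is a sketch under stated assumptions. `validate_extract_request` is hypothetical. It takes `-q` (query), `-c` (chunks_per_source), `-d` (extract_depth) and `-t` (timeout), followed by the URLs. It encodes only the constraints listed in the table, and the API remains the authority on what it actually accepts. Bash arithmetic is integer-only, so `awk` handles the floating-point timeout range check.

```bash
#!/usr/bin/env bash

# Hypothetical validator for the request-body limits in the table above.
# Usage: validate_extract_request [-q query] [-c chunks] [-d depth] [-t timeout] URL...
validate_extract_request() {
  local query='' chunks='' depth='' timeout='' opt
  local OPTIND=1   # reset getopts state on every call
  local num_re='^[0-9]+([.][0-9]+)?$'
  while getopts 'q:c:d:t:' opt; do
    case $opt in
      q) query=$OPTARG ;;
      c) chunks=$OPTARG ;;
      d) depth=$OPTARG ;;
      t) timeout=$OPTARG ;;
      *) return 2 ;;   # getopts has already printed the error
    esac
  done
  shift $((OPTIND - 1))   # the remaining arguments are the URLs

  if (( $# < 1 || $# > 20 )); then
    echo "urls: expected 1-20 URLs, got $#" >&2
    return 1
  fi
  if [[ -n $chunks ]]; then
    if [[ -z $query ]]; then
      echo "chunks_per_source requires query" >&2
      return 1
    fi
    if [[ ! $chunks =~ ^[1-5]$ ]]; then
      echo "chunks_per_source must be an integer from 1 to 5" >&2
      return 1
    fi
  fi
  if [[ -n $depth && $depth != basic && $depth != advanced ]]; then
    echo "extract_depth must be basic or advanced" >&2
    return 1
  fi
  if [[ -n $timeout ]]; then
    # Must look like a non-negative number, then fall within 1-60 seconds.
    if [[ ! $timeout =~ $num_re ]] ||
       ! awk -v t="$timeout" 'BEGIN { exit !(t >= 1 && t <= 60) }'; then
      echo "timeout must be a number from 1 to 60 seconds" >&2
      return 1
    fi
  fi
  return 0
}
```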
Response Format
```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```
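A quick way to inspect a response is to count `results` and `failed_results` and see how much content came back for each URL. Below is a sketch that assumes `jq` is installed. The function names are hypothetical. The helpers rely only on the fields shown in the example above (`url`, `raw_content`, `failed_results`) and make no assumption about the shape of individual failure entries.

```bash
#!/usr/bin/env bash

# Hypothetical helpers; both read an extract response (JSON) on stdin.
# Requires jq.

# Print the counts, then one "url<TAB>N chars" line per entry in results.
summarize_extract_response() {
  jq -r '
    "extracted: \(.results | length)",
    "failed: \(.failed_results | length)",
    (.results[] | "\(.url)\t\(.raw_content | length) chars")
  '
}

# Exit status 0 only when failed_results is empty.
all_urls_extracted() {
  jq -e '(.failed_results | length) == 0' >/dev/null
}
```

If the script prints the response JSON to stdout (an assumption), you can pipe it straight in: `./scripts/extract.sh "$body" | summarize_extract_response`.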

Extract Depth
| Depth | When to Use |
|---|---|
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |
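The depth table pairs a fast mode with a heavier one, and the Tips suggest trying `basic` first. One way to combine them is to extract everything with `basic`, then retry only the URLs that came back empty with `advanced`. This sketch rests on several assumptions:

- `jq` 1.6 or later is installed (needed for `$ARGS.positional`).
- The extractor command takes the request JSON as its only argument and prints the response JSON.
- `extract_with_fallback` and the `EXTRACT_CMD` override are hypothetical names.
- Treating an empty or missing `raw_content` as "content is missing" is a heuristic.

The merged output keeps only `results` and `failed_results`.

```bash
#!/usr/bin/env bash

# Hypothetical "basic first, advanced as fallback" wrapper. Requires jq >= 1.6.
# EXTRACT_CMD must take the request JSON as its only argument and print the
# response JSON; it defaults to the skill's script (an assumption).
extract_with_fallback() {
  local cmd=${EXTRACT_CMD:-./scripts/extract.sh}
  local urls_json basic missing advanced

  # JSON array of the URL arguments; jq takes care of escaping.
  urls_json=$(jq -cn '$ARGS.positional' --args "$@") || return 1

  basic=$("$cmd" "$(jq -cn --argjson u "$urls_json" \
    '{urls: $u, extract_depth: "basic"}')") || return 1

  # URLs with no non-empty raw_content in the basic response.
  missing=$(jq -c --argjson u "$urls_json" '
    [.results[]? | select((.raw_content // "") != "") | .url] as $ok
    | $u - $ok' <<<"$basic") || return 1

  if [[ $missing == '[]' ]]; then
    printf '%s\n' "$basic"
    return 0
  fi

  # Retry only the missing URLs, with advanced depth and the maximum timeout.
  advanced=$("$cmd" "$(jq -cn --argjson u "$missing" \
    '{urls: $u, extract_depth: "advanced", timeout: 60}')") || return 1

  # Merge: non-empty basic results plus everything from the advanced pass.
  jq -c --argjson b "$basic" '
    {results: ([$b.results[]? | select((.raw_content // "") != "")]
               + (.results // [])),
     failed_results: (.failed_results // [])}' <<<"$advanced"
}
```

Pointing `EXTRACT_CMD` at a stub function makes the retry logic testable offline, without spending API credits.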
Examples
Single URL Extraction
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```

Targeted Extraction with Query
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```

JavaScript-Heavy Pages
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```

Batch Extraction
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```

Tips
- Max 20 URLs per request; batch larger lists.
- Use `query` + `chunks_per_source` to get only the relevant content.
- Try `basic` first; fall back to `advanced` if content is missing.
- Set a longer `timeout` for slow pages (up to 60s).
- Check `failed_results` for URLs that couldn't be extracted.
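For lists longer than 20 URLs, one approach is to split them into fixed-size batches and send one request per batch. Below is a minimal Bash sketch. `batch_extract_payloads` is hypothetical. Like the examples above, it assumes the URLs need no JSON escaping.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split URLs into request bodies of at most 20 URLs each
# (the per-request maximum). Prints one JSON body per line.
# Assumes URLs contain no characters that need JSON escaping.
batch_extract_payloads() {
  local max=20
  local -a urls=("$@")
  local i j body
  for (( i = 0; i < ${#urls[@]}; i += max )); do
    body='{"urls": ['
    for (( j = i; j < i + max && j < ${#urls[@]}; j++ )); do
      if (( j > i )); then body+=', '; fi
      body+="\"${urls[j]}\""
    done
    printf '%s]}\n' "$body"
  done
}
```

You can pass each printed line to the script in a loop, for example `batch_extract_payloads "${urls[@]}" | while IFS= read -r body; do ./scripts/extract.sh "$body"; done`.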