google-surf-mcp-search

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

google-surf-mcp-search

google-surf-mcp-search

Skill by ara.so — MCP Skills collection.
ara.so 开发的Skill —— 属于MCP Skills集合。

What It Does

功能介绍

google-surf-mcp is an MCP server that provides Google search functionality without requiring API keys. It combines three capabilities in one:
  • Google search with ad/spam filtering
  • URL content extraction (HTML + PDF)
  • Academic paper extraction (arXiv, Nature, PubMed, etc.)
Key features:
  • Works with actual Google search (not an API wrapper)
  • Automatic CAPTCHA recovery with persistent browser profiles
  • Parallel search and extraction
  • Token-efficient abstract mode for triage
  • Built-in rate limiting and caching
  • Geometric verification to drop sponsored ads and knowledge panels
google-surf-mcp是一款无需API密钥即可提供谷歌搜索功能的MCP服务器,集成了三大核心能力:
  • 带广告/垃圾信息过滤的谷歌搜索
  • URL内容提取(HTML + PDF格式)
  • 学术论文提取(支持arXiv、Nature、PubMed等平台)
核心特性:
  • 基于真实谷歌搜索实现(非API封装)
  • 借助持久化浏览器配置文件自动完成CAPTCHA恢复
  • 支持并行搜索与内容提取
  • 适用于快速筛选的高效摘要模式
  • 内置速率限制与缓存机制
  • 通过几何验证过滤赞助广告和知识面板

Installation

安装方法

Quick Install (npx)

快速安装(npx方式)

Add to your MCP client config (e.g.,
~/.claude.json
for Claude Code):
json
{
  "mcpServers": {
    "google-surf": {
      "command": "npx",
      "args": ["-y", "google-surf-mcp"]
    }
  }
}
将以下配置添加到你的MCP客户端配置文件中(例如Claude Code的
~/.claude.json
):
json
{
  "mcpServers": {
    "google-surf": {
      "command": "npx",
      "args": ["-y", "google-surf-mcp"]
    }
  }
}

Local Clone Installation

本地克隆安装

bash
git clone https://github.com/HarimxChoi/google-surf-mcp
cd google-surf-mcp
npm install
npm run build
Config for local installation:
json
{
  "mcpServers": {
    "google-surf": {
      "command": "node",
      "args": ["/absolute/path/to/google-surf-mcp/build/index.js"]
    }
  }
}
bash
git clone https://github.com/HarimxChoi/google-surf-mcp
cd google-surf-mcp
npm install
npm run build
本地安装对应的配置:
json
{
  "mcpServers": {
    "google-surf": {
      "command": "node",
      "args": ["/absolute/path/to/google-surf-mcp/build/index.js"]
    }
  }
}

Manual Bootstrap (if auto-bootstrap fails)

手动引导(自动引导失败时使用)

bash
npm run bootstrap
With custom paths:
bash
CHROME_PATH=/usr/bin/google-chrome SURF_TZ=America/New_York npm run bootstrap
bash
npm run bootstrap
自定义路径引导:
bash
CHROME_PATH=/usr/bin/google-chrome SURF_TZ=America/New_York npm run bootstrap

Available Tools

可用工具

1.
search
- Single Google Search

1.
search
- 单次谷歌搜索

Performs a single Google search, returns filtered results (ads removed).
Parameters:
  • query
    (string, required): Search query
  • limit
    (number, optional): Max results, default 10
Returns:
  • results[]
    : Array of
    { title, url, snippet }
  • dropped
    : Count of filtered results (ads, knowledge panels)
  • dropped_reasons[]
    : Why items were dropped
  • cache_hit
    : Boolean indicating cache use
Example Usage:
typescript
// Via MCP tool call
{
  "query": "typescript async patterns",
  "limit": 5
}
Response:
json
{
  "results": [
    {
      "title": "Async/Await in TypeScript",
      "url": "https://example.com/typescript-async",
      "snippet": "Learn how to use async/await patterns..."
    }
  ],
  "dropped": 2,
  "dropped_reasons": ["sponsored", "knowledge_panel"],
  "cache_hit": false
}
执行单次谷歌搜索,返回过滤后的结果(已移除广告)。
参数:
  • query
    (字符串,必填):搜索关键词
  • limit
    (数字,可选):最大结果数,默认值为10
返回值:
  • results[]
    :包含
    { title, url, snippet }
    的结果数组
  • dropped
    :被过滤的结果数量(广告、知识面板)
  • dropped_reasons[]
    :结果被过滤的原因
  • cache_hit
    :是否命中缓存的布尔值
使用示例:
typescript
// 通过MCP工具调用
{
  "query": "typescript async patterns",
  "limit": 5
}
响应示例:
json
{
  "results": [
    {
      "title": "Async/Await in TypeScript",
      "url": "https://example.com/typescript-async",
      "snippet": "Learn how to use async/await patterns..."
    }
  ],
  "dropped": 2,
  "dropped_reasons": ["sponsored", "knowledge_panel"],
  "cache_hit": false
}

2.
search_parallel
- Parallel Multi-Query Search

2.
search_parallel
- 多关键词并行搜索

Execute multiple searches in parallel using a worker pool (max 10 queries).
Parameters:
  • queries
    (string[], required): Array of search queries
  • limit
    (number, optional): Max results per query, default 10
Returns:
  • Array of search results (same format as
    search
    )
Example Usage:
typescript
{
  "queries": [
    "mcp server best practices",
    "playwright stealth techniques",
    "typescript pdf extraction",
    "google search scraping 2026"
  ],
  "limit": 3
}
通过工作池并行执行多个搜索(最多支持10个关键词)。
参数:
  • queries
    (字符串数组,必填):搜索关键词数组
  • limit
    (数字,可选):每个关键词的最大结果数,默认值为10
返回值:
  • 搜索结果数组(格式与
    search
    工具一致)
使用示例:
typescript
{
  "queries": [
    "mcp server best practices",
    "playwright stealth techniques",
    "typescript pdf extraction",
    "google search scraping 2026"
  ],
  "limit": 3
}

3.
extract
- Fetch and Extract Content

3.
extract
- 内容抓取与提取

Extract text content from a URL (HTML or PDF).
Parameters:
  • url
    (string, required): URL to extract
  • max_chars
    (number, optional): Character limit, default 100k
  • mode
    (string, optional):
    "full"
    |
    "abstract"
    |
    "metadata"
Modes:
  • full
    : Complete article text (HTML via Readability, PDF via unpdf)
  • abstract
    : ~1500 chars for triage (PDF page 1 or HTML meta description)
  • metadata
    : PDF page count only
Returns:
  • content
    : Extracted text (markdown for HTML)
  • title
    : Document title
  • excerpt
    : Short summary
  • length
    : Character count
  • is_pdf
    : Boolean
  • page_count
    : Number (PDFs only)
  • extraction_quality
    :
    "high"
    |
    "medium"
    |
    "low"
Example Usage:
typescript
// Extract full academic paper
{
  "url": "https://arxiv.org/pdf/2301.12345.pdf",
  "mode": "full"
}

// Quick abstract for triage
{
  "url": "https://nature.com/articles/s41586-023-12345-6",
  "mode": "abstract",
  "max_chars": 2000
}
Response:
json
{
  "content": "# Paper Title\n\nAbstract: This paper presents...",
  "title": "Novel Approach to AI Safety",
  "excerpt": "This paper presents a novel approach...",
  "length": 45678,
  "is_pdf": true,
  "page_count": 12,
  "extraction_quality": "high"
}
从URL中提取文本内容(支持HTML或PDF格式)。
参数:
  • url
    (字符串,必填):待提取的URL
  • max_chars
    (数字,可选):字符限制,默认值为100000
  • mode
    (字符串,可选):
    "full"
    |
    "abstract"
    |
    "metadata"
模式说明:
  • full
    :提取完整文章文本(HTML通过Readability处理,PDF通过unpdf处理)
  • abstract
    :提取约1500字符的摘要用于快速筛选(PDF取第一页或HTML元描述)
  • metadata
    :仅提取PDF页数信息
返回值:
  • content
    :提取的文本(HTML内容转为markdown格式)
  • title
    :文档标题
  • excerpt
    :简短摘要
  • length
    :字符数
  • is_pdf
    :是否为PDF格式的布尔值
  • page_count
    :页数(仅PDF返回)
  • extraction_quality
    "high"
    |
    "medium"
    |
    "low"
使用示例:
typescript
// 提取完整学术论文
{
  "url": "https://arxiv.org/pdf/2301.12345.pdf",
  "mode": "full"
}

// 快速提取摘要用于筛选
{
  "url": "https://nature.com/articles/s41586-023-12345-6",
  "mode": "abstract",
  "max_chars": 2000
}
响应示例:
json
{
  "content": "# Paper Title\n\nAbstract: This paper presents...",
  "title": "Novel Approach to AI Safety",
  "excerpt": "This paper presents a novel approach...",
  "length": 45678,
  "is_pdf": true,
  "page_count": 12,
  "extraction_quality": "high"
}

4.
search_extract
- Combined Search + Extract

4.
search_extract
- 搜索+提取组合操作

Search and extract content in one call. Efficiently parallelizes extraction.
Parameters:
  • query
    (string, required): Search query
  • limit
    (number, optional): Max results to extract, default 5
  • max_chars
    (number, optional): Per-result char limit
  • mode
    (string, optional):
    "abstract"
    (default) |
    "full"
Best Practices:
  • Use
    mode="abstract"
    (default) for cheap triage with ~1500-char summaries
  • Use
    mode="full"
    only when you need complete article text (slower, more tokens)
Returns:
  • results[]
    : Search results enriched with
    extracted_content
Example Usage:
typescript
// Triage mode (default, token-efficient)
{
  "query": "claude mcp server tutorials",
  "limit": 5,
  "mode": "abstract"
}

// Full extraction (when you need complete content)
{
  "query": "machine learning interpretability survey",
  "limit": 3,
  "mode": "full",
  "max_chars": 50000
}
Response:
json
{
  "results": [
    {
      "title": "Building MCP Servers",
      "url": "https://example.com/mcp-tutorial",
      "snippet": "Complete guide to MCP servers...",
      "extracted_content": {
        "content": "# Building MCP Servers\n\nMCP (Model Context Protocol)...",
        "title": "Building MCP Servers",
        "length": 1523,
        "is_pdf": false,
        "extraction_quality": "high"
      }
    }
  ]
}
在一次调用中完成搜索与内容提取,高效并行处理提取任务。
参数:
  • query
    (字符串,必填):搜索关键词
  • limit
    (数字,可选):待提取的最大结果数,默认值为5
  • max_chars
    (数字,可选):每个结果的字符限制
  • mode
    (字符串,可选):
    "abstract"
    (默认) |
    "full"
最佳实践:
  • 默认使用
    mode="abstract"
    模式,通过约1500-字符的摘要实现低成本快速筛选
  • 仅在需要完整文章文本时使用
    mode="full"
    模式(速度较慢,消耗更多token)
返回值:
  • results[]
    :包含
    extracted_content
    字段的增强版搜索结果
使用示例:
typescript
// 快速筛选模式(默认,token高效)
{
  "query": "claude mcp server tutorials",
  "limit": 5,
  "mode": "abstract"
}

// 完整提取模式(需要完整内容时使用)
{
  "query": "machine learning interpretability survey",
  "limit": 3,
  "mode": "full",
  "max_chars": 50000
}
响应示例:
json
{
  "results": [
    {
      "title": "Building MCP Servers",
      "url": "https://example.com/mcp-tutorial",
      "snippet": "Complete guide to MCP servers...",
      "extracted_content": {
        "content": "# Building MCP Servers\n\nMCP (Model Context Protocol)...",
        "title": "Building MCP Servers",
        "length": 1523,
        "is_pdf": false,
        "extraction_quality": "high"
      }
    }
  ]
}

5.
health
- Server Status

5.
health
- 服务器状态检查

Check server health and configuration.
Returns:
  • status
    :
    "healthy"
    |
    "degraded"
  • cascade_mode
    : Current stealth mode
  • rate_limiter
    : Request counts and limits
  • cache_stats
    : Cache size and hit rates
  • config
    : Active configuration values
Example Usage:
typescript
// No parameters
{}
检查服务器健康状况与配置信息。
返回值:
  • status
    "healthy"
    |
    "degraded"
  • cascade_mode
    :当前隐身模式
  • rate_limiter
    :请求计数与限制信息
  • cache_stats
    :缓存大小与命中率
  • config
    :当前生效的配置值
使用示例:
typescript
// 无需参数
{}

Configuration

配置说明

All configuration via environment variables:
所有配置通过环境变量设置:

Essential Variables

核心变量

bash
undefined
bash
undefined

Chrome binary path (auto-detected if not set)

Chrome二进制文件路径(未设置时自动检测)

CHROME_PATH=/usr/bin/google-chrome
CHROME_PATH=/usr/bin/google-chrome

Profile storage (default: ~/.google-surf-mcp)

配置文件存储路径(默认:~/.google-surf-mcp)

SURF_PROFILE_ROOT=/custom/path/profiles
SURF_PROFILE_ROOT=/custom/path/profiles

Browser locale and timezone

浏览器区域与时区

SURF_LOCALE=en-US SURF_TZ=America/New_York
undefined
SURF_LOCALE=en-US SURF_TZ=America/New_York
undefined

Headless & CAPTCHA Recovery

无头模式与CAPTCHA恢复

bash
undefined
bash
undefined

Run Chrome visibly (for demos/debugging)

以可视化模式运行Chrome(用于演示/调试)

SURF_HEADLESS=false
SURF_HEADLESS=false

Remote debugging mode (headless servers)

远程调试模式(适用于无头服务器)

SURF_REMOTE_DEBUG=true
SURF_REMOTE_DEBUG=true

Cloud/serverless mode (fail-fast on CAPTCHA)

云/无服务器模式(遇到CAPTCHA时快速失败)

SURF_CLOUD_MODE=true
undefined
SURF_CLOUD_MODE=true
undefined

Performance Tuning

性能调优

bash
undefined
bash
undefined

Idle close timeout (ms), 0 disables

空闲关闭超时时间(毫秒),设为0则禁用

SURF_IDLE_CLOSE_MS=30000
SURF_IDLE_CLOSE_MS=30000

Rate limit (requests per minute)

速率限制(每分钟请求数)

SURF_RATE_LIMIT_PER_MIN=10
SURF_RATE_LIMIT_PER_MIN=10

Search cache TTL (ms), 0 disables

搜索缓存过期时间(毫秒),设为0则禁用

SURF_CACHE_TTL_SEARCH_MS=86400000
SURF_CACHE_TTL_SEARCH_MS=86400000

Cache LRU size

缓存LRU容量

SURF_CACHE_MAX_ENTRIES=1000
undefined
SURF_CACHE_MAX_ENTRIES=1000
undefined

Security

安全设置

bash
undefined
bash
undefined

Allow private IPs in extract (default: false)

允许提取私有IP内容(默认:false)

SURF_ALLOW_PRIVATE=true
SURF_ALLOW_PRIVATE=true

Ignore TLS errors (auto-on in cloud mode)

忽略TLS错误(云模式下自动启用)

SURF_INSECURE_TLS=false
SURF_INSECURE_TLS=false

Disable sandbox (auto-on in cloud mode)

禁用沙箱(云模式下自动启用)

SURF_NO_SANDBOX=false
undefined
SURF_NO_SANDBOX=false
undefined

Advanced

高级设置

bash
undefined
bash
undefined

Disable cascade fallback (pin single mode)

禁用级联回退(固定使用单一模式)

SURF_CASCADE_DISABLED=true SURF_USE_STEALTH=true
SURF_CASCADE_DISABLED=true SURF_USE_STEALTH=true

Humanlike browsing (off | background | inline)

类人浏览模式(off | background | inline)

SURF_HUMANLIKE_MODE=background
undefined
SURF_HUMANLIKE_MODE=background
undefined

Common Patterns

常见使用模式

Pattern 1: Research Assistant

模式1:研究助手

Search academic papers and extract abstracts for quick review:
typescript
// Step 1: Search and triage with abstracts
const triage = await use_mcp_tool("google-surf", "search_extract", {
  query: "transformer architecture improvements 2026",
  limit: 10,
  mode: "abstract"
});

// Step 2: Extract full text for promising papers
const topPapers = triage.results.slice(0, 3);
const fullTexts = await Promise.all(
  topPapers.map(paper => 
    use_mcp_tool("google-surf", "extract", {
      url: paper.url,
      mode: "full",
      max_chars: 100000
    })
  )
);
搜索学术论文并提取摘要用于快速审阅:
typescript
// 步骤1:搜索并通过摘要快速筛选
const triage = await use_mcp_tool("google-surf", "search_extract", {
  query: "transformer architecture improvements 2026",
  limit: 10,
  mode: "abstract"
});

// 步骤2:提取有潜力论文的完整文本
const topPapers = triage.results.slice(0, 3);
const fullTexts = await Promise.all(
  topPapers.map(paper => 
    use_mcp_tool("google-surf", "extract", {
      url: paper.url,
      mode: "full",
      max_chars: 100000
    })
  )
);

Pattern 2: Parallel Research

模式2:并行研究

Search multiple related topics simultaneously:
typescript
const relatedTopics = await use_mcp_tool("google-surf", "search_parallel", {
  queries: [
    "MCP server authentication patterns",
    "MCP server error handling",
    "MCP server rate limiting",
    "MCP server caching strategies"
  ],
  limit: 5
});

// Process results by topic
relatedTopics.forEach((topicResults, index) => {
  console.log(`Topic ${index + 1}:`, topicResults.results.length, "results");
});
同时搜索多个相关主题:
typescript
const relatedTopics = await use_mcp_tool("google-surf", "search_parallel", {
  queries: [
    "MCP server authentication patterns",
    "MCP server error handling",
    "MCP server rate limiting",
    "MCP server caching strategies"
  ],
  limit: 5
});

// 按主题处理结果
relatedTopics.forEach((topicResults, index) => {
  console.log(`主题 ${index + 1}:`, topicResults.results.length, "条结果");
});

Pattern 3: Content Aggregation

模式3:内容聚合

Build a comprehensive knowledge base:
typescript
// 1. Find relevant sources
const sources = await use_mcp_tool("google-surf", "search", {
  query: "typescript best practices 2026",
  limit: 20
});

// 2. Extract abstracts to filter quality
const abstracts = await Promise.all(
  sources.results.map(result =>
    use_mcp_tool("google-surf", "extract", {
      url: result.url,
      mode: "abstract"
    })
  )
);

// 3. Full extraction for high-quality sources
const highQuality = abstracts
  .filter(a => a.extraction_quality === "high")
  .slice(0, 5);

const fullContent = await Promise.all(
  highQuality.map(a =>
    use_mcp_tool("google-surf", "extract", {
      url: a.url,
      mode: "full"
    })
  )
);
构建综合性知识库:
typescript
// 1. 查找相关来源
const sources = await use_mcp_tool("google-surf", "search", {
  query: "typescript best practices 2026",
  limit: 20
});

// 2. 提取摘要筛选优质内容
const abstracts = await Promise.all(
  sources.results.map(result =>
    use_mcp_tool("google-surf", "extract", {
      url: result.url,
      mode: "abstract"
    })
  )
);

// 3. 提取优质来源的完整内容
const highQuality = abstracts
  .filter(a => a.extraction_quality === "high")
  .slice(0, 5);

const fullContent = await Promise.all(
  highQuality.map(a =>
    use_mcp_tool("google-surf", "extract", {
      url: a.url,
      mode: "full"
    })
  )
);

Pattern 4: Health Check Before Heavy Operations

模式4:大型操作前的健康检查

typescript
// Check server health before batch operations
const health = await use_mcp_tool("google-surf", "health", {});

if (health.status !== "healthy") {
  console.warn("Server degraded, reducing concurrency");
}

const rateLimit = health.rate_limiter.requests_per_minute;
if (rateLimit > 8) {
  // Wait before starting batch
  await sleep(60000);
}
typescript
// 批量操作前检查服务器健康状况
const health = await use_mcp_tool("google-surf", "health", {});

if (health.status !== "healthy") {
  console.warn("服务器状态不佳,降低并发量");
}

const rateLimit = health.rate_limiter.requests_per_minute;
if (rateLimit > 8) {
  // 开始批量操作前等待
  await sleep(60000);
}

CAPTCHA Recovery Modes

CAPTCHA恢复模式

The server handles CAPTCHAs automatically based on environment:
服务器会根据运行环境自动处理CAPTCHA:

Mode 1: Local Desktop (default)

模式1:本地桌面(默认)

bash
undefined
bash
undefined

No config needed - default behavior

无需额外配置 - 默认行为


When CAPTCHA appears:
1. OS notification fires
2. Headed Chrome window opens
3. Human solves CAPTCHA
4. Call automatically retries
5. Profile reputation preserved

遇到CAPTCHA时:
1. 触发系统通知
2. 打开可视化Chrome窗口
3. 人工完成CAPTCHA验证
4. 自动重试请求
5. 保留配置文件信誉

Mode 2: Visible Chrome (demos/debugging)

模式2:可视化Chrome(演示/调试)

bash
SURF_HEADLESS=false
  • Chrome runs visibly at all times
  • CAPTCHA recovery skips notification (user is watching)
  • Good for demos and debugging
bash
SURF_HEADLESS=false
  • Chrome始终以可视化模式运行
  • CAPTCHA恢复跳过通知(用户实时监控)
  • 适用于演示与调试场景

Mode 3: Remote Debugging (headless servers)

模式3:远程调试(无头服务器)

bash
SURF_HEADLESS=true
SURF_REMOTE_DEBUG=true
When CAPTCHA appears:
  1. DevTools port printed to logs
  2. Error thrown with instructions
  3. SSH port-forward from local machine
  4. Open
    chrome://inspect
    locally
  5. Solve CAPTCHA remotely
  6. Retry the call
Example SSH forward:
bash
ssh -L 9222:localhost:9222 your-server
bash
SURF_HEADLESS=true
SURF_REMOTE_DEBUG=true
遇到CAPTCHA时:
  1. 日志中打印DevTools端口
  2. 抛出包含操作指引的错误
  3. 从本地机器通过SSH端口转发
  4. 在本地打开
    chrome://inspect
  5. 远程完成CAPTCHA验证
  6. 重试请求
SSH转发示例:
bash
ssh -L 9222:localhost:9222 your-server

Mode 4: Cloud/Serverless (fail-fast)

模式4:云/无服务器(快速失败)

bash
SURF_CLOUD_MODE=true
  • No CAPTCHA recovery
  • Throws
    CAPTCHA_REQUIRED
    error immediately
  • Worker pool disabled
  • Sandbox disabled, TLS bypass enabled
bash
SURF_CLOUD_MODE=true
  • 不支持CAPTCHA恢复
  • 立即抛出
    CAPTCHA_REQUIRED
    错误
  • 禁用工作池
  • 自动禁用沙箱并启用TLS绕过

Troubleshooting

故障排除

Chrome Not Found

Chrome未找到

Error:
Chrome binary not found
Solution:
bash
undefined
错误信息:
Chrome binary not found
解决方案:
bash
undefined

Find your Chrome installation

查找Chrome安装路径

which google-chrome which chromium
which google-chrome which chromium

Set explicitly

显式设置路径

CHROME_PATH=/usr/bin/google-chrome npm run bootstrap
undefined
CHROME_PATH=/usr/bin/google-chrome npm run bootstrap
undefined

CAPTCHA Loops

CAPTCHA循环

Symptoms: Repeated CAPTCHA requests
Solutions:
  1. Run bootstrap to warm the profile:
bash
npm run bootstrap
  1. Reduce request rate:
bash
SURF_RATE_LIMIT_PER_MIN=5 npx google-surf-mcp
  1. Check cascade mode:
typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.cascade_mode); // Should cycle: none → stealth → humanlike
症状: 反复出现CAPTCHA请求
解决方案:
  1. 运行引导命令预热配置文件:
bash
npm run bootstrap
  1. 降低请求速率:
bash
SURF_RATE_LIMIT_PER_MIN=5 npx google-surf-mcp
  1. 检查级联模式:
typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.cascade_mode); // 应循环切换:none → stealth → humanlike

Empty or No Results

无结果或结果为空

Check health first:
typescript
const health = await use_mcp_tool("google-surf", "health", {});
// Check rate_limiter.requests_per_minute
// Check cache_stats for anomalies
Clear cache if stale:
bash
SURF_CACHE_TTL_SEARCH_MS=0 npx google-surf-mcp
Check dropped reasons:
typescript
const results = await use_mcp_tool("google-surf", "search", {
  query: "test query"
});
console.log(results.dropped_reasons);
// If all results dropped as "sponsored", selector may be stale
先检查健康状况:
typescript
const health = await use_mcp_tool("google-surf", "health", {});
// 检查rate_limiter.requests_per_minute
// 检查cache_stats是否存在异常
清理过期缓存:
bash
SURF_CACHE_TTL_SEARCH_MS=0 npx google-surf-mcp
检查过滤原因:
typescript
const results = await use_mcp_tool("google-surf", "search", {
  query: "test query"
});
console.log(results.dropped_reasons);
// 如果所有结果都因"sponsored"被过滤,说明选择器可能已失效

Extraction Failures

提取失败

PDF extraction fails:
typescript
// Try metadata mode first
const meta = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/paper.pdf",
  mode: "metadata"
});
console.log(meta.page_count); // If 0, PDF is inaccessible
SSRF blocked:
bash
undefined
PDF提取失败:
typescript
// 先尝试元数据模式
const meta = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/paper.pdf",
  mode: "metadata"
});
console.log(meta.page_count); // 如果为0,说明PDF无法访问
SSRF被阻止:
bash
undefined

Allow private IPs (only if you control the URLs)

允许私有IP(仅当你能控制目标URL时使用)

SURF_ALLOW_PRIVATE=true npx google-surf-mcp

**Low extraction quality:**

```typescript
const result = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/article"
});

if (result.extraction_quality === "low") {
  // HTML was poorly structured or blocked
  // Try fetching directly via other means
}
SURF_ALLOW_PRIVATE=true npx google-surf-mcp

**提取质量低:**

```typescript
const result = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/article"
});

if (result.extraction_quality === "low") {
  // HTML结构不佳或被阻止
  // 尝试通过其他方式直接抓取
}

Performance Issues

性能问题

Slow first call:
Normal. First call bootstraps the profile (~4s sequential, ~9s parallel). Subsequent calls are faster (~1.5s).
Idle timeout too aggressive:
bash
undefined
首次调用缓慢:
属于正常现象。首次调用会预热配置文件(串行约4秒,并行约9秒)。后续调用速度更快(约1.5秒)。
空闲超时过于激进:
bash
undefined

Keep contexts warm longer

延长上下文保持时间

SURF_IDLE_CLOSE_MS=120000 npx google-surf-mcp

**Too many parallel queries:**

Limit to 10 per `search_parallel` call. For more, batch them:

```typescript
const queries = [...100queries];
const batches = chunk(queries, 10);

for (const batch of batches) {
  const results = await use_mcp_tool("google-surf", "search_parallel", {
    queries: batch
  });
  // Process batch
  await sleep(5000); // Respect rate limits
}
SURF_IDLE_CLOSE_MS=120000 npx google-surf-mcp

**并行查询过多:**

`search_parallel`调用最多限制10个关键词。如需更多,分批处理:

```typescript
const queries = [...100queries];
const batches = chunk(queries, 10);

for (const batch of batches) {
  const results = await use_mcp_tool("google-surf", "search_parallel", {
    queries: batch
  });
  // 处理批次结果
  await sleep(5000); // 遵守速率限制
}

Academic Sources Supported

支持的学术来源

Inline PDF extraction for:
  • arXiv
  • bioRxiv, medRxiv
  • Nature, Science, Cell
  • OpenReview
  • NeurIPS, ICML, ICLR proceedings
  • JMLR, PMLR
  • Springer
  • PubMed (via PMC)
  • ACL Anthology
All extracted to markdown-formatted text.
支持直接提取以下平台的PDF内容:
  • arXiv
  • bioRxiv, medRxiv
  • Nature, Science, Cell
  • OpenReview
  • NeurIPS, ICML, ICLR会议论文集
  • JMLR, PMLR
  • Springer
  • PubMed(通过PMC)
  • ACL Anthology
所有内容均提取为markdown格式文本。

Cache Management

缓存管理

bash
undefined
bash
undefined

Disable search caching

禁用搜索缓存

SURF_CACHE_TTL_SEARCH_MS=0
SURF_CACHE_TTL_SEARCH_MS=0

Increase cache size

增大缓存容量

SURF_CACHE_MAX_ENTRIES=5000
SURF_CACHE_MAX_ENTRIES=5000

Custom cache location

自定义缓存路径

SURF_CACHE_ROOT=/tmp/google-surf-cache

Cache namespaces:
- `search`: Google search results (24h TTL default)
- `extract`: URL content extractions (no TTL, LRU only)
SURF_CACHE_ROOT=/tmp/google-surf-cache

缓存命名空间:
- `search`:谷歌搜索结果(默认24小时过期)
- `extract`:URL内容提取结果(无过期时间,仅受LRU限制)

Rate Limiting

速率限制

Built-in rate limiter prevents Google blocks:
bash
undefined
内置速率限制器防止被谷歌封禁:
bash
undefined

Default: 10 requests/minute

默认:每分钟10次请求

SURF_RATE_LIMIT_PER_MIN=10
SURF_RATE_LIMIT_PER_MIN=10

Conservative for shared IPs

共享IP环境下的保守设置

SURF_RATE_LIMIT_PER_MIN=5
SURF_RATE_LIMIT_PER_MIN=5

Aggressive (may trigger CAPTCHAs)

激进设置(可能触发CAPTCHA)

SURF_RATE_LIMIT_PER_MIN=20

Check current usage:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.rate_limiter);
// { requests_per_minute: 7, limit: 10, window_start: "2026-05-17T..." }
SURF_RATE_LIMIT_PER_MIN=20

查看当前使用情况:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.rate_limiter);
// { requests_per_minute: 7, limit: 10, window_start: "2026-05-17T..." }

Best Practices

最佳实践

  1. Use abstract mode for triage: Default
    search_extract
    to
    mode="abstract"
    to save tokens and time. Only request
    mode="full"
    when needed.
  2. Batch related queries: Use
    search_parallel
    instead of sequential
    search
    calls.
  3. Check health before batch ops: Prevents hitting rate limits mid-batch.
  4. Respect cache TTLs: Default 24h for search is sensible. Don't disable unless debugging.
  5. Handle extraction failures gracefully: Always check
    extraction_quality
    and handle
    { error }
    responses.
  6. Profile warmth: First call of the day may be slower. Acceptable for human-in-the-loop workflows.
  7. CAPTCHA strategy: For long-running agents, use
    SURF_CLOUD_MODE=false
    and solve CAPTCHAs as they appear to preserve profile reputation.
  1. 使用摘要模式快速筛选:默认将
    search_extract
    设为
    mode="abstract"
    以节省token和时间。仅在需要时使用
    mode="full"
  2. 批量处理相关关键词:使用
    search_parallel
    替代串行
    search
    调用。
  3. 批量操作前检查健康状况:避免在批量操作中途触发速率限制。
  4. 遵守缓存过期时间:默认24小时的搜索缓存设置较为合理。除非调试,否则不要禁用。
  5. 优雅处理提取失败:始终检查
    extraction_quality
    并处理
    { error }
    响应。
  6. 预热配置文件:每日首次调用可能较慢,适合有人工参与的工作流。
  7. CAPTCHA策略:对于长期运行的Agent,使用
    SURF_CLOUD_MODE=false
    并在出现CAPTCHA时及时验证,以维护配置文件信誉。