google-surf-mcp-search

Skill by ara.so — MCP Skills collection.

What It Does

google-surf-mcp is an MCP server that provides Google search functionality without requiring API keys. It combines three capabilities in one:

- Google search with ad/spam filtering
- URL content extraction (HTML + PDF)
- Academic paper extraction (arXiv, Nature, PubMed, etc.)

Key features:

- Works with actual Google search (not an API wrapper)
- Automatic CAPTCHA recovery with persistent browser profiles
- Parallel search and extraction
- Token-efficient abstract mode for triage
- Built-in rate limiting and caching
- Geometric verification to drop sponsored ads and knowledge panels

Installation

Quick Install (npx)

Add to your MCP client config (e.g., `~/.claude.json` for Claude Code):

```json
{
  "mcpServers": {
    "google-surf": {
      "command": "npx",
      "args": ["-y", "google-surf-mcp"]
    }
  }
}
```

Local Clone Installation

```bash
git clone https://github.com/HarimxChoi/google-surf-mcp
cd google-surf-mcp
npm install
npm run build
```

Config for local installation:

```json
{
  "mcpServers": {
    "google-surf": {
      "command": "node",
      "args": ["/absolute/path/to/google-surf-mcp/build/index.js"]
    }
  }
}
```

Manual Bootstrap (if auto-bootstrap fails)

```bash
npm run bootstrap
```

With a custom Chrome path and timezone:

```bash
CHROME_PATH=/usr/bin/google-chrome SURF_TZ=America/New_York npm run bootstrap
```

Available Tools

1. `search` - Single Google Search

Performs a single Google search and returns filtered results (ads removed).

Parameters:

- `query` (string, required): Search query
- `limit` (number, optional): Max results, default 10

Returns:

- `results[]`: Array of `{ title, url, snippet }`
- `dropped`: Count of filtered results (ads, knowledge panels)
- `dropped_reasons[]`: Why items were dropped
- `cache_hit`: Boolean indicating cache use

Example Usage:

```typescript
// Arguments for the MCP tool call
const searchArgs = {
  "query": "typescript async patterns",
  "limit": 5
};
```

Response:

```json
{
  "results": [
    {
      "title": "Async/Await in TypeScript",
      "url": "https://example.com/typescript-async",
      "snippet": "Learn how to use async/await patterns..."
    }
  ],
  "dropped": 2,
  "dropped_reasons": ["sponsored", "knowledge_panel"],
  "cache_hit": false
}
```
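Client code that consumes `search` output can capture the documented fields as TypeScript types. The sketch below mirrors the fields listed above under stated assumptions (for example, that `dropped_reasons` is an array of strings such as `sponsored`); it is not the server's own type definition, and `summarizeSearch` is a hypothetical helper.

```typescript
// Assumed shapes, mirroring the documented `search` response fields.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

interface SearchResponse {
  results: SearchResult[];
  dropped: number;
  dropped_reasons: string[];
  cache_hit: boolean;
}

// Hypothetical helper: count kept results and tally why others were dropped.
function summarizeSearch(resp: SearchResponse): {
  kept: number;
  dropped: number;
  reasons: Record<string, number>;
} {
  const reasons: Record<string, number> = {};
  for (const reason of resp.dropped_reasons) {
    reasons[reason] = (reasons[reason] ?? 0) + 1;
  }
  return { kept: resp.results.length, dropped: resp.dropped, reasons };
}

// Usage with the sample response above
const sample: SearchResponse = JSON.parse(`{
  "results": [{ "title": "Async/Await in TypeScript",
                "url": "https://example.com/typescript-async",
                "snippet": "Learn how to use async/await patterns..." }],
  "dropped": 2,
  "dropped_reasons": ["sponsored", "knowledge_panel"],
  "cache_hit": false
}`);
const summary = summarizeSearch(sample);
// summary.kept is 1; summary.reasons counts one "sponsored" and one "knowledge_panel"
```

A tally like this is a cheap way to notice when filtering suddenly drops everything, which the troubleshooting notes treat as a sign of stale selectors.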

2. `search_parallel` - Parallel Multi-Query Search

Execute multiple searches in parallel using a worker pool (max 10 queries).

Parameters:

- `queries` (string[], required): Array of search queries
- `limit` (number, optional): Max results per query, default 10

Returns:

Array of search results (same format as `search`)

Example Usage:

```typescript
// Arguments for the MCP tool call
const parallelArgs = {
  "queries": [
    "mcp server best practices",
    "playwright stealth techniques",
    "typescript pdf extraction",
    "google search scraping 2026"
  ],
  "limit": 3
};
```
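Because `search_parallel` returns an array of `search`-format responses, overlapping queries often surface the same page more than once. A minimal client-side sketch for flattening and de-duplicating by URL follows. The `mergeParallelResults` helper, the light trailing-slash normalization, and the assumption that responses arrive in the same order as the queries are all illustrative, not part of the server.

```typescript
// Assumed shape of each element returned by `search_parallel`
// (same format as `search`; only the fields used here are listed).
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}
interface SearchResponse {
  results: SearchResult[];
}

// Hypothetical helper: flatten per-query responses, keep the first
// occurrence of each URL, and record which query found it first.
function mergeParallelResults(
  queries: string[],
  responses: SearchResponse[],
): Array<SearchResult & { query: string }> {
  const seen = new Set<string>();
  const merged: Array<SearchResult & { query: string }> = [];
  responses.forEach((resp, i) => {
    for (const r of resp.results) {
      // Light normalization: ignore a trailing slash when comparing URLs.
      const key = r.url.replace(/\/$/, "");
      if (seen.has(key)) continue;
      seen.add(key);
      merged.push({ ...r, query: queries[i] });
    }
  });
  return merged;
}

// Usage
const queries = ["mcp server best practices", "mcp server tutorial"];
const responses: SearchResponse[] = [
  { results: [{ title: "A", url: "https://example.com/a", snippet: "" }] },
  {
    results: [
      { title: "A again", url: "https://example.com/a/", snippet: "" },
      { title: "B", url: "https://example.com/b", snippet: "" },
    ],
  },
];
const merged = mergeParallelResults(queries, responses);
// merged holds example.com/a (first query) and example.com/b (second query)
```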

3. `extract` - Fetch and Extract Content

Extract text content from a URL (HTML or PDF).

Parameters:

- `url` (string, required): URL to extract
- `max_chars` (number, optional): Character limit, default 100,000
- `mode` (string, optional): `"full"` | `"abstract"` | `"metadata"`

Modes:

- `full`: Complete article text (HTML via Readability, PDF via unpdf)
- `abstract`: ~1500 chars for triage (PDF page 1 or HTML meta description)
- `metadata`: PDF page count only

Returns:

- `content`: Extracted text (markdown for HTML)
- `title`: Document title
- `excerpt`: Short summary
- `length`: Character count
- `is_pdf`: Boolean
- `page_count`: Number (PDFs only)
- `extraction_quality`: `"high"` | `"medium"` | `"low"`

Example Usage:

```typescript
// Extract full academic paper
const fullPaperArgs = {
  "url": "https://arxiv.org/pdf/2301.12345.pdf",
  "mode": "full"
};

// Quick abstract for triage
const abstractArgs = {
  "url": "https://nature.com/articles/s41586-023-12345-6",
  "mode": "abstract",
  "max_chars": 2000
};
```

Response:

```json
{
  "content": "# Paper Title\n\nAbstract: This paper presents...",
  "title": "Novel Approach to AI Safety",
  "excerpt": "This paper presents a novel approach...",
  "length": 45678,
  "is_pdf": true,
  "page_count": 12,
  "extraction_quality": "high"
}
```
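If you build `extract` arguments programmatically, validating them before the call avoids a wasted round trip. The sketch below relies only on what is documented above (the three mode names and the 100,000-character default for `max_chars`). `buildExtractArgs` is a hypothetical client-side helper, and rejecting non-HTTP(S) URLs is an added precaution, not documented server behavior.

```typescript
type ExtractMode = "full" | "abstract" | "metadata";

interface ExtractArgs {
  url: string;
  max_chars?: number;
  mode?: ExtractMode;
}

const EXTRACT_MODES: readonly ExtractMode[] = ["full", "abstract", "metadata"];

function isExtractMode(value: string): value is ExtractMode {
  return (EXTRACT_MODES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: validate inputs and build `extract` arguments.
function buildExtractArgs(url: string, mode?: string, maxChars?: number): ExtractArgs {
  // Added precaution (not documented server behavior): only http(s) URLs.
  if (!/^https?:\/\/\S+$/i.test(url)) {
    throw new Error(`Expected an http(s) URL, got: ${url}`);
  }
  const args: ExtractArgs = { url };
  if (mode !== undefined) {
    if (!isExtractMode(mode)) {
      throw new Error(`mode must be one of: ${EXTRACT_MODES.join(", ")}`);
    }
    args.mode = mode;
  }
  if (maxChars !== undefined) {
    if (!Number.isInteger(maxChars) || maxChars <= 0) {
      throw new Error("max_chars must be a positive integer");
    }
    args.max_chars = maxChars;
  }
  // Omitted fields fall back to the server's defaults (e.g. max_chars = 100,000).
  return args;
}

// Usage: the triage arguments from the example above
const triageArgs = buildExtractArgs(
  "https://nature.com/articles/s41586-023-12345-6",
  "abstract",
  2000,
);
```

Leaving optional fields out, rather than filling in defaults on the client, keeps the server as the single source of truth if its defaults change.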

4. `search_extract` - Combined Search + Extract

Search and extract content in one call; extraction of the results runs in parallel.

Parameters:

- `query` (string, required): Search query
- `limit` (number, optional): Max results to extract, default 5
- `max_chars` (number, optional): Per-result char limit
- `mode` (string, optional): `"abstract"` (default) | `"full"`

Best Practices:

- Use `mode="abstract"` (default) for cheap triage with ~1500-char summaries.
- Use `mode="full"` only when you need complete article text (slower, more tokens).

Returns:

- `results[]`: Search results enriched with `extracted_content`

Example Usage:

```typescript
// Triage mode (default, token-efficient)
const triageArgs = {
  "query": "claude mcp server tutorials",
  "limit": 5,
  "mode": "abstract"
};

// Full extraction (when you need complete content)
const fullArgs = {
  "query": "machine learning interpretability survey",
  "limit": 3,
  "mode": "full",
  "max_chars": 50000
};
```

Response:

```json
{
  "results": [
    {
      "title": "Building MCP Servers",
      "url": "https://example.com/mcp-tutorial",
      "snippet": "Complete guide to MCP servers...",
      "extracted_content": {
        "content": "# Building MCP Servers\n\nMCP (Model Context Protocol)...",
        "title": "Building MCP Servers",
        "length": 1523,
        "is_pdf": false,
        "extraction_quality": "high"
      }
    }
  ]
}
```
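The abstract-then-full workflow needs a rule for which triaged results deserve a second, full extraction. One possible rule is sketched below: rank by `extraction_quality` (high before medium, low dropped) and use Google's order as the tie-breaker. The ranking policy and the `shortlistForFull` name are assumptions for illustration. Results without `extracted_content` (for example, failed extractions) are skipped.

```typescript
type Quality = "high" | "medium" | "low";

// Assumed shape of one `search_extract` result (fields from the sample response).
interface TriagedResult {
  title: string;
  url: string;
  snippet: string;
  extracted_content?: {
    content: string;
    length: number;
    is_pdf: boolean;
    extraction_quality: Quality;
  };
}

const RANK: Record<Quality, number> = { high: 0, medium: 1, low: 2 };

// Hypothetical helper: pick up to `n` URLs worth a follow-up mode="full" extract.
function shortlistForFull(results: TriagedResult[], n: number): string[] {
  const ranked: { url: string; rank: number; index: number }[] = [];
  results.forEach((r, index) => {
    const quality = r.extracted_content?.extraction_quality;
    // Skip failed extractions and low-quality pages.
    if (quality === undefined || quality === "low") return;
    ranked.push({ url: r.url, rank: RANK[quality], index });
  });
  // Better quality first; original (Google) order breaks ties.
  ranked.sort((a, b) => a.rank - b.rank || a.index - b.index);
  return ranked.slice(0, n).map((x) => x.url);
}
```

The returned URLs can be passed one by one to `extract` with `mode: "full"`.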

5. `health` - Server Status

Check server health and configuration.

Returns:

- `status`: `"healthy"` | `"degraded"`
- `cascade_mode`: Current stealth mode
- `rate_limiter`: Request counts and limits
- `cache_stats`: Cache size and hit rates
- `config`: Active configuration values

Example Usage:

```typescript
// No parameters
const healthArgs = {};
```
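Because `health` reports request counts and limits, a client can decide whether to start a batch now or wait. The field names below (`requests_per_minute`, `limit`, `window_start`) come from the sample `rate_limiter` output shown under Rate Limiting. Treat them, the one-minute window, and the `planBatch` helper as assumptions. Halving the budget on `degraded` is an arbitrary illustrative policy, in the spirit of reducing concurrency as in Pattern 4.

```typescript
// Assumed subset of the `health` response (field names from the sample output).
interface HealthResponse {
  status: "healthy" | "degraded";
  rate_limiter: {
    requests_per_minute: number; // requests counted in the current window
    limit: number; // configured SURF_RATE_LIMIT_PER_MIN
    window_start: string; // ISO timestamp of the current window
  };
}

// Hypothetical helper: how much of a batch can start now, and how long (ms)
// to wait for the next window if it does not all fit.
function planBatch(
  health: HealthResponse,
  batchSize: number,
  nowMs: number,
): { startNow: number; waitMs: number } {
  const { requests_per_minute, limit, window_start } = health.rate_limiter;
  const remaining = Math.max(0, limit - requests_per_minute);
  // Illustrative policy: use only half the remaining budget when degraded.
  const budget = health.status === "healthy" ? remaining : Math.floor(remaining / 2);
  if (batchSize <= budget) return { startNow: batchSize, waitMs: 0 };
  const windowEndMs = Date.parse(window_start) + 60000; // assumes a one-minute window
  return { startNow: budget, waitMs: Math.max(0, windowEndMs - nowMs) };
}
```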

Configuration

All configuration is done through environment variables:

Essential Variables

```bash
# Chrome binary path (auto-detected if not set)
CHROME_PATH=/usr/bin/google-chrome

# Profile storage (default: ~/.google-surf-mcp)
SURF_PROFILE_ROOT=/custom/path/profiles

# Browser locale and timezone
SURF_LOCALE=en-US
SURF_TZ=America/New_York
```

Headless & CAPTCHA Recovery

```bash
# Run Chrome visibly (for demos/debugging)
SURF_HEADLESS=false

# Remote debugging mode (headless servers)
SURF_REMOTE_DEBUG=true

# Cloud/serverless mode (fail-fast on CAPTCHA)
SURF_CLOUD_MODE=true
```

Performance Tuning

```bash
# Idle close timeout (ms), 0 disables
SURF_IDLE_CLOSE_MS=30000

# Rate limit (requests per minute)
SURF_RATE_LIMIT_PER_MIN=10

# Search cache TTL (ms), 0 disables
SURF_CACHE_TTL_SEARCH_MS=86400000

# Cache LRU size
SURF_CACHE_MAX_ENTRIES=1000
```

Security

```bash
# Allow private IPs in extract (default: false)
SURF_ALLOW_PRIVATE=true

# Ignore TLS errors (auto-on in cloud mode)
SURF_INSECURE_TLS=false

# Disable sandbox (auto-on in cloud mode)
SURF_NO_SANDBOX=false
```

Advanced

```bash
# Disable cascade fallback (pin single mode)
SURF_CASCADE_DISABLED=true
SURF_USE_STEALTH=true

# Humanlike browsing (off | background | inline)
SURF_HUMANLIKE_MODE=background
```

Common Patterns

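The examples below call `use_mcp_tool(server, tool, args)`, which stands in for the tool-call mechanism of your MCP client or agent framework; adapt the calls to your environment.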

Pattern 1: Research Assistant

Search academic papers and extract abstracts for quick review:

```typescript
// Step 1: Search and triage with abstracts
const triage = await use_mcp_tool("google-surf", "search_extract", {
  query: "transformer architecture improvements 2026",
  limit: 10,
  mode: "abstract"
});

// Step 2: Extract full text for the top three results
const topPapers = triage.results.slice(0, 3);
const fullTexts = await Promise.all(
  topPapers.map(paper =>
    use_mcp_tool("google-surf", "extract", {
      url: paper.url,
      mode: "full",
      max_chars: 100000
    })
  )
);
```

Pattern 2: Parallel Research

Search multiple related topics simultaneously:

```typescript
const relatedTopics = await use_mcp_tool("google-surf", "search_parallel", {
  queries: [
    "MCP server authentication patterns",
    "MCP server error handling",
    "MCP server rate limiting",
    "MCP server caching strategies"
  ],
  limit: 5
});

// Process results by topic
relatedTopics.forEach((topicResults, index) => {
  console.log(`Topic ${index + 1}:`, topicResults.results.length, "results");
});
```

Pattern 3: Content Aggregation

Build a comprehensive knowledge base:

```typescript
// 1. Find relevant sources
const sources = await use_mcp_tool("google-surf", "search", {
  query: "typescript best practices 2026",
  limit: 20
});

// 2. Extract abstracts to filter quality.
// `extract` responses have no `url` field, so keep each URL next to its abstract.
const abstracts = await Promise.all(
  sources.results.map(async result => ({
    url: result.url,
    abstract: await use_mcp_tool("google-surf", "extract", {
      url: result.url,
      mode: "abstract"
    })
  }))
);

// 3. Full extraction for high-quality sources
const highQuality = abstracts
  .filter(a => a.abstract.extraction_quality === "high")
  .slice(0, 5);

const fullContent = await Promise.all(
  highQuality.map(a =>
    use_mcp_tool("google-surf", "extract", {
      url: a.url,
      mode: "full"
    })
  )
);
```

Pattern 4: Health Check Before Heavy Operations

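```typescript
// Promise-based delay helper
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Check server health before batch operations
const health = await use_mcp_tool("google-surf", "health", {});

if (health.status !== "healthy") {
  console.warn("Server degraded, reducing concurrency");
}

const usedThisMinute = health.rate_limiter.requests_per_minute;
if (usedThisMinute > 8) {
  // Close to the default limit of 10/min: wait for the window to reset before starting the batch
  await sleep(60000);
}
```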

CAPTCHA Recovery Modes

The server handles CAPTCHAs automatically based on environment:

Mode 1: Local Desktop (default)

No config needed; this is the default behavior.

When a CAPTCHA appears:

1. An OS notification fires.
2. A headed Chrome window opens.
3. You solve the CAPTCHA.
4. The call retries automatically.
5. Profile reputation is preserved.

Mode 2: Visible Chrome (demos/debugging)

```bash
SURF_HEADLESS=false
```

- Chrome runs visibly at all times.
- CAPTCHA recovery skips the notification (the user is already watching).
- Good for demos and debugging.

Mode 3: Remote Debugging (headless servers)

```bash
SURF_HEADLESS=true
SURF_REMOTE_DEBUG=true
```

When a CAPTCHA appears:

1. The DevTools port is printed to the logs.
2. An error is thrown with instructions.
3. Set up an SSH port-forward from your local machine.
4. Open `chrome://inspect` locally.
5. Solve the CAPTCHA remotely.
6. Retry the call.

Example SSH forward:

```bash
ssh -L 9222:localhost:9222 your-server
```

Mode 4: Cloud/Serverless (fail-fast)

```bash
SURF_CLOUD_MODE=true
```

- No CAPTCHA recovery
- Throws a `CAPTCHA_REQUIRED` error immediately
- Worker pool disabled
- Sandbox disabled, TLS bypass enabled
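In cloud mode a CAPTCHA surfaces as a `CAPTCHA_REQUIRED` error instead of a recovery flow, so callers should not blindly retry it. How the error reaches your code depends on the MCP client. The sketch below assumes the code appears either in a `code` property or somewhere in the error message, and `classifyToolError` is a hypothetical helper. The list of transient errors is likewise illustrative.

```typescript
type ToolErrorKind = "captcha" | "retryable" | "fatal";

// Hypothetical classifier for errors from a google-surf tool call.
// Assumption: CAPTCHA_REQUIRED appears in a `code` property or in the message text.
function classifyToolError(err: unknown): ToolErrorKind {
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? String((err as { code: unknown }).code)
      : "";
  const message = err instanceof Error ? err.message : String(err);
  if (code === "CAPTCHA_REQUIRED" || message.includes("CAPTCHA_REQUIRED")) {
    // Retrying right away would most likely hit the same CAPTCHA.
    return "captcha";
  }
  // Illustrative list of transient failures worth a delayed retry.
  if (/timeout|ETIMEDOUT|ECONNRESET/i.test(message)) return "retryable";
  return "fatal";
}
```

In an agent loop, a `captcha` result is the signal to stop and surface the problem, or to move the workload to a deployment where one of the other recovery modes is available, rather than retrying.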

Troubleshooting

Chrome Not Found

Error: `Chrome binary not found`

Solution:

```bash
# Find your Chrome installation
which google-chrome
which chromium

# Set explicitly
CHROME_PATH=/usr/bin/google-chrome npm run bootstrap
```

CAPTCHA Loops

Symptoms: Repeated CAPTCHA requests

Solutions:

Run bootstrap to warm the profile:

```bash
npm run bootstrap
```

Reduce the request rate:

```bash
SURF_RATE_LIMIT_PER_MIN=5 npx google-surf-mcp
```

Check the cascade mode:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.cascade_mode); // Should cycle: none → stealth → humanlike
```

Empty or No Results

Check health first:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
// Check rate_limiter.requests_per_minute
// Check cache_stats for anomalies
```

Bypass a stale cache by disabling search caching:

```bash
SURF_CACHE_TTL_SEARCH_MS=0 npx google-surf-mcp
```

Check the dropped reasons:

```typescript
const results = await use_mcp_tool("google-surf", "search", {
  query: "test query"
});
console.log(results.dropped_reasons);
// If all results were dropped as "sponsored", the selector may be stale
```

Extraction Failures

PDF extraction fails:

```typescript
// Try metadata mode first
const meta = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/paper.pdf",
  mode: "metadata"
});
console.log(meta.page_count); // If 0, the PDF is inaccessible
```

SSRF blocked:

```bash
# Allow private IPs (only if you control the URLs)
SURF_ALLOW_PRIVATE=true npx google-surf-mcp
```

Low extraction quality:

```typescript
const result = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/article"
});

if (result.extraction_quality === "low") {
  // HTML was poorly structured or blocked
  // Try fetching directly via other means
}
```
Performance Issues

Slow first call:

This is normal. The first call bootstraps the profile (~4s sequential, ~9s parallel). Subsequent calls are faster (~1.5s).

Idle timeout too aggressive:

```bash
# Keep contexts warm longer
SURF_IDLE_CLOSE_MS=120000 npx google-surf-mcp
```

Too many parallel queries:

`search_parallel` accepts at most 10 queries per call. For more, split them into batches:

```typescript
// Minimal helpers (not part of google-surf-mcp)
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

const queries: string[] = [/* ...e.g. 100 queries... */];

for (const batch of chunk(queries, 10)) {
  const results = await use_mcp_tool("google-surf", "search_parallel", {
    queries: batch
  });
  // Process batch
  await sleep(5000); // Respect rate limits
}
```

Academic Sources Supported

Inline PDF extraction for:

- arXiv
- bioRxiv, medRxiv
- Nature, Science, Cell
- OpenReview
- NeurIPS, ICML, ICLR proceedings
- JMLR, PMLR
- Springer
- PubMed (via PMC)
- ACL Anthology

All extracted to markdown-formatted text.
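When deciding whether a search hit is worth a `full` extraction, it can help to know whether it points at one of the sources above. The sketch below is a client-side heuristic only. The hostnames are common public domains for these publishers and venues, but they are assumptions rather than the server's own detection list, and `isAcademicSource` is hypothetical.

```typescript
// Assumed domains for the sources listed above (not taken from the server).
const ACADEMIC_DOMAINS: readonly string[] = [
  "arxiv.org",
  "biorxiv.org",
  "medrxiv.org",
  "nature.com",
  "science.org",
  "cell.com",
  "openreview.net", // also hosts ICLR papers
  "neurips.cc",
  "nips.cc",
  "mlr.press", // PMLR, including ICML proceedings
  "jmlr.org",
  "springer.com",
  "ncbi.nlm.nih.gov", // PubMed and PMC
  "aclanthology.org",
];

// Hypothetical heuristic: true if the URL's host is, or is a subdomain of, a listed domain.
function isAcademicSource(url: string): boolean {
  const match = /^https?:\/\/([^/?#:]+)/i.exec(url);
  if (match === null) return false;
  const host = match[1].toLowerCase();
  return ACADEMIC_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
}
```

A matching host does not guarantee that the page is a paper (journal sites also serve landing and news pages), so treat this as a hint for triage rather than a filter.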

Cache Management

```bash
# Disable search caching
SURF_CACHE_TTL_SEARCH_MS=0

# Increase cache size
SURF_CACHE_MAX_ENTRIES=5000

# Custom cache location
SURF_CACHE_ROOT=/tmp/google-surf-cache
```

Cache namespaces:

- `search`: Google search results (24h TTL default)
- `extract`: URL content extractions (no TTL, LRU only)
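Read together, the namespace notes describe entries bounded by an LRU size limit, with `search` entries also expiring after a TTL. A minimal in-memory sketch of that combination follows, using a `Map`'s insertion order to track recency. It illustrates the documented semantics and is not the server's implementation (which may also persist entries under `SURF_CACHE_ROOT`). The class name and the injectable clock are assumptions. Here `Infinity` means "never expires"; note that in the server, `SURF_CACHE_TTL_SEARCH_MS=0` disables search caching altogether.

```typescript
// Illustrative LRU cache with a per-cache TTL (Infinity = never expires).
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; storedAt: number }>();

  constructor(
    private maxEntries: number,
    private ttlMs: number,
    private now: () => number = () => Date.now(),
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    // Evict least recently used entries (the first keys in insertion order).
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

// Mirroring the documented defaults: search has a 24h TTL, extract has none.
const DAY_MS = 24 * 60 * 60 * 1000; // 86,400,000, the SURF_CACHE_TTL_SEARCH_MS value above
const searchCache = new LruTtlCache<string>(1000, DAY_MS);
const extractCache = new LruTtlCache<string>(1000, Number.POSITIVE_INFINITY);
```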

Rate Limiting

The built-in rate limiter helps prevent Google blocks:

```bash
# Default: 10 requests/minute
SURF_RATE_LIMIT_PER_MIN=10

# Conservative for shared IPs
SURF_RATE_LIMIT_PER_MIN=5

# Aggressive (may trigger CAPTCHAs)
SURF_RATE_LIMIT_PER_MIN=20
```

Check current usage:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.rate_limiter);
// { requests_per_minute: 7, limit: 10, window_start: "2026-05-17T..." }
```
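The `window_start` field suggests a fixed one-minute window. Under that assumption, the limiter behaves roughly like the sketch below: requests are admitted while the count for the current window is under the limit, and otherwise the caller learns how long to wait. This illustrates the technique and is not the server's code. `FixedWindowLimiter` is a hypothetical name, and the clock is injectable for testing.

```typescript
// Illustrative fixed-window rate limiter (one window = 60 s).
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;

  constructor(
    private limitPerMin: number,
    private now: () => number = () => Date.now(),
  ) {}

  // Returns 0 if the request may proceed now, otherwise the ms to wait.
  tryAcquire(): number {
    const t = this.now();
    if (t - this.windowStart >= 60000) {
      // The previous window has ended: start a new one.
      this.windowStart = t;
      this.count = 0;
    }
    if (this.count < this.limitPerMin) {
      this.count += 1;
      return 0;
    }
    return this.windowStart + 60000 - t;
  }

  // Same shape as the `rate_limiter` sample above.
  stats() {
    return {
      requests_per_minute: this.count,
      limit: this.limitPerMin,
      window_start: new Date(this.windowStart).toISOString(),
    };
  }
}
```

A known trade-off of fixed windows is that up to twice the limit can pass in a short span straddling a window boundary. A sliding window avoids that at the cost of storing one timestamp per request.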

Best Practices

- Use abstract mode for triage: Keep `search_extract` on its default `mode="abstract"` to save tokens and time. Only request `mode="full"` when needed.
- Batch related queries: Use `search_parallel` instead of sequential `search` calls.
- Check health before batch operations: This helps avoid hitting rate limits mid-batch.
- Respect cache TTLs: The 24h default for search is sensible. Don't disable caching unless debugging.
- Handle extraction failures gracefully: Always check `extraction_quality` and handle `{ error }` responses.
- Profile warmth: The first call of the day may be slower. This is acceptable for human-in-the-loop workflows.
- CAPTCHA strategy: For long-running agents, use `SURF_CLOUD_MODE=false` and solve CAPTCHAs as they appear to preserve profile reputation.
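The last points combine into a single check after every `extract` call. The sketch below assumes a failed extraction comes back as an object with an `error` string (the `{ error }` shape mentioned above), while a successful one carries `extraction_quality`. The `ExtractOutcome` type and the `checkExtraction` helper are hypothetical.

```typescript
interface ExtractSuccess {
  content: string;
  title: string;
  length: number;
  is_pdf: boolean;
  extraction_quality: "high" | "medium" | "low";
}
interface ExtractFailure {
  error: string;
}
type ExtractOutcome = ExtractSuccess | ExtractFailure;

// Type guard: narrow to the failure shape.
function isExtractFailure(r: ExtractOutcome): r is ExtractFailure {
  return typeof (r as ExtractFailure).error === "string";
}

// Hypothetical helper: return usable text, or null after logging why it was rejected.
function checkExtraction(url: string, r: ExtractOutcome): string | null {
  if (isExtractFailure(r)) {
    console.warn(`extract failed for ${url}: ${r.error}`);
    return null;
  }
  if (r.extraction_quality === "low" || r.length === 0) {
    console.warn(`low-quality extraction for ${url}; consider another source`);
    return null;
  }
  return r.content;
}
```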
