google-surf-mcp-search
What It Does
google-surf-mcp is an MCP server that provides Google search functionality without requiring API keys. It combines three capabilities in one:
- Google search with ad/spam filtering
- URL content extraction (HTML + PDF)
- Academic paper extraction (arXiv, Nature, PubMed, etc.)
Key features:
- Works with actual Google search (not an API wrapper)
- Automatic CAPTCHA recovery with persistent browser profiles
- Parallel search and extraction
- Token-efficient abstract mode for triage
- Built-in rate limiting and caching
- Geometric verification to drop sponsored ads and knowledge panels
Installation
Quick Install (npx)
Add to your MCP client config (e.g., `~/.claude.json` for Claude Code):

```json
{
  "mcpServers": {
    "google-surf": {
      "command": "npx",
      "args": ["-y", "google-surf-mcp"]
    }
  }
}
```
Local Clone Installation
```bash
git clone https://github.com/HarimxChoi/google-surf-mcp
cd google-surf-mcp
npm install
npm run build
```

Config for local installation:

```json
{
  "mcpServers": {
    "google-surf": {
      "command": "node",
      "args": ["/absolute/path/to/google-surf-mcp/build/index.js"]
    }
  }
}
```
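Both install routes produce the same config shape: a `command` plus an `args` array under `mcpServers`. The sketch below builds either entry from one function. `serverEntry` and `InstallKind` are hypothetical names introduced here for illustration; they are not part of google-surf-mcp.

```typescript
// Hypothetical helper: builds the "google-surf" entry for an MCP client config.
type InstallKind =
  | { kind: "npx" }
  | { kind: "local"; repoPath: string }; // absolute path to the cloned repo

interface ServerEntry {
  command: string;
  args: string[];
}

function serverEntry(install: InstallKind): ServerEntry {
  if (install.kind === "npx") {
    return { command: "npx", args: ["-y", "google-surf-mcp"] };
  }
  // Local clone: run the built entry point with node.
  const root = install.repoPath.replace(/\/+$/, ""); // drop trailing slashes
  return { command: "node", args: [`${root}/build/index.js`] };
}

const config = { mcpServers: { "google-surf": serverEntry({ kind: "npx" }) } };
console.log(JSON.stringify(config, null, 2));
```

Passing `{ kind: "local", repoPath: "/absolute/path/to/google-surf-mcp" }` yields the local-clone entry instead.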
Manual Bootstrap (if auto-bootstrap fails)
```bash
npm run bootstrap
```

With custom paths:

```bash
CHROME_PATH=/usr/bin/google-chrome SURF_TZ=America/New_York npm run bootstrap
```
Available Tools
1. `search` - Single Google Search

Performs a single Google search and returns filtered results (ads removed).

Parameters:
- `query` (string, required): Search query
- `limit` (number, optional): Max results, default 10

Returns:
- `results[]`: Array of `{ title, url, snippet }`
- `dropped`: Count of filtered results (ads, knowledge panels)
- `dropped_reasons[]`: Why items were dropped
- `cache_hit`: Boolean indicating cache use

Example Usage:

```typescript
// Via MCP tool call
{
  "query": "typescript async patterns",
  "limit": 5
}
```

Response:

```json
{
  "results": [
    {
      "title": "Async/Await in TypeScript",
      "url": "https://example.com/typescript-async",
      "snippet": "Learn how to use async/await patterns..."
    }
  ],
  "dropped": 2,
  "dropped_reasons": ["sponsored", "knowledge_panel"],
  "cache_hit": false
}
```
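When tuning queries it can help to see which filter removed results. Here is a minimal sketch that tallies `dropped_reasons`, assuming the array holds one entry per dropped item. The type names and `tallyDropReasons` are illustrative and not part of the server.

```typescript
// Shapes follow the field list above; the type names are illustrative.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

interface SearchResponse {
  results: SearchResult[];
  dropped: number;
  dropped_reasons: string[];
  cache_hit: boolean;
}

// Count how often each drop reason appears.
// Assumption: dropped_reasons has one entry per dropped item.
function tallyDropReasons(response: SearchResponse): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const reason of response.dropped_reasons) {
    counts[reason] = (counts[reason] ?? 0) + 1;
  }
  return counts;
}

const sample: SearchResponse = {
  results: [],
  dropped: 2,
  dropped_reasons: ["sponsored", "knowledge_panel"],
  cache_hit: false,
};
console.log(tallyDropReasons(sample));
```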
2. `search_parallel` - Parallel Multi-Query Search

Executes multiple searches in parallel using a worker pool (max 10 queries).

Parameters:
- `queries` (string[], required): Array of search queries
- `limit` (number, optional): Max results per query, default 10

Returns:
- Array of search results (same format as `search`)

Example Usage:

```typescript
{
  "queries": [
    "mcp server best practices",
    "playwright stealth techniques",
    "typescript pdf extraction",
    "google search scraping 2026"
  ],
  "limit": 3
}
```
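Because a call accepts at most 10 queries, it can be worth cleaning the list on the client before sending it. The sketch below trims, de-duplicates, and enforces the cap. `prepareQueries` is a hypothetical client-side helper, and how the server itself reacts to an oversized list is not covered here.

```typescript
const MAX_PARALLEL_QUERIES = 10; // limit stated for search_parallel

// Trim, drop empty strings, and de-duplicate (case-insensitively)
// before handing the list to the tool.
function prepareQueries(raw: string[]): string[] {
  const seen = new Set<string>();
  const cleaned: string[] = [];
  for (const q of raw) {
    const trimmed = q.trim();
    const key = trimmed.toLowerCase();
    if (trimmed.length === 0 || seen.has(key)) continue;
    seen.add(key);
    cleaned.push(trimmed);
  }
  if (cleaned.length > MAX_PARALLEL_QUERIES) {
    throw new RangeError(
      `search_parallel accepts at most ${MAX_PARALLEL_QUERIES} queries, got ${cleaned.length}`
    );
  }
  return cleaned;
}

console.log(prepareQueries(["  mcp server ", "MCP Server", "", "playwright"]));
```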
3. `extract` - Fetch and Extract Content

Extracts text content from a URL (HTML or PDF).

Parameters:
- `url` (string, required): URL to extract
- `max_chars` (number, optional): Character limit, default 100k
- `mode` (string, optional): `"full"` | `"abstract"` | `"metadata"`

Modes:
- `full`: Complete article text (HTML via Readability, PDF via unpdf)
- `abstract`: ~1500 chars for triage (PDF page 1 or HTML meta description)
- `metadata`: PDF page count only

Returns:
- `content`: Extracted text (markdown for HTML)
- `title`: Document title
- `excerpt`: Short summary
- `length`: Character count
- `is_pdf`: Boolean
- `page_count`: Number (PDFs only)
- `extraction_quality`: `"high"` | `"medium"` | `"low"`

Example Usage:

```typescript
// Extract full academic paper
{
  "url": "https://arxiv.org/pdf/2301.12345.pdf",
  "mode": "full"
}

// Quick abstract for triage
{
  "url": "https://nature.com/articles/s41586-023-12345-6",
  "mode": "abstract",
  "max_chars": 2000
}
```

Response:

```json
{
  "content": "# Paper Title\n\nAbstract: This paper presents...",
  "title": "Novel Approach to AI Safety",
  "excerpt": "This paper presents a novel approach...",
  "length": 45678,
  "is_pdf": true,
  "page_count": 12,
  "extraction_quality": "high"
}
```
4. `search_extract` - Combined Search + Extract

Searches and extracts content in one call, parallelizing the extraction step.

Parameters:
- `query` (string, required): Search query
- `limit` (number, optional): Max results to extract, default 5
- `max_chars` (number, optional): Per-result char limit
- `mode` (string, optional): `"abstract"` (default) | `"full"`

Best Practices:
- Use `mode="abstract"` (default) for cheap triage with ~1500-char summaries
- Use `mode="full"` only when you need complete article text (slower, more tokens)

Returns:
- `results[]`: Search results enriched with `extracted_content`

Example Usage:

```typescript
// Triage mode (default, token-efficient)
{
  "query": "claude mcp server tutorials",
  "limit": 5,
  "mode": "abstract"
}

// Full extraction (when you need complete content)
{
  "query": "machine learning interpretability survey",
  "limit": 3,
  "mode": "full",
  "max_chars": 50000
}
```

Response:

```json
{
  "results": [
    {
      "title": "Building MCP Servers",
      "url": "https://example.com/mcp-tutorial",
      "snippet": "Complete guide to MCP servers...",
      "extracted_content": {
        "content": "# Building MCP Servers\n\nMCP (Model Context Protocol)...",
        "title": "Building MCP Servers",
        "length": 1523,
        "is_pdf": false,
        "extraction_quality": "high"
      }
    }
  ]
}
```
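To make the cost difference between the two modes concrete, here is a rough upper bound on how many characters one call can return. It is a sketch under two assumptions: an abstract is about 1500 characters, and `full` mode without `max_chars` falls back to the 100k default documented for `extract` (the text does not state a default for `search_extract`). `worstCaseChars` is a hypothetical helper.

```typescript
const ABSTRACT_CHARS = 1500;       // approximate abstract size quoted above
const FULL_DEFAULT_CHARS = 100000; // assumption: same default as extract's max_chars

type Mode = "abstract" | "full";

// Upper bound on extracted characters returned by one search_extract call.
function worstCaseChars(limit: number, mode: Mode, maxChars?: number): number {
  const perResult = mode === "abstract" ? ABSTRACT_CHARS : FULL_DEFAULT_CHARS;
  const cap = maxChars === undefined ? perResult : Math.min(perResult, maxChars);
  return limit * cap;
}

console.log(worstCaseChars(5, "abstract"));    // triage example above
console.log(worstCaseChars(3, "full", 50000)); // full-extraction example above
```

With the two example calls above, the bound is 7,500 characters for the triage call and 150,000 for the full one, which is why `abstract` is the cheaper first pass.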
5. `health` - Server Status

Checks server health and configuration.

Returns:
- `status`: `"healthy"` | `"degraded"`
- `cascade_mode`: Current stealth mode
- `rate_limiter`: Request counts and limits
- `cache_stats`: Cache size and hit rates
- `config`: Active configuration values

Example Usage:

```typescript
// No parameters
{}
```
Configuration
All configuration is done through environment variables:

Essential Variables

```bash
# Chrome binary path (auto-detected if not set)
CHROME_PATH=/usr/bin/google-chrome

# Profile storage (default: ~/.google-surf-mcp)
SURF_PROFILE_ROOT=/custom/path/profiles

# Browser locale and timezone
SURF_LOCALE=en-US
SURF_TZ=America/New_York
```
Headless & CAPTCHA Recovery

```bash
# Run Chrome visibly (for demos/debugging)
SURF_HEADLESS=false

# Remote debugging mode (headless servers)
SURF_REMOTE_DEBUG=true

# Cloud/serverless mode (fail-fast on CAPTCHA)
SURF_CLOUD_MODE=true
```
Performance Tuning

```bash
# Idle close timeout (ms), 0 disables
SURF_IDLE_CLOSE_MS=30000

# Rate limit (requests per minute)
SURF_RATE_LIMIT_PER_MIN=10

# Search cache TTL (ms), 0 disables
SURF_CACHE_TTL_SEARCH_MS=86400000

# Cache LRU size
SURF_CACHE_MAX_ENTRIES=1000
```
Security

```bash
# Allow private IPs in extract (default: false)
SURF_ALLOW_PRIVATE=true

# Ignore TLS errors (auto-on in cloud mode)
SURF_INSECURE_TLS=false

# Disable sandbox (auto-on in cloud mode)
SURF_NO_SANDBOX=false
```
Advanced

```bash
# Disable cascade fallback (pin single mode)
SURF_CASCADE_DISABLED=true
SURF_USE_STEALTH=true

# Humanlike browsing (off | background | inline)
SURF_HUMANLIKE_MODE=background
```
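If a wrapper script launches the server, validating these variables up front catches typos before Chrome starts. The following is a minimal sketch of such a check, not the server's own parsing logic. `readSettings` and `SurfSettings` are hypothetical names, and the fallback values simply mirror the examples in this section, so treat them as placeholders rather than confirmed defaults.

```typescript
type HumanlikeMode = "off" | "background" | "inline";
type Env = Record<string, string | undefined>;

interface SurfSettings {
  headless: boolean;
  rateLimitPerMin: number;
  cacheTtlSearchMs: number;
  humanlikeMode: HumanlikeMode;
}

// Parse a non-negative integer variable, using `fallback` when it is unset.
function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Accept only the three values listed for SURF_HUMANLIKE_MODE.
function readHumanlike(raw: string | undefined): HumanlikeMode {
  switch (raw ?? "background") { // placeholder fallback, not a confirmed default
    case "off": return "off";
    case "background": return "background";
    case "inline": return "inline";
    default:
      throw new Error(`SURF_HUMANLIKE_MODE must be off, background, or inline, got "${raw}"`);
  }
}

function readSettings(env: Env): SurfSettings {
  return {
    headless: env["SURF_HEADLESS"] !== "false",
    rateLimitPerMin: readInt(env, "SURF_RATE_LIMIT_PER_MIN", 10),
    cacheTtlSearchMs: readInt(env, "SURF_CACHE_TTL_SEARCH_MS", 86400000),
    humanlikeMode: readHumanlike(env["SURF_HUMANLIKE_MODE"]),
  };
}

console.log(readSettings({ SURF_HEADLESS: "false", SURF_RATE_LIMIT_PER_MIN: "5" }));
```

In Node.js you would typically pass `process.env` as the argument.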
Common Patterns

Pattern 1: Research Assistant

Search academic papers and extract abstracts for quick review:

```typescript
// use_mcp_tool stands in for your MCP client's tool-call function
// Step 1: Search and triage with abstracts
const triage = await use_mcp_tool("google-surf", "search_extract", {
  query: "transformer architecture improvements 2026",
  limit: 10,
  mode: "abstract"
});

// Step 2: Extract full text for promising papers
const topPapers = triage.results.slice(0, 3);
const fullTexts = await Promise.all(
  topPapers.map(paper =>
    use_mcp_tool("google-surf", "extract", {
      url: paper.url,
      mode: "full",
      max_chars: 100000
    })
  )
);
```
Pattern 2: Parallel Research

Search multiple related topics simultaneously:

```typescript
const relatedTopics = await use_mcp_tool("google-surf", "search_parallel", {
  queries: [
    "MCP server authentication patterns",
    "MCP server error handling",
    "MCP server rate limiting",
    "MCP server caching strategies"
  ],
  limit: 5
});

// Process results by topic
relatedTopics.forEach((topicResults, index) => {
  console.log(`Topic ${index + 1}:`, topicResults.results.length, "results");
});
```
Pattern 3: Content Aggregation

Build a comprehensive knowledge base:

```typescript
// 1. Find relevant sources
const sources = await use_mcp_tool("google-surf", "search", {
  query: "typescript best practices 2026",
  limit: 20
});

// 2. Extract abstracts to filter quality.
// The documented extract response has no url field, so keep the source URL alongside it.
const abstracts = await Promise.all(
  sources.results.map(async result => ({
    url: result.url,
    ...(await use_mcp_tool("google-surf", "extract", {
      url: result.url,
      mode: "abstract"
    }))
  }))
);

// 3. Full extraction for high-quality sources
const highQuality = abstracts
  .filter(a => a.extraction_quality === "high")
  .slice(0, 5);
const fullContent = await Promise.all(
  highQuality.map(a =>
    use_mcp_tool("google-surf", "extract", {
      url: a.url,
      mode: "full"
    })
  )
);
```
Pattern 4: Health Check Before Heavy Operations

```typescript
// Check server health before batch operations
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

const health = await use_mcp_tool("google-surf", "health", {});
if (health.status !== "healthy") {
  console.warn("Server degraded, reducing concurrency");
}

// requests_per_minute reports current usage, not the configured limit
const used = health.rate_limiter.requests_per_minute;
if (used > 8) {
  // Close to the default limit of 10: wait a minute before starting the batch
  await sleep(60000);
}
```
CAPTCHA Recovery Modes

The server handles CAPTCHAs automatically based on the environment:

Mode 1: Local Desktop (default)

```bash
# No config needed - default behavior
```

When a CAPTCHA appears:
1. OS notification fires
2. Headed Chrome window opens
3. Human solves CAPTCHA
4. Call automatically retries
5. Profile reputation preserved
Mode 2: Visible Chrome (demos/debugging)

```bash
SURF_HEADLESS=false
```

- Chrome runs visibly at all times
- CAPTCHA recovery skips notification (user is watching)
- Good for demos and debugging
Mode 3: Remote Debugging (headless servers)

```bash
SURF_HEADLESS=true
SURF_REMOTE_DEBUG=true
```

When a CAPTCHA appears:
- DevTools port printed to logs
- Error thrown with instructions
- SSH port-forward from local machine
- Open `chrome://inspect` locally
- Solve CAPTCHA remotely
- Retry the call

Example SSH forward:

```bash
ssh -L 9222:localhost:9222 your-server
```
Mode 4: Cloud/Serverless (fail-fast)

```bash
SURF_CLOUD_MODE=true
```

- No CAPTCHA recovery
- Throws `CAPTCHA_REQUIRED` error immediately
- Worker pool disabled
- Sandbox disabled, TLS bypass enabled
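The four modes are selected by three environment variables. The sketch below shows one way a launcher script could report which mode a given environment will end up in. The precedence order (cloud mode first, then remote debugging, then visible Chrome) is an assumption made for illustration, and `captchaMode` is a hypothetical helper, not the server's own resolution logic.

```typescript
type CaptchaMode =
  | "local-desktop"     // Mode 1: default, headed window opens on demand
  | "visible-chrome"    // Mode 2: SURF_HEADLESS=false
  | "remote-debug"      // Mode 3: SURF_REMOTE_DEBUG=true on a headless server
  | "cloud-fail-fast";  // Mode 4: SURF_CLOUD_MODE=true

type Env = Record<string, string | undefined>;

// Assumed precedence: cloud mode wins, then remote debugging, then visible Chrome.
function captchaMode(env: Env): CaptchaMode {
  if (env["SURF_CLOUD_MODE"] === "true") return "cloud-fail-fast";
  if (env["SURF_REMOTE_DEBUG"] === "true") return "remote-debug";
  if (env["SURF_HEADLESS"] === "false") return "visible-chrome";
  return "local-desktop";
}

console.log(captchaMode({ SURF_HEADLESS: "true", SURF_REMOTE_DEBUG: "true" }));
```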
Troubleshooting
Chrome Not Found

Error: `Chrome binary not found`

Solution:

```bash
# Find your Chrome installation
which google-chrome
which chromium

# Set explicitly
CHROME_PATH=/usr/bin/google-chrome npm run bootstrap
```
CAPTCHA Loops

Symptoms: Repeated CAPTCHA requests

Solutions:
- Run bootstrap to warm the profile:

```bash
npm run bootstrap
```

- Reduce request rate:

```bash
SURF_RATE_LIMIT_PER_MIN=5 npx google-surf-mcp
```

- Check cascade mode:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.cascade_mode); // Should cycle: none → stealth → humanlike
```
Empty or No Results

Check health first:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
// Check rate_limiter.requests_per_minute
// Check cache_stats for anomalies
```

Disable the search cache if results look stale (a TTL of 0 turns caching off):

```bash
SURF_CACHE_TTL_SEARCH_MS=0 npx google-surf-mcp
```

Check dropped reasons:

```typescript
const results = await use_mcp_tool("google-surf", "search", {
  query: "test query"
});
console.log(results.dropped_reasons);
// If all results are dropped as "sponsored", the selector may be stale
```
Extraction Failures

**PDF extraction fails:**

```typescript
// Try metadata mode first
const meta = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/paper.pdf",
  mode: "metadata"
});
console.log(meta.page_count); // If 0, PDF is inaccessible
```

**SSRF blocked:**

```bash
# Allow private IPs (only if you control the URLs)
SURF_ALLOW_PRIVATE=true npx google-surf-mcp
```

**Low extraction quality:**

```typescript
const result = await use_mcp_tool("google-surf", "extract", {
  url: "https://example.com/article"
});
if (result.extraction_quality === "low") {
  // HTML was poorly structured or blocked
  // Try fetching directly via other means
}
```
Performance Issues

**Slow first call:**
Normal. The first call bootstraps the profile (~4s sequential, ~9s parallel). Subsequent calls are faster (~1.5s).

**Idle timeout too aggressive:**

```bash
# Keep contexts warm longer
SURF_IDLE_CLOSE_MS=120000 npx google-surf-mcp
```

**Too many parallel queries:**
Limit to 10 per `search_parallel` call. For more, batch them:

```typescript
// chunk() and sleep() are small helpers you supply:
// chunk splits an array into batches, sleep waits the given number of milliseconds.
const queries: string[] = [/* e.g. 100 queries */];
const batches = chunk(queries, 10);
for (const batch of batches) {
  const results = await use_mcp_tool("google-surf", "search_parallel", {
    queries: batch
  });
  // Process batch
  await sleep(5000); // Respect rate limits
}
```
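The batching loop relies on two helpers, `chunk` and `sleep`, that are not part of google-surf-mcp or the TypeScript standard library. One possible implementation, offered as a sketch:

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolve after `ms` milliseconds; useful between batches to respect rate limits.
function sleep(ms: number): Promise<void> {
  return new Promise<void>(resolve => setTimeout(resolve, ms));
}

// 23 queries split into batches of 10 give batch sizes 10, 10, and 3.
const demoQueries = Array.from({ length: 23 }, (_, i) => `query ${i + 1}`);
console.log(chunk(demoQueries, 10).map(batch => batch.length));
```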
Academic Sources Supported
Inline PDF extraction for:
- arXiv
- bioRxiv, medRxiv
- Nature, Science, Cell
- OpenReview
- NeurIPS, ICML, ICLR proceedings
- JMLR, PMLR
- Springer
- PubMed (via PMC)
- ACL Anthology
All extracted to markdown-formatted text.
Cache Management
```bash
# Disable search caching
SURF_CACHE_TTL_SEARCH_MS=0

# Increase cache size
SURF_CACHE_MAX_ENTRIES=5000

# Custom cache location
SURF_CACHE_ROOT=/tmp/google-surf-cache
```

Cache namespaces:
- `search`: Google search results (24h TTL default)
- `extract`: URL content extractions (no TTL, LRU only)
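The two namespaces differ only in whether entries expire. To illustrate the idea, here is a generic in-memory cache with LRU eviction and an optional TTL. It is a teaching sketch, not google-surf-mcp's actual cache code, and `LruTtlCache` is a name invented for this example.

```typescript
// Minimal cache with LRU eviction and an optional time-to-live.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; storedAt: number }>();

  constructor(
    private maxEntries: number,
    private ttlMs: number | null,         // null = no TTL, LRU only
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.ttlMs !== null && this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired (a TTL of 0 expires everything)
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    if (this.entries.size > this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

const searchCache = new LruTtlCache<string[]>(1000, 24 * 60 * 60 * 1000); // 24h TTL
const extractCache = new LruTtlCache<string>(1000, null);                 // LRU only
searchCache.set("typescript async patterns", ["https://example.com/typescript-async"]);
extractCache.set("https://example.com/article", "# Article text");
console.log(searchCache.get("typescript async patterns"));
```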
Rate Limiting

The built-in rate limiter helps keep Google from blocking requests:

```bash
# Default: 10 requests/minute
SURF_RATE_LIMIT_PER_MIN=10

# Conservative for shared IPs
SURF_RATE_LIMIT_PER_MIN=5

# Aggressive (may trigger CAPTCHAs)
SURF_RATE_LIMIT_PER_MIN=20
```

Check current usage:

```typescript
const health = await use_mcp_tool("google-surf", "health", {});
console.log(health.rate_limiter);
// { requests_per_minute: 7, limit: 10, window_start: "2026-05-17T..." }
```
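The `rate_limiter` object is enough to decide whether a batch fits in the current window. The sketch below assumes that `requests_per_minute` counts requests already made in the window and that the window is a fixed 60 seconds starting at `window_start`. Neither assumption is confirmed by the text, and both function names are hypothetical.

```typescript
// Shape taken from the health output shown above.
interface RateLimiterStatus {
  requests_per_minute: number; // assumed: requests used in the current window
  limit: number;               // configured SURF_RATE_LIMIT_PER_MIN
  window_start: string;        // ISO 8601 timestamp
}

// Requests still available in the current window (never negative).
function remainingRequests(status: RateLimiterStatus): number {
  return Math.max(0, status.limit - status.requests_per_minute);
}

// Milliseconds until the window resets, assuming a fixed 60-second window.
function msUntilReset(status: RateLimiterStatus, nowMs: number): number {
  const windowEnd = Date.parse(status.window_start) + 60000;
  return Math.max(0, windowEnd - nowMs);
}

const status: RateLimiterStatus = {
  requests_per_minute: 7,
  limit: 10,
  window_start: "2026-05-17T00:00:00.000Z",
};
console.log(remainingRequests(status));
```

A batch larger than `remainingRequests(status)` could wait `msUntilReset(status, Date.now())` milliseconds before starting.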
Best Practices

- Use abstract mode for triage: Default `search_extract` to `mode="abstract"` to save tokens and time. Only request `mode="full"` when needed.
- Batch related queries: Use `search_parallel` instead of sequential `search` calls.
- Check health before batch ops: Prevents hitting rate limits mid-batch.
- Respect cache TTLs: Default 24h for search is sensible. Don't disable unless debugging.
- Handle extraction failures gracefully: Always check `extraction_quality` and handle `{ error }` responses.
- Profile warmth: First call of the day may be slower. Acceptable for human-in-the-loop workflows.
- CAPTCHA strategy: For long-running agents, use `SURF_CLOUD_MODE=false` and solve CAPTCHAs as they appear to preserve profile reputation.
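The advice on handling extraction failures can be captured in a small guard. This sketch assumes a failed call returns an object with an `error` string, which is a guess at the `{ error }` shape mentioned above. `usableContent` and the type names are illustrative.

```typescript
interface ExtractOk {
  content: string;
  extraction_quality: "high" | "medium" | "low";
}

interface ExtractError {
  error: string; // assumed shape of a `{ error }` response
}

type ExtractResponse = ExtractOk | ExtractError;

// Type guard: true when the response is an error object.
function isExtractError(r: ExtractResponse): r is ExtractError {
  return "error" in r;
}

// Return usable text, or null when the call failed or quality was low.
function usableContent(r: ExtractResponse): string | null {
  if (isExtractError(r)) return null;
  return r.extraction_quality === "low" ? null : r.content;
}

console.log(usableContent({ content: "# Paper Title", extraction_quality: "high" }));
console.log(usableContent({ error: "CAPTCHA_REQUIRED" }));
```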