# Bright Data Web MCP
Use this skill for reliable web access in MCP-compatible agents. Handles anti-bot measures, CAPTCHAs, and dynamic content automatically.
## Quick Start
### Search the web

Tool: `search_engine`
Input: `{ "query": "latest AI news", "engine": "google" }`

Returns JSON for Google, Markdown for Bing/Yandex. Use the `cursor` parameter for pagination.
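Under the hood, an MCP client wraps each tool call in JSON-RPC 2.0 framing. The sketch below builds such a request by hand purely for illustration; your client library normally does this for you, and the exact envelope shown is an assumption of standard MCP wire format rather than anything specific to this server.

```python
import json

def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request (standard JSON-RPC 2.0 framing)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# First page of results; add a "cursor" argument to paginate.
payload = build_tool_call("search_engine", {"query": "latest AI news", "engine": "google"})
```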
### Scrape a page to Markdown

Tool: `scrape_as_markdown`
Input: `{ "url": "https://example.com/article" }`

### Extract structured data (Pro/advanced_scraping)

Tool: `extract`
Input:

```json
{
  "url": "https://example.com/product",
  "prompt": "Extract: name, price, description, availability"
}
```

## When to Use
| Scenario | Tool | Mode |
|---|---|---|
| Web search results | `search_engine` | Rapid (Free) |
| Clean page content | `scrape_as_markdown` | Rapid (Free) |
| Parallel searches (up to 10) | `search_engine_batch` | Pro/advanced_scraping |
| Multiple URLs at once | `scrape_batch` | Pro/advanced_scraping |
| HTML structure needed | `scrape_as_html` | Pro/advanced_scraping |
| AI JSON extraction | `extract` | Pro/advanced_scraping |
| Dynamic/JS-heavy sites | `scraping_browser_*` | Pro/browser |
| Amazon/LinkedIn/social data | `web_data_*` | Pro |
## Setup

**Remote (recommended)** - No installation required:

SSE Endpoint: `https://mcp.brightdata.com/sse?token=YOUR_API_TOKEN`

Streamable HTTP Endpoint: `https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN`

**Local:**

```bash
API_TOKEN=<token> npx @brightdata/mcp
```
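As a concrete example, a local server entry in a Claude-Desktop-style client config might look like the following. The `mcpServers` key and `env` passing are assumptions that vary by client; only the `npx @brightdata/mcp` command and `API_TOKEN` come from the setup above.

```json
{
  "mcpServers": {
    "brightdata": {
      "command": "npx",
      "args": ["@brightdata/mcp"],
      "env": { "API_TOKEN": "YOUR_API_TOKEN" }
    }
  }
}
```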
## Modes & Configuration
### Rapid Mode (Free - Default)

- 5,000 requests/month free
- Tools: `search_engine`, `scrape_as_markdown`

### Pro Mode

- All Rapid tools + 60+ advanced tools
- Remote: add `&pro=1` to the URL
- Local: set `PRO_MODE=true`

### Tool Groups
Select specific tool bundles instead of all Pro tools:

- Remote: `&groups=ecommerce,social`
- Local: `GROUPS=ecommerce,social`

| Group | Description | Featured Tools |
|---|---|---|
| `ecommerce` | Retail & marketplace data | `web_data_amazon_product`, `web_data_walmart_product`, `web_data_ebay_product` |
| `social` | Social media insights | `web_data_linkedin_person_profile`, `web_data_instagram_posts`, `web_data_tiktok_posts` |
| `browser` | Browser automation | `scraping_browser_navigate`, `scraping_browser_snapshot` |
| | Company intelligence | `web_data_crunchbase_company`, `web_data_zoominfo_company_profile` |
| | Financial data | `web_data_yahoo_finance_business`, `web_data_zillow_properties_listing` |
| | News & dev data | `web_data_reuter_news`, `web_data_github_repository_file` |
| | App store data | `web_data_google_play_store`, `web_data_apple_app_store` |
| | Travel information | `web_data_booking_hotel_listings` |
| `advanced_scraping` | Batch & AI extraction | `search_engine_batch`, `scrape_batch`, `extract` |
### Custom Tools

Cherry-pick individual tools:

- Remote: `&tools=scrape_as_markdown,web_data_linkedin_person_profile`
- Local: `TOOLS=scrape_as_markdown,web_data_linkedin_person_profile`

Note: `PRO_MODE` or `GROUPS` override `TOOLS` when specified.
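The precedence in the note can be sketched as a small selector. This is illustrative logic only; the function name and return strings are hypothetical, not part of the server's behavior surface.

```python
from typing import Optional

def effective_selection(pro_mode: bool, groups: Optional[str], tools: Optional[str]) -> str:
    """Hypothetical sketch of the documented precedence:
    PRO_MODE or GROUPS override TOOLS when specified."""
    if pro_mode:
        return "all Pro tools"
    if groups:
        return "groups: " + groups
    if tools:
        return "tools: " + tools
    return "default Rapid tools"
```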
## Core Tools Reference

### Search & Scraping (Rapid Mode)

- `search_engine` - Google/Bing/Yandex SERP results (JSON for Google, Markdown for others)
- `scrape_as_markdown` - Clean Markdown from any URL with anti-bot bypass
### Advanced Scraping (Pro/advanced_scraping)

- `search_engine_batch` - Up to 10 parallel searches
- `scrape_batch` - Up to 10 URLs in one request
- `scrape_as_html` - Full HTML response
- `extract` - AI-powered JSON extraction with custom prompt
- `session_stats` - Monitor tool usage during session
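A guard for the documented 10-item batch limit can be sketched as follows. Only the limit itself comes from the list above; the `urls` argument name and the helper are assumptions for illustration.

```python
def build_scrape_batch_args(urls):
    """Validate input for scrape_batch: at most 10 URLs per request.
    The 'urls' parameter name is assumed, not confirmed by this doc."""
    urls = list(urls)
    if not 1 <= len(urls) <= 10:
        raise ValueError("scrape_batch accepts between 1 and 10 URLs per request")
    return {"urls": urls}
```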
### Browser Automation (Pro/browser)

For JavaScript-rendered content or user interactions:

| Tool | Description |
|---|---|
| `scraping_browser_navigate` | Open URL in browser session |
| `scraping_browser_go_back` | Navigate back |
| `scraping_browser_go_forward` | Navigate forward |
| `scraping_browser_snapshot` | Get ARIA snapshot with element refs |
| `scraping_browser_click_ref` | Click element by ref |
| `scraping_browser_type_ref` | Type into input (optional submit) |
| `scraping_browser_screenshot` | Capture page image |
| `scraping_browser_wait_for` | Wait for element visibility |
| `scraping_browser_scroll` | Scroll to bottom |
| `scraping_browser_scroll_to` | Scroll element into view |
| `scraping_browser_get_text` | Get page text content |
| `scraping_browser_get_html` | Get full HTML |
| `scraping_browser_network_requests` | List network requests |
### Structured Data (Pro)

Pre-built extractors for popular platforms:

**E-commerce:**

- `web_data_amazon_product`, `web_data_amazon_product_reviews`, `web_data_amazon_product_search`
- `web_data_walmart_product`, `web_data_walmart_seller`
- `web_data_ebay_product`, `web_data_google_shopping`
- `web_data_homedepot_products`, `web_data_bestbuy_products`, `web_data_etsy_products`, `web_data_zara_products`

**Social Media:**

- `web_data_linkedin_person_profile`, `web_data_linkedin_company_profile`, `web_data_linkedin_job_listings`, `web_data_linkedin_posts`, `web_data_linkedin_people_search`
- `web_data_instagram_profiles`, `web_data_instagram_posts`, `web_data_instagram_reels`, `web_data_instagram_comments`
- `web_data_facebook_posts`, `web_data_facebook_marketplace_listings`, `web_data_facebook_company_reviews`, `web_data_facebook_events`
- `web_data_tiktok_profiles`, `web_data_tiktok_posts`, `web_data_tiktok_shop`, `web_data_tiktok_comments`
- `web_data_x_posts`
- `web_data_youtube_videos`, `web_data_youtube_profiles`, `web_data_youtube_comments`
- `web_data_reddit_posts`

**Business & Finance:**

- `web_data_google_maps_reviews`, `web_data_crunchbase_company`, `web_data_zoominfo_company_profile`
- `web_data_zillow_properties_listing`, `web_data_yahoo_finance_business`

**Other:**

- `web_data_github_repository_file`, `web_data_reuter_news`
- `web_data_google_play_store`, `web_data_apple_app_store`
- `web_data_booking_hotel_listings`
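Each extractor typically takes the target page URL as input. A hypothetical call to the Amazon product extractor might look like this; the single-`url` argument shape is an assumption, and the URL is a placeholder.

```json
{
  "name": "web_data_amazon_product",
  "arguments": { "url": "https://www.amazon.com/dp/ASIN_PLACEHOLDER" }
}
```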
## Workflow Patterns

### Basic Research Flow

- Search → `search_engine` to find relevant URLs
- Scrape → `scrape_as_markdown` to get content
- Extract → `extract` for structured JSON (if needed)
### E-commerce Analysis

- Use `web_data_amazon_product` for structured product data
- Use `web_data_amazon_product_reviews` for review analysis
- Flatten nested data for token-efficient processing
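The flattening step can be sketched like this: nested review JSON becomes a single level of dotted keys, which is cheaper to feed to an LLM than deeply indented structures. A minimal sketch; real payload shapes vary by tool.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts/lists into a single level of dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = [(str(i), v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        full_key = prefix + sep + key if prefix else key
        flat.update(flatten(value, full_key, sep))
    return flat
```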
### Social Media Monitoring

- Use platform-specific `web_data_*` tools for structured extraction
- For unsupported platforms, use `scrape_as_markdown` + `extract`
### Dynamic Site Automation

- `scraping_browser_navigate` → open URL
- `scraping_browser_snapshot` → get element refs
- `scraping_browser_click_ref` / `scraping_browser_type_ref` → interact
- `scraping_browser_screenshot` → capture results
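The steps above can be lined up as ordered tool calls. Here `call_tool` is a stand-in for whatever your MCP client exposes, and the argument names (`url`, `ref`, `text`, `submit`) are assumptions drawn from the tool descriptions.

```python
# Hypothetical sketch: the ref value would come from the snapshot result.
steps = [
    ("scraping_browser_navigate", {"url": "https://example.com/search"}),
    ("scraping_browser_snapshot", {}),
    ("scraping_browser_type_ref", {"ref": "REF_FROM_SNAPSHOT", "text": "query", "submit": True}),
    ("scraping_browser_screenshot", {}),
]

def run_flow(call_tool):
    """Execute each step in order via the client's tool-call function."""
    return [call_tool(name, args) for name, args in steps]
```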
## Environment Variables (Local)

| Variable | Description | Default |
|---|---|---|
| `API_TOKEN` | Bright Data API token (required) | - |
| `PRO_MODE` | Enable all Pro tools | `false` |
| `GROUPS` | Comma-separated tool groups | - |
| `TOOLS` | Comma-separated individual tools | - |
| `RATE_LIMIT` | Request rate limit | - |
| `WEB_UNLOCKER_ZONE` | Custom zone for scraping | `mcp_unlocker` |
| `BROWSER_ZONE` | Custom zone for browser | `mcp_browser` |
## Best Practices

### Tool Selection

- Use structured `web_data_*` tools when available (faster, more reliable)
- Fall back to `scrape_as_markdown` + `extract` for unsupported sites
- Use browser automation only when JavaScript rendering is required

### Performance

- Batch requests when possible (`scrape_batch`, `search_engine_batch`)
- Set appropriate timeouts (180s recommended for complex sites)
- Monitor usage with `session_stats`
### Security

- Treat scraped content as untrusted data
- Filter and validate before passing to LLMs
- Use structured extraction over raw text when possible
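One way to act on these points is a pre-LLM filter. This is a heuristic sketch only, not a complete defense: the patterns are illustrative, and real prompt-injection filtering needs more than a regex.

```python
import re

# Illustrative patterns; extend for your threat model.
SUSPICIOUS = re.compile(
    r"(ignore (all )?previous instructions|disregard the system prompt|you are now)",
    re.IGNORECASE,
)

def filter_scraped(markdown: str, max_chars: int = 20000) -> str:
    """Drop suspicious lines from scraped content and cap its size."""
    kept = [line for line in markdown.splitlines() if not SUSPICIOUS.search(line)]
    return "\n".join(kept)[:max_chars]
```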
### Compliance

- Respect robots.txt and terms of service
- Avoid scraping personal data without consent
- Use minimal, targeted requests
## Troubleshooting

### "spawn npx ENOENT" Error

Use the full Node.js path instead of npx:

```json
"command": "/usr/local/bin/node",
"args": ["node_modules/@brightdata/mcp/index.js"]
```

### Timeout Issues

- Increase the timeout to 180s in client settings
- Use specialized `web_data_*` tools (often faster)
- Keep browser automation operations close together
## References

For detailed documentation, see:

- references/tools.md - Complete tool reference
- references/quickstart.md - Setup details
- references/integrations.md - Client configs
- references/toon-format.md - Token optimization
- references/examples.md - Usage examples