browser-automation
Browser Automation
Available Tools
- browser_act(instruction, starting_url?): Execute browser actions using natural language (click, type, scroll, select). Use starting_url to navigate to a page and act in a single call.
- browser_get_page_info(url?, text?, tables?, links?): Get page structure and DOM data (fast, no AI). Use url to navigate first; text=True for full text, tables=True for table data, links=True for all links.
- browser_manage_tabs(action, tab_index?, url?): Switch, close, or create browser tabs
- browser_save_screenshot(filename): Save current page screenshot to workspace
When to Use
Use browser automation when the task genuinely requires it:
- UI interactions: Filling forms, clicking buttons, navigating multi-step workflows
- Login-required pages: Accessing content behind authentication that APIs cannot reach
- Dynamic/JS-heavy pages: Content rendered client-side that plain HTTP requests can't capture
- Human-like browsing needed: Sites that block bots or require realistic interaction patterns
- Scraping structured data: When no API exists and the data must be extracted from rendered pages
Prefer web search or url_fetcher for general information lookup, news, or publicly accessible pages — browser automation is slower and heavier. Reserve it for tasks where simpler tools are insufficient.
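The decision rule above can be written down as a small check. This is only a sketch: the Task class, its field names, and pick_approach are hypothetical names introduced for illustration, each flag mirroring one of the listed criteria.

```python
from dataclasses import dataclass, fields


@dataclass
class Task:
    # Hypothetical flags, one per criterion in the list above.
    needs_ui_interaction: bool = False        # forms, buttons, multi-step workflows
    behind_login: bool = False                # content APIs cannot reach
    client_side_rendered: bool = False        # dynamic / JS-heavy pages
    needs_human_like_browsing: bool = False   # sites that block bots
    scrape_rendered_data: bool = False        # no API, data only in rendered page


def pick_approach(task: Task) -> str:
    """Return the heavier tool only when at least one criterion applies."""
    if any(getattr(task, f.name) for f in fields(task)):
        return "browser automation"
    # Default: the lighter tools are preferred for public, static content.
    return "web search or url_fetcher"


news_lookup = Task()
dashboard_export = Task(behind_login=True, needs_ui_interaction=True)
```

Here pick_approach(news_lookup) falls through to the lighter tools, while pick_approach(dashboard_export) selects browser automation because two criteria apply.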
Tool Selection
- browser_act: UI interactions (click, type, scroll, form fill). Use starting_url to open a page and act in one call.
- browser_get_page_info: Fast page structure check and optional content extraction (<300ms). Use url to navigate first.
- browser_manage_tabs: Switch/close/create tabs (view tabs via get_page_info)
- browser_save_screenshot: Save milestone screenshots (search results, confirmations, key data)
browser_act Best Practice
- Combine up to 3 predictable steps: "1. Type 'laptop' in search 2. Click search button 3. Click first result"
- Use starting_url when opening a fresh page: browser_act(instruction='Search for laptops', starting_url='https://amazon.com')
- On failure: check the screenshot to see current state, then retry from that point
- For visual creation (diagrams, drawings), prefer code/text input methods over mouse interactions
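The "up to 3 predictable steps" rule can be sketched as a helper that groups a longer plan into numbered instruction strings. The helper name batch_instructions and the fourth step ("Click 'Add to cart'") are hypothetical and exist only for illustration; browser_act itself is not called here, since how it is invoked depends on the agent runtime.

```python
from typing import List

MAX_STEPS_PER_CALL = 3  # limit taken from the guidance above


def batch_instructions(steps: List[str], max_steps: int = MAX_STEPS_PER_CALL) -> List[str]:
    """Group steps into numbered instruction strings of at most max_steps each."""
    batches = []
    for start in range(0, len(steps), max_steps):
        chunk = steps[start:start + max_steps]
        # Restart numbering in every batch, matching the "1. ... 2. ... 3. ..." style.
        batches.append(" ".join(f"{i}. {step}" for i, step in enumerate(chunk, start=1)))
    return batches


steps = [
    "Type 'laptop' in search",
    "Click search button",
    "Click first result",
    "Click 'Add to cart'",  # hypothetical extra step to force a second batch
]
instructions = batch_instructions(steps)
```

Each resulting string would be passed as the instruction argument of one browser_act call, with starting_url supplied only on the first call when a fresh page is opened. Keeping batches short also helps with the failure guidance: after checking the screenshot, only the failed batch needs to be retried.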
browser_get_page_info Best Practice
- Use url to navigate and inspect in one call: browser_get_page_info(url='https://example.com', tables=True)
- Use text=True to get full page text content (useful for reading article text)
- Use tables=True to extract structured table data from the page
- Use links=True to get all links on the page (up to 200)
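As a rough illustration of how these optional flags fit together, the sketch below builds the argument set for a browser_get_page_info call and leaves out anything not requested. The helper page_info_args is hypothetical, and the sketch assumes the flags may be combined in a single call, which the signature suggests but the text does not state outright.

```python
from typing import Any, Dict, Optional


def page_info_args(url: Optional[str] = None, *, text: bool = False,
                   tables: bool = False, links: bool = False) -> Dict[str, Any]:
    """Return only the arguments that were requested, omitting unset ones."""
    args: Dict[str, Any] = {}
    if url is not None:
        args["url"] = url  # navigate first, then inspect
    for name, enabled in (("text", text), ("tables", tables), ("links", links)):
        if enabled:
            args[name] = True
    return args


# Mirrors the example call above: navigate and extract tables in one step.
table_call = page_info_args(url="https://example.com", tables=True)

# Inspect the already-open page, asking for article text and links together
# (combining flags is an assumption, see the note above).
article_call = page_info_args(text=True, links=True)
```

With no flags set, the helper returns an empty argument set, which corresponds to a plain structure check on the current page.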