browser-automation

Browser Automation

Available Tools

`browser_act(instruction, starting_url?)`: Execute browser actions using natural language (click, type, scroll, select). Use `starting_url` to navigate to a page and act in a single call.
`browser_get_page_info(url?, text?, tables?, links?)`: Get page structure and DOM data (fast, no AI). Use `url` to navigate first; `text=True` for full text, `tables=True` for table data, `links=True` for all links.
`browser_manage_tabs(action, tab_index?, url?)`: Switch, close, or create browser tabs.
`browser_save_screenshot(filename)`: Save a screenshot of the current page to the workspace.

When to Use

Use browser automation when the task genuinely requires it:

UI interactions: Filling forms, clicking buttons, navigating multi-step workflows
Login-required pages: Accessing content behind authentication that APIs cannot reach
Dynamic/JS-heavy pages: Content rendered client-side that plain HTTP requests can't capture
Human-like browsing needed: Sites that block bots or require realistic interaction patterns
Scraping structured data: When no API exists and the data must be extracted from rendered pages

Prefer web search or url_fetcher for general information lookup, news, or publicly accessible pages — browser automation is slower and heavier. Reserve it for tasks where simpler tools are insufficient.
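
The criteria above can be read as a small decision rule. The sketch below is illustrative only: the `Task` fields and the `choose_tool` helper are hypothetical names introduced here, not part of any tool API, and the returned strings simply echo the tool names mentioned in this document.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; all field names are assumptions."""
    needs_ui_interaction: bool = False   # forms, buttons, multi-step workflows
    needs_login: bool = False            # content behind authentication
    js_rendered: bool = False            # content rendered client-side
    blocks_bots: bool = False            # site expects human-like browsing
    needs_rendered_scrape: bool = False  # no API; data lives in the rendered page


def choose_tool(task: Task) -> str:
    """Return 'browser' only when a simpler tool would be insufficient."""
    requires_browser = any([
        task.needs_ui_interaction,
        task.needs_login,
        task.js_rendered,
        task.blocks_bots,
        task.needs_rendered_scrape,
    ])
    # Browser automation is slower and heavier, so it is the fallback choice.
    return "browser" if requires_browser else "web_search_or_url_fetcher"
```

Under these assumptions, a plain news lookup (`Task()`) resolves to the lighter tools, while `Task(needs_login=True)` resolves to browser automation.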

Tool Selection

`browser_act`: UI interactions (click, type, scroll, form fill). Use `starting_url` to open a page and act in one call.
`browser_get_page_info`: Fast page structure check and optional content extraction (<300ms). Use `url` to navigate first.
`browser_manage_tabs`: Switch/close/create tabs (view open tabs via `browser_get_page_info`).
`browser_save_screenshot`: Save milestone screenshots (search results, confirmations, key data).

browser_act Best Practice

Combine up to 3 predictable steps: "1. Type 'laptop' in search 2. Click search button 3. Click first result"
Use `starting_url` when opening a fresh page: `browser_act(instruction='Search for laptops', starting_url='https://amazon.com')`
On failure: check the screenshot to see current state, then retry from that point
For visual creation (diagrams, drawings), prefer code/text input methods over mouse interactions
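
One way to respect the three-step limit is to split a longer plan into batches before calling `browser_act`. The sketch below is a plain string helper under that assumption; `batch_instructions` is a hypothetical name, and it only builds instruction strings without calling any browser tool.

```python
from typing import List


def batch_instructions(steps: List[str], max_steps: int = 3) -> List[str]:
    """Group steps into numbered instruction strings of at most max_steps each."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    batches = []
    for start in range(0, len(steps), max_steps):
        chunk = steps[start:start + max_steps]
        # Numbering restarts at 1 in every batch, matching the example format.
        batches.append(" ".join(f"{i}. {s}" for i, s in enumerate(chunk, 1)))
    return batches
```

Each returned string would then be passed as the `instruction` argument of one `browser_act` call, presumably with `starting_url` set only on the first call. If a batch fails, checking the screenshot and rebuilding the remaining steps from the current state follows the retry advice above.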

browser_get_page_info Best Practice

Use `url` to navigate and inspect in one call: `browser_get_page_info(url='https://example.com', tables=True)`
Use `text=True` to get full page text content (useful for reading article text)
Use `tables=True` to extract structured table data from the page
Use `links=True` to get all links on the page (up to 200)
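
Since each option adds to the returned data, it can help to request only what a task needs. The sketch below assembles keyword arguments for a `browser_get_page_info` call; `page_info_args` is a hypothetical helper, and the parameter names are taken from the signature listed under Available Tools. It does not call the tool itself.

```python
from typing import Any, Dict, Optional


def page_info_args(url: Optional[str] = None, *, text: bool = False,
                   tables: bool = False, links: bool = False) -> Dict[str, Any]:
    """Build keyword arguments, including only the options actually requested."""
    args: Dict[str, Any] = {}
    if url is not None:
        args["url"] = url  # navigate first, then inspect
    for name, enabled in (("text", text), ("tables", tables), ("links", links)):
        if enabled:
            args[name] = True
    return args
```

For example, `page_info_args("https://example.com", tables=True)` yields the same arguments as the call shown above, and `page_info_args()` yields an empty dictionary for a structure-only check of the current page.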
