agent-browser

Browser Automation with agent-browser

Installation

npm recommended

From Source

bash

git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
agent-browser install

Quick start

bash

agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser

Core workflow

1. Navigate: `agent-browser open <url>`
2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
3. Interact using refs from the snapshot
4. Re-snapshot after navigation or significant DOM changes
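
The four steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of the CLI: the `ab` wrapper, the function names, and the `AGENT_BROWSER_BIN` variable are hypothetical conveniences for illustration, and only commands listed in this document are called.

```bash
# Thin wrapper around the CLI. AGENT_BROWSER_BIN is a hypothetical variable
# used only by this sketch (the CLI itself does not read it); it lets the
# binary be swapped out, for example in a dry run.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Steps 1-2: navigate, then list interactive elements with their refs.
open_and_snapshot() {
  ab open "$1" && ab snapshot -i
}

# Steps 3-4: click a ref taken from the latest snapshot, then re-snapshot,
# since the click may have navigated or changed the DOM.
click_and_resnapshot() {
  ab click "$1" && ab snapshot -i
}
```

A call such as `open_and_snapshot https://example.com` followed by `click_and_resnapshot @e1` mirrors the workflow; the ref passed to the second helper should always come from the most recent snapshot.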

Commands

Navigation

bash

agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser

Snapshot (page analysis)

bash

agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector

Interactions (use @refs from snapshot)

bash

agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser focus @e1           # Focus element
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key
agent-browser press Control+a     # Key combination
agent-browser keydown Shift       # Hold key down
agent-browser keyup Shift         # Release key
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown
agent-browser scroll down 500     # Scroll page
agent-browser scrollintoview @e1  # Scroll element into view
agent-browser drag @e1 @e2        # Drag and drop
agent-browser upload @e1 file.pdf # Upload files

Get information

bash

agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get bounding box

Check state

bash

agent-browser is visible @e1      # Check if visible
agent-browser is enabled @e1      # Check if enabled
agent-browser is checked @e1      # Check if checked

Screenshots & PDF

bash

agent-browser screenshot          # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full   # Full page
agent-browser pdf output.pdf      # Save as PDF

Video recording

bash

agent-browser record start ./demo.webm    # Start recording (uses current URL + state)
agent-browser click @e1                   # Perform actions
agent-browser record stop                 # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new recording

Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.

Wait

bash

agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --url "/dashboard"    # Wait for URL pattern
agent-browser wait --load networkidle      # Wait for network idle
agent-browser wait --fn "window.ready"     # Wait for JS condition

Mouse control

bash

agent-browser mouse move 100 200      # Move mouse
agent-browser mouse down left         # Press button
agent-browser mouse up left           # Release button
agent-browser mouse wheel 100         # Scroll wheel

Semantic locators (alternative to refs)

bash

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

Browser settings

bash

agent-browser set viewport 1920 1080      # Set viewport size
agent-browser set device "iPhone 14"      # Emulate device
agent-browser set geo 37.7749 -122.4194   # Set geolocation
agent-browser set offline on              # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass   # HTTP basic auth
agent-browser set media dark              # Emulate color scheme

Cookies & Storage

bash

agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get all localStorage
agent-browser storage local key           # Get specific key
agent-browser storage local set k v       # Set value
agent-browser storage local clear         # Clear all

Network

bash

agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body '{}'  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests

Tabs & Windows

bash

agent-browser tab                 # List tabs
agent-browser tab new [url]       # New tab
agent-browser tab 2               # Switch to tab
agent-browser tab close           # Close tab
agent-browser window new          # New window

Frames

bash

agent-browser frame "#iframe"     # Switch to iframe
agent-browser frame main          # Back to main frame

Dialogs

bash

agent-browser dialog accept [text]  # Accept dialog
agent-browser dialog dismiss        # Dismiss dialog

JavaScript

bash

agent-browser eval "document.title"   # Run JavaScript

State management

bash

agent-browser state save auth.json    # Save session state
agent-browser state load auth.json    # Load saved state
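
One way to combine the two state commands is to load the saved file when it exists and fall back to a login step otherwise. The sketch below is an illustration under assumptions: `restore_or_login`, the `ab` wrapper, and `AGENT_BROWSER_BIN` are hypothetical names, and the login step is whatever site-specific command the caller supplies.

```bash
# Thin wrapper around the CLI. AGENT_BROWSER_BIN is a hypothetical variable
# used only by this sketch; the CLI itself does not read it.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# restore_or_login STATE_FILE LOGIN_CMD...
# Loads STATE_FILE when it exists. Otherwise runs the caller's login command
# (site-specific, supplied by the caller) and saves the resulting state.
restore_or_login() {
  state_file=$1
  shift
  if [ -f "$state_file" ]; then
    ab state load "$state_file"
  else
    "$@" && ab state save "$state_file"
  fi
}
```

Here `LOGIN_CMD` could be a shell function that performs the open, fill, and click steps for a particular site; the state is saved only if that command succeeds.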

Example: Form submission

bash

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Example: Authentication with saved state

bash

# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard

Sessions (parallel browsers)

bash

agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
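
When several sessions are needed, the same commands can be driven from a list. A minimal sketch, assuming hypothetical helper names (`open_sessions`, `ab`) and a hypothetical `AGENT_BROWSER_BIN` variable; it only calls `--session <name> open` and `session list` as shown above.

```bash
# Thin wrapper around the CLI. AGENT_BROWSER_BIN is a hypothetical variable
# used only by this sketch; the CLI itself does not read it.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# open_sessions NAME=URL...
# Opens each URL in its own named session, then lists the sessions.
open_sessions() {
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    url=${pair#*=}     # text after the first "="
    ab --session "$name" open "$url" || return 1
  done
  ab session list
}
```

For example, `open_sessions test1=site-a.com test2=site-b.com` issues the same three commands as the block above.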

JSON output (for parsing)

Add `--json` for machine-readable output:

bash

agent-browser snapshot -i --json
agent-browser get text @e1 --json
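
The JSON schema is not described here, so the sketch below only captures the output to a file for a later parsing step. The helper name `snapshot_json`, the `ab` wrapper, and `AGENT_BROWSER_BIN` are hypothetical, and the sketch assumes the CLI exits with a non-zero status on failure.

```bash
# Thin wrapper around the CLI. AGENT_BROWSER_BIN is a hypothetical variable
# used only by this sketch; the CLI itself does not read it.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# snapshot_json FILE
# Writes the interactive-element snapshot as JSON to FILE. Output goes to a
# temporary file first, so FILE is replaced only when the command succeeds.
snapshot_json() {
  tmp_file="$1.tmp"
  if ab snapshot -i --json > "$tmp_file"; then
    mv "$tmp_file" "$1"
  else
    rm -f "$tmp_file"
    return 1
  fi
}
```

Writing through a temporary file keeps a previous good snapshot intact if the command fails partway.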

Debugging

bash

agent-browser open example.com --headed              # Show browser window
agent-browser console                                # View console messages
agent-browser console --clear                        # Clear console
agent-browser errors                                 # View page errors
agent-browser errors --clear                         # Clear errors
agent-browser highlight @e1                          # Highlight element
agent-browser trace start                            # Start recording trace
agent-browser trace stop trace.zip                   # Stop and save trace
agent-browser record start ./debug.webm              # Record from current page
agent-browser record stop                            # Save recording
agent-browser --cdp 9222 snapshot                    # Connect via CDP

Troubleshooting

If the command is not found on Linux ARM64, use the full path in the bin folder.
If an element is not found, use snapshot to find the correct ref.
If the page is not loaded, add a wait command after navigation.
Use --headed to see the browser window for debugging.
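
Two of the tips above (wait after navigation, re-snapshot when an element is not found) can be folded into small helpers. This is a sketch under assumptions: the helper names, the `ab` wrapper, and `AGENT_BROWSER_BIN` are hypothetical, and it assumes the CLI reports a failed command through a non-zero exit status.

```bash
# Thin wrapper around the CLI. AGENT_BROWSER_BIN is a hypothetical variable
# used only by this sketch; the CLI itself does not read it.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# open_and_wait URL
# Navigates, then blocks until the network is idle.
open_and_wait() {
  ab open "$1" && ab wait --load networkidle
}

# click_or_snapshot REF
# Tries the click. If it fails, prints a fresh snapshot so the correct ref
# can be looked up, and reports the failure to the caller.
click_or_snapshot() {
  if ! ab click "$1"; then
    ab snapshot -i
    return 1
  fi
}
```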

Options

--session <name> uses an isolated session.
--json provides JSON output.
--full takes a full page screenshot.
--headed shows the browser window.
--timeout sets the command timeout in milliseconds.
--cdp <port> connects via Chrome DevTools Protocol.

Notes

Refs are stable per page load but change on navigation.
Always snapshot after navigation to get new refs.
Use fill instead of type for input fields to ensure existing text is cleared.

Reporting Issues

Skill issues: Open an issue at https://github.com/TheSethRose/Agent-Browser-CLI
agent-browser CLI issues: Open an issue at https://github.com/vercel-labs/agent-browser
