agent-browser: CLI Browser Automation
Vercel's headless browser automation CLI, designed for AI agents. It selects elements by refs (@e1, @e2) taken from accessibility snapshots.
Setup
```bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"

# Install if needed
npm install -g agent-browser
agent-browser install  # Downloads Chromium
```
Core Workflow
The snapshot + ref pattern works best for LLMs:
- Navigate to a URL
- Take a snapshot to get interactive elements with refs
- Interact using those refs (@e1, @e2, etc.)
- Re-snapshot after navigation or DOM changes

```bash
agent-browser open https://example.com
agent-browser snapshot -i       # Get refs
agent-browser click @e1         # Use a ref
agent-browser fill @e2 "text"
agent-browser snapshot -i       # Re-snapshot
```
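Since the workflow calls for a fresh snapshot after navigation or DOM changes, a script can pair each interaction with a re-snapshot. A minimal bash sketch; `ab_step` is a hypothetical helper name, not an agent-browser command, and it assumes agent-browser exits non-zero when a command fails:

```bash
# ab_step: hypothetical helper, not part of agent-browser.
# Runs one agent-browser command, then prints a fresh interactive snapshot
# so the next step works from refs that reflect the current page.
ab_step() {
  agent-browser "$@" || return 1  # stop if the command itself failed
  agent-browser snapshot -i       # re-snapshot after a possible DOM change
}

# Example usage:
#   ab_step open https://example.com
#   ab_step click @e1
#   ab_step fill @e2 "text"
```

Re-snapshotting after every step costs one extra call; for steps known not to change the page, calling agent-browser directly is cheaper.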
Key Commands
Navigation
```bash
agent-browser open <url>   # Navigate to URL
agent-browser back         # Go back
agent-browser forward      # Go forward
agent-browser reload       # Reload page
agent-browser close        # Close browser
```
Snapshots (Essential for AI)
```bash
agent-browser snapshot             # Full accessibility tree
agent-browser snapshot -i          # Interactive elements only (recommended)
agent-browser snapshot -i --json   # JSON output for parsing
agent-browser snapshot -c          # Compact (removes empty elements)
agent-browser snapshot -d 3        # Limit tree depth
```
Interactions
```bash
agent-browser click @e1               # Click element
agent-browser dblclick @e1            # Double-click
agent-browser fill @e1 "text"         # Clear and fill input
agent-browser type @e1 "text"         # Type without clearing
agent-browser press Enter             # Press key
agent-browser hover @e1               # Hover element
agent-browser check @e1               # Check checkbox
agent-browser uncheck @e1             # Uncheck checkbox
agent-browser select @e1 "option"     # Select dropdown option
agent-browser scroll down 500         # Scroll down 500 px
agent-browser scrollintoview @e1      # Scroll element into view
```
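A common combination is filling a field and submitting it with Enter, as in a search box. A minimal bash sketch; `fill_and_submit` is a hypothetical helper, and it assumes the filled field keeps focus so that `press Enter` reaches it:

```bash
# fill_and_submit: hypothetical helper, not part of agent-browser.
# Clears and fills the field with ref $1 using text $2, then presses Enter.
# Assumes the filled field still has focus when Enter is pressed.
fill_and_submit() {
  agent-browser fill "$1" "$2" || return 1  # stop if the fill failed
  agent-browser press Enter
}

# Example usage:
#   fill_and_submit @e1 "agent-browser docs"
```

`fill` is used rather than `type` so any text already in the field is cleared first.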
Get Information
```bash
agent-browser get text @e1         # Get element text
agent-browser get html @e1         # Get element HTML
agent-browser get value @e1        # Get input value
agent-browser get attr href @e1    # Get an attribute (here: href)
agent-browser get title            # Get page title
agent-browser get url              # Get current URL
```
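The `get` commands make simple checks scriptable, for example confirming that a login redirected to the expected page. A minimal bash sketch; `url_contains` is a hypothetical helper, and it assumes `agent-browser get url` prints the bare URL on stdout:

```bash
# url_contains: hypothetical helper, not part of agent-browser.
# Succeeds (exit 0) if the current page URL contains substring $1.
# Assumes `agent-browser get url` prints the bare URL on stdout.
url_contains() {
  case "$(agent-browser get url)" in
    *"$1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   url_contains /dashboard && echo "Logged in" || echo "Still on login page"
```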
Screenshots & PDFs
```bash
agent-browser screenshot              # Viewport screenshot
agent-browser screenshot --full       # Full-page screenshot
agent-browser screenshot output.png   # Save to file
agent-browser pdf output.pdf          # Save page as PDF
```
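When a run captures several screenshots, unique names keep them from overwriting each other. A minimal bash sketch; `shot` is a hypothetical helper, and it assumes `--full` and an output path can be combined in one `screenshot` call (the commands above show them separately):

```bash
# shot: hypothetical helper, not part of agent-browser.
# Saves a full-page screenshot named after step label $1 plus a timestamp,
# e.g. shots/checkout-20250101-120000.png. Directory $2 defaults to "shots".
# Assumes --full and an output path can be passed together.
shot() {
  local dir=${2:-shots}
  mkdir -p "$dir" || return 1
  agent-browser screenshot --full "$dir/$1-$(date +%Y%m%d-%H%M%S).png"
}

# Example usage:
#   shot checkout
```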
Wait
```bash
agent-browser wait @e1      # Wait for element
agent-browser wait 2000     # Wait 2000 milliseconds
agent-browser wait "text"   # Wait for text to appear
```
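Waiting for an element before acting on it avoids clicking something that has not rendered yet. A minimal bash sketch; `click_when_ready` is a hypothetical helper, and it assumes `agent-browser wait` exits non-zero when the element never appears:

```bash
# click_when_ready: hypothetical helper, not part of agent-browser.
# Waits for the element with ref $1, then clicks it. Assumes `wait`
# exits non-zero on failure, in which case the click is skipped.
click_when_ready() {
  agent-browser wait "$1" || return 1
  agent-browser click "$1"
}

# Example usage:
#   click_when_ready @e3
```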
Examples
Login Flow
```bash
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i   # Verify logged in
```
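The same flow can be wrapped in a function that reads credentials from environment variables instead of hard-coding them. A minimal bash sketch; `login`, `LOGIN_EMAIL`, and `LOGIN_PASSWORD` are hypothetical names, and the refs @e1 to @e3 come from the sample snapshot output above, so they should be confirmed with a fresh `snapshot -i` on any real page:

```bash
# login: hypothetical helper built from the flow above, not part of
# agent-browser. Reads credentials from LOGIN_EMAIL / LOGIN_PASSWORD so the
# password is not written into the script. Refs @e1..@e3 are taken from the
# sample snapshot output and may differ on a real page.
login() {
  [ -n "$LOGIN_EMAIL" ] || { echo "LOGIN_EMAIL is not set" >&2; return 1; }
  [ -n "$LOGIN_PASSWORD" ] || { echo "LOGIN_PASSWORD is not set" >&2; return 1; }
  agent-browser open https://app.example.com/login &&
    agent-browser snapshot -i &&
    agent-browser fill @e1 "$LOGIN_EMAIL" &&
    agent-browser fill @e2 "$LOGIN_PASSWORD" &&
    agent-browser click @e3 &&
    agent-browser wait 2000 &&
    agent-browser snapshot -i  # verify logged in
}

# Example usage:
#   export LOGIN_EMAIL=user@example.com LOGIN_PASSWORD='...'
#   login
```

Chaining with `&&` stops the flow at the first failing step, assuming agent-browser reports failures through its exit status.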
Form Filling
```bash
agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4   # Agree to terms
agent-browser click @e5   # Submit
agent-browser screenshot confirmation.png
```
Debug Mode (Visible Browser)
```bash
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
```
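To switch between headless and visible runs without editing every line, a script can route calls through a small wrapper. A minimal bash sketch; `ab` and `AB_HEADED` are hypothetical names, and the wrapper mirrors the debug example above by passing `--headed` on every call:

```bash
# ab: hypothetical wrapper, not part of agent-browser.
# Adds --headed only when AB_HEADED=1, so one script runs headless by
# default and in a visible browser while debugging.
ab() {
  if [ "${AB_HEADED:-0}" = 1 ]; then
    agent-browser --headed "$@"
  else
    agent-browser "$@"
  fi
}

# Example usage:
#   AB_HEADED=1 ab open https://example.com   # visible browser
#   ab snapshot -i                            # headless
```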
Sessions (Parallel Browsers)
```bash
agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com
agent-browser session list   # List sessions
```
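With more than a couple of sites, the session names can be generated in a loop. A minimal bash sketch; `open_in_sessions` is a hypothetical helper, and the `browserN` naming simply follows the example above:

```bash
# open_in_sessions: hypothetical helper, not part of agent-browser.
# Opens each URL argument in its own named session (browser1, browser2, ...),
# mirroring the --session usage above.
open_in_sessions() {
  local i=1 url
  for url in "$@"; do
    agent-browser --session "browser$i" open "$url" || return 1
    i=$((i + 1))
  done
}

# Example usage:
#   open_in_sessions https://site1.com https://site2.com
#   agent-browser session list
```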
JSON Output
```bash
agent-browser snapshot -i --json
```
Returns:
```json
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"name": "Submit", "role": "button"},
      "e2": {"name": "Email", "role": "textbox"}
    }
  }
}
```
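The JSON output lets a script look up a ref by accessible name instead of hard-coding it. A minimal bash sketch using only grep and sed; `ref_for_name` is a hypothetical helper, and it assumes each entry is laid out like the sample above (`"eN": {"name": ...}` with `name` as the first key). A real JSON parser such as jq would be sturdier:

```bash
# ref_for_name: hypothetical helper, not part of agent-browser.
# Reads `agent-browser snapshot -i --json` output on stdin and prints the
# first ref whose "name" equals $1 (e.g. e2 for "Email").
# Plain text matching, not JSON parsing: assumes "name" is the first key in
# each entry, and names containing regex characters or quotes may not match.
ref_for_name() {
  grep -o "\"e[0-9][0-9]*\": *{\"name\": *\"$1\"" |
    head -n 1 |
    sed 's/^"\(e[0-9][0-9]*\)".*/\1/'
}

# Example usage:
#   ref=$(agent-browser snapshot -i --json | ref_for_name Email)
#   agent-browser fill "@$ref" "user@example.com"
```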
When to Use vs Alternatives
Use agent-browser when you:
- Prefer Bash-based workflows
- Need quick one-off automation
- Want simpler CLI commands

Use Playwright MCP when you:
- Need deep MCP tool integration
- Are building complex automation pipelines
- Want tool-based responses