agent-browser
agent-browser: AI Browser Automation CLI
Rust-native CLI by Vercel Labs. Persistent daemon per session, CDP-based, deterministic ref selectors (`@e1`, `@e2`), JSON output. Built for AI agents.
Repo: github.com/vercel-labs/agent-browser | License: Apache 2.0
Install
bash
# npm (recommended)
npm install -g agent-browser
agent-browser install # downloads Chrome for Testing
# homebrew
brew install agent-browser && agent-browser install
# cargo
cargo install agent-browser && agent-browser install
# Linux: also install system deps
agent-browser install --with-deps
Upgrade: `agent-browser upgrade`
Core Concept: Refs
Every `snapshot` assigns deterministic refs (`@e1`, `@e2`, ...) to DOM elements. Use refs instead of CSS selectors: they are stable within a snapshot, semantic, and LLM-friendly.
bash
agent-browser open https://example.com
agent-browser snapshot -i # interactive elements only
Output:
- heading "Example" [ref=e1]
- link "More info" [ref=e2]
- button "Submit" [ref=e3]
agent-browser click @e3 # click Submit
**After any DOM change (navigation, AJAX, form submit), re-snapshot to get fresh refs.**
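The ref lookup an agent performs on the snapshot text can be sketched in shell. This is a minimal illustration, assuming the text output format shown above (`- role "name" [ref=eN]`). `find_ref` is a hypothetical helper, not part of agent-browser, and it treats the name as a basic regular expression, so names containing `/` or regex metacharacters would need escaping first.

```shell
# Hypothetical helper (not part of agent-browser): read snapshot text on stdin
# and print the first ref matching a role and accessible name, as @eN.
find_ref() {
  role=$1
  name=$2
  # Assumes lines shaped like:  - button "Submit" [ref=e3]
  sed -n "s/^[[:space:]]*- ${role} \"${name}\" \[ref=\(e[0-9][0-9]*\)\].*/@\1/p" | head -n 1
}
```

A possible use is `agent-browser snapshot -i | find_ref button "Submit"`, re-run after every DOM change so the ref comes from the latest snapshot.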
Essential Commands
Navigation
bash
agent-browser open <url>
agent-browser back | forward | reload
Snapshot (primary way to "see" the page)
bash
agent-browser snapshot # full accessibility tree
agent-browser snapshot -i # interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # compact (remove empty nodes)
agent-browser snapshot -d 3 # limit depth
agent-browser snapshot -s "#main" # scope to selector
agent-browser snapshot --json # machine-readable
Interaction
bash
agent-browser click <ref|selector>
agent-browser fill <ref> "text" # clear + type (for inputs)
agent-browser type <ref> "text" # type without clearing
agent-browser select <ref> "option" # dropdown
agent-browser check|uncheck <ref> # checkbox
agent-browser hover <ref>
agent-browser press Enter|Tab|Escape # keyboard
agent-browser press Control+a # key combo
agent-browser scroll down 500 # scroll page
agent-browser upload <ref> file.pdf # file upload
Data Extraction
bash
agent-browser get text <ref> # element text
agent-browser get html <ref> # innerHTML
agent-browser get value <ref> # input value
agent-browser get attr <ref> href # attribute
agent-browser get title # page title
agent-browser get url # current URL
Screenshot & PDF
bash
agent-browser screenshot [path] # to file or temp
agent-browser screenshot --full # full page scroll capture
agent-browser screenshot --annotate # numbered labels matching refs
agent-browser pdf output.pdf
Wait
bash
agent-browser wait <ref> # wait for element visible
agent-browser wait 2000 # wait ms
agent-browser wait --load networkidle # wait for network idle
agent-browser wait --url "**/dashboard" # wait for URL pattern
agent-browser wait --text "Success" # wait for text
agent-browser wait <ref> --state hidden # wait for element to disappear
Semantic Locators (alternative to refs)
bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find testid "login-btn" click
Session Management
bash
agent-browser session list # active sessions
agent-browser --session myapp <cmd> # named session
agent-browser close # close current
agent-browser close --all # close all
Tabs
bash
agent-browser tab # list tabs
agent-browser tab new [url] # new tab
agent-browser tab 2 # switch to tab 2
agent-browser tab close
AI Agent Workflow Pattern
Standard loop for any AI agent:
bash
# 1. Navigate
agent-browser open https://target.com
# 2. Observe (snapshot → LLM reads → decides action)
agent-browser snapshot -i --json
# 3. Act (LLM picks ref from snapshot)
agent-browser fill @e2 "search query"
agent-browser click @e3
# 4. Wait for result
agent-browser wait --load networkidle
# 5. Re-observe (new snapshot after DOM changed)
agent-browser snapshot -i --json
# 6. Extract or continue
agent-browser get text @e5
Repeat steps 2-5 until the task is complete. Always run `close` when done.
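Because the daemon outlives each command, it can help to wrap a task so that `close` runs even when a step fails. This is a minimal sketch with two assumptions: `AB` is a variable introduced here so the binary can be swapped or stubbed, and `with_browser` is a hypothetical helper name.

```shell
# AB is an assumption of this sketch: the command used to reach the CLI.
AB=${AB:-agent-browser}

# Hypothetical wrapper: run a task, then always close the session,
# and hand the task's exit status back to the caller.
with_browser() {
  "$@"
  status=$?
  "$AB" close
  return "$status"
}
```

One way to use it is to put the loop's steps in a shell function, say `my_task`, and call `with_browser my_task`.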
Batch Execution
When the exact sequence is known upfront:
bash
cat << 'EOF' | agent-browser batch --json
[
  ["open", "https://example.com/login"],
  ["fill", "@e1", "user@example.com"],
  ["fill", "@e2", "password123"],
  ["click", "@e3"],
  ["wait", "--load", "networkidle"],
  ["screenshot", "result.png"]
]
EOF
Add `--bail` to stop at the first error.
Authentication & State Persistence
bash
# Save credentials (encrypted vault; the LLM never sees the password)
echo "pass" | agent-browser auth save myapp \
  --url https://app.example.com/login \
  --username user@example.com --password-stdin
# Log in with saved creds
agent-browser auth login myapp
# Auto-persist session (cookies, localStorage, IndexedDB)
agent-browser --session-name myapp open https://app.example.com
# ... interact ...
agent-browser close # state auto-saved to ~/.agent-browser/sessions/
# Next run: auto-restored, already logged in
agent-browser --session-name myapp open https://app.example.com/dashboard
Manual state save/load:
```bash
agent-browser state save auth.json
agent-browser state load auth.json
```
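The `echo "pass"` form hard-codes the password on the command line. One alternative is to feed `--password-stdin` from an environment variable. This sketch assumes a variable named `APP_PASSWORD` and a swappable `AB` variable for the binary, both introduced here for illustration. The `auth save` flags are the ones shown in this section.

```shell
# AB is an assumption of this sketch: the command used to reach the CLI.
AB=${AB:-agent-browser}

# Hypothetical helper: save_creds NAME URL USERNAME
# Reads the password from $APP_PASSWORD and pipes it to --password-stdin.
save_creds() {
  if [ -z "${APP_PASSWORD:-}" ]; then
    echo "APP_PASSWORD is not set" >&2
    return 1
  fi
  printf '%s\n' "$APP_PASSWORD" | "$AB" auth save "$1" \
    --url "$2" \
    --username "$3" --password-stdin
}
```

For example, `save_creds myapp https://app.example.com/login user@example.com` would store the same entry as the `echo` form, provided `APP_PASSWORD` is set in the environment.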
Network Control
bash
agent-browser network requests # list tracked requests
agent-browser network requests --type xhr,fetch # filter
agent-browser network route "**/analytics" --abort # block tracking
agent-browser network route "**/api/*" --body '{"mock":true}' # mock response
agent-browser network har start && agent-browser network har stop output.har
Device Emulation
bash
agent-browser set device "iPhone 14"
agent-browser set viewport 1920 1080 # desktop
agent-browser set viewport 1920 1080 2 # retina
agent-browser set media dark # dark mode
agent-browser set geo 52.2297 21.0122 # Warsaw
Configuration
Priority: CLI flags > env vars > `./agent-browser.json` > `~/.agent-browser/config.json`
Key env vars:
bash
AGENT_BROWSER_SESSION=myapp # default session
AGENT_BROWSER_HEADED=1 # show browser window
AGENT_BROWSER_EXECUTABLE_PATH=/path # custom Chrome
AGENT_BROWSER_PROXY=http://host:port # proxy
AGENT_BROWSER_DEFAULT_TIMEOUT=25000 # operation timeout (max 30000)
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 # daemon auto-shutdown
AGENT_BROWSER_ENCRYPTION_KEY=<64hex> # encrypt state files
AGENT_BROWSER_ALLOWED_DOMAINS=a.com,b.com # restrict navigation
AGENT_BROWSER_MAX_OUTPUT=50000 # truncate output (prevent context flooding)
Config file (`agent-browser.json`):
json
{
"headed": false,
"proxy": "http://localhost:8080",
"profile": "./browser-data",
"userAgent": "my-agent/1.0",
"screenshotDir": "./shots",
"colorScheme": "dark"
}
Key CLI flags: `--json`, `--session <name>`, `--profile <name|path>`, `--headed`, `--proxy <url>`, `--ignore-https-errors`, `--annotate`, `--engine lightpanda`, `--provider <cloud>`, `--content-boundaries`, `--no-auto-dialog`, `--debug`
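The precedence rule at the top of this section amounts to "the first source that provides a value wins". A small shell sketch of that lookup, purely illustrative: `resolve_setting` is a hypothetical function and does not reflect how agent-browser reads its config files internally.

```shell
# Hypothetical illustration of the documented priority:
# CLI flag > env var > ./agent-browser.json > ~/.agent-browser/config.json
# Arguments are the candidate values in that order; empty means "not set".
resolve_setting() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

With no CLI flag given, `resolve_setting "" "$AGENT_BROWSER_PROXY" "http://localhost:8080" ""` prints the env var's value when it is set and falls back to the project file's value otherwise.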
Safety Features for AI
bash
--content-boundaries # wrap untrusted page content in markers
--max-output 50000 # prevent context window flooding
--allowed-domains a.com,b.com # restrict navigation to trusted domains
--action-policy policy.json # gate destructive actions
--confirm-actions eval,download # require approval for sensitive ops
Cloud Providers
For scalable/CI deployments: `--provider <name>`
Supported: AgentCore, Browserbase, Browserless, BrowserUse, Kernel
Gotchas
- Refs invalidate on DOM change: always re-snapshot after navigation/AJAX/form submit
- Daemon persists: run `agent-browser close` explicitly to avoid orphaned processes
- `networkidle` is unreliable on SPAs: prefer `wait --text "X"` or `wait --url "pattern"`
- Timeout max is 30s: keep `DEFAULT_TIMEOUT` under 30000ms
- State files contain tokens: use `ENCRYPTION_KEY` or add them to `.gitignore`
- Shadow DOM is not in snapshots: only light DOM is visible
- Large pages: use `snapshot -i -c -d 3` to limit output size
- Chrome profile lock: close Chrome before using `--profile` with the same profile
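For the state-file gotcha, `AGENT_BROWSER_ENCRYPTION_KEY` is listed as 64 hex characters, which corresponds to 32 bytes. One way to produce such a value is sketched below. It assumes any random 32-byte key is acceptable, since the source does not say how the key must be derived. `gen_key` is a hypothetical helper.

```shell
# Hypothetical helper: print 32 random bytes as 64 lowercase hex characters.
gen_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

A session could then start with `export AGENT_BROWSER_ENCRYPTION_KEY=$(gen_key)`. Keep the key out of the repository as well as the state files.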