agent-browser-automation

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

agent-browser

agent-browser

Skill by ara.so — Daily 2026 Skills collection.
agent-browser
is a headless browser automation CLI built in Rust, designed for AI agents. It wraps Chrome via the Chrome DevTools Protocol (CDP) and exposes a fast, ergonomic command-line interface for navigation, interaction, accessibility snapshots, screenshots, network interception, and more — with no Node.js or Playwright runtime required.
ara.so开发的Skill — 属于Daily 2026 Skills合集。
agent-browser
是一款基于Rust构建的无头浏览器自动化CLI,专为AI Agent设计。它通过Chrome DevTools Protocol(CDP)封装Chrome浏览器,提供了便捷高效的命令行接口,支持页面导航、交互、无障碍快照、截图、网络拦截等功能——无需依赖Node.js或Playwright运行时。

Installation

安装

Recommended (npm global)

推荐方式(npm全局安装)

bash
npm install -g agent-browser
agent-browser install  # Download Chrome for Testing (first time only)
bash
npm install -g agent-browser
agent-browser install  # 首次使用时下载Chrome for Testing

macOS (Homebrew)

macOS(Homebrew)

bash
brew install agent-browser
agent-browser install
bash
brew install agent-browser
agent-browser install

Rust / Cargo

Rust / Cargo

bash
cargo install agent-browser
agent-browser install
bash
cargo install agent-browser
agent-browser install

Local project dependency

本地项目依赖

bash
npm install agent-browser
bash
npm install agent-browser

Add to package.json scripts or invoke via npx

添加至package.json脚本或通过npx调用

undefined
undefined

Linux (with system dependencies)

Linux(含系统依赖)

bash
agent-browser install --with-deps
bash
agent-browser install --with-deps

Quick Start

快速开始

bash
agent-browser open https://example.com
agent-browser snapshot                        # Accessibility tree with @refs (best for AI)
agent-browser click @e2                       # Click by ref from snapshot
agent-browser fill @e3 "hello@example.com"   # Fill by ref
agent-browser get text @e1                    # Get text content
agent-browser screenshot page.png
agent-browser close
bash
agent-browser open https://example.com
agent-browser snapshot                        # 带@refs的无障碍树(最适合AI使用)
agent-browser click @e2                       # 通过快照中的ref点击元素
agent-browser fill @e3 "hello@example.com"   # 通过ref填充内容
agent-browser get text @e1                    # 获取文本内容
agent-browser screenshot page.png
agent-browser close

Core Commands

核心命令

Navigation

页面导航

bash
agent-browser open <url>           # Navigate (aliases: goto, navigate)
agent-browser get url              # Get current URL
agent-browser get title            # Get page title
agent-browser close                # Close browser (aliases: quit, exit)
bash
agent-browser open <url>           # 跳转页面(别名:goto, navigate)
agent-browser get url              # 获取当前URL
agent-browser get title            # 获取页面标题
agent-browser close                # 关闭浏览器(别名:quit, exit)

Accessibility Snapshot (recommended for AI agents)

无障碍快照(推荐AI Agent使用)

bash
agent-browser snapshot             # Returns accessibility tree with @ref IDs
agent-browser snapshot -i          # Interactive / compact mode
Snapshot output includes
@eN
refs you can use directly:
@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"
Then act on them:
bash
agent-browser fill @e2 "user@example.com"
agent-browser click @e1
bash
agent-browser snapshot             # 返回带@ref ID的无障碍树
agent-browser snapshot -i          # 交互式/紧凑模式
快照输出包含可直接使用的
@eN
引用:
@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"
随后可基于这些引用执行操作:
bash
agent-browser fill @e2 "user@example.com"
agent-browser click @e1

Interaction

页面交互

bash
agent-browser click <sel>                     # Click element
agent-browser dblclick <sel>                  # Double-click
agent-browser fill <sel> <text>               # Clear and fill input
agent-browser type <sel> <text>               # Type into element
agent-browser press <key>                     # Press key (Enter, Tab, Control+a)
agent-browser keyboard type <text>            # Type at current focus (real keystrokes)
agent-browser keyboard inserttext <text>      # Insert text without key events
agent-browser hover <sel>                     # Hover element
agent-browser select <sel> <value>            # Select dropdown option
agent-browser check <sel>                     # Check checkbox
agent-browser uncheck <sel>                   # Uncheck checkbox
agent-browser scroll down 500                 # Scroll (up/down/left/right, optional px)
agent-browser scroll down --selector "#feed"  # Scroll within element
agent-browser scrollintoview <sel>            # Scroll element into view
agent-browser drag <src> <target>             # Drag and drop
agent-browser upload <sel> /path/file.pdf     # Upload file
bash
agent-browser click <sel>                     # 点击元素
agent-browser dblclick <sel>                  # 双击元素
agent-browser fill <sel> <text>               # 清空并填充输入框
agent-browser type <sel> <text>               # 在元素中输入内容
agent-browser press <key>                     # 按下按键(Enter, Tab, Control+a等)
agent-browser keyboard type <text>            # 在当前焦点位置输入(模拟真实按键)
agent-browser keyboard inserttext <text>      # 插入文本,不触发按键事件
agent-browser hover <sel>                     # 悬停在元素上
agent-browser select <sel> <value>            # 选择下拉选项
agent-browser check <sel>                     # 勾选复选框
agent-browser uncheck <sel>                   # 取消勾选复选框
agent-browser scroll down 500                 # 滚动页面(支持up/down/left/right,可指定像素值)
agent-browser scroll down --selector "#feed"  # 在指定元素内滚动
agent-browser scrollintoview <sel>            # 滚动至元素可见
agent-browser drag <src> <target>             # 拖拽元素
agent-browser upload <sel> /path/file.pdf     # 上传文件

Screenshots & PDF

截图与PDF导出

bash
agent-browser screenshot                          # Save to temp dir, print path
agent-browser screenshot page.png                 # Save to path
agent-browser screenshot --full page.png          # Full-page screenshot
agent-browser screenshot --annotate               # Numbered element labels overlay
agent-browser screenshot --screenshot-dir ./shots # Custom output directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf                      # Save page as PDF
bash
agent-browser screenshot                          # 保存至临时目录并打印路径
agent-browser screenshot page.png                 # 保存至指定路径
agent-browser screenshot --full page.png          # 整页截图
agent-browser screenshot --annotate               # 叠加带编号的元素标签
agent-browser screenshot --screenshot-dir ./shots # 自定义输出目录
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf                      # 将页面保存为PDF

Getting Element Info

获取元素信息

bash
agent-browser get text <sel>           # Text content
agent-browser get html <sel>           # innerHTML
agent-browser get value <sel>          # Input value
agent-browser get attr <sel> <attr>    # Attribute value
agent-browser get count <sel>          # Count matching elements
agent-browser get box <sel>            # Bounding box
agent-browser get styles <sel>         # Computed styles
agent-browser get cdp-url              # CDP WebSocket URL
bash
agent-browser get text <sel>           # 获取文本内容
agent-browser get html <sel>           # 获取innerHTML
agent-browser get value <sel>          # 获取输入框值
agent-browser get attr <sel> <attr>    # 获取属性值
agent-browser get count <sel>          # 获取匹配元素数量
agent-browser get box <sel>            # 获取元素边界框
agent-browser get styles <sel>         # 获取计算样式
agent-browser get cdp-url              # 获取CDP WebSocket地址

State Checks

状态检查

bash
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
bash
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>

Semantic Locators (find)

语义定位(find)

bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." fill "rust"
agent-browser find testid "login-btn" click
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
agent-browser find role textbox fill "hello" --name "Username"
Actions:
click
,
fill
,
type
,
hover
,
focus
,
check
,
uncheck
,
text
bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." fill "rust"
agent-browser find testid "login-btn" click
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
agent-browser find role textbox fill "hello" --name "Username"
支持的操作:
click
,
fill
,
type
,
hover
,
focus
,
check
,
uncheck
,
text

Waiting

等待操作

bash
agent-browser wait "#modal"                          # Wait for element visible
agent-browser wait 2000                              # Wait N milliseconds
agent-browser wait --text "Welcome back"             # Wait for text
agent-browser wait --url "**/dashboard"              # Wait for URL pattern
agent-browser wait --load networkidle                # Wait for load state
agent-browser wait --fn "window.appReady === true"   # Wait for JS condition
agent-browser wait "#spinner" --state hidden         # Wait for element to disappear
Load states:
load
,
domcontentloaded
,
networkidle
bash
agent-browser wait "#modal"                          # 等待元素可见
agent-browser wait 2000                              # 等待N毫秒
agent-browser wait --text "Welcome back"             # 等待文本出现
agent-browser wait --url "**/dashboard"              # 等待URL匹配指定模式
agent-browser wait --load networkidle                # 等待页面加载完成
agent-browser wait --fn "window.appReady === true"   # 等待JS条件满足
agent-browser wait "#spinner" --state hidden         # 等待元素消失
加载状态:
load
,
domcontentloaded
,
networkidle

JavaScript Eval

JavaScript执行

bash
agent-browser eval "document.title"
agent-browser eval "JSON.stringify(window.__STATE__)"
agent-browser eval -b "BASE64_ENCODED_JS"
echo "return document.body.innerHTML" | agent-browser eval --stdin
bash
agent-browser eval "document.title"
agent-browser eval "JSON.stringify(window.__STATE__)"
agent-browser eval -b "BASE64_ENCODED_JS"
echo "return document.body.innerHTML" | agent-browser eval --stdin

Batch Execution (efficient multi-step)

批量执行(高效多步骤操作)

bash
echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["fill", "@e2", "user@example.com"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json
bash
echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["fill", "@e2", "user@example.com"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json

Stop on first failure

遇到首个失败即停止

agent-browser batch --bail < commands.json
undefined
agent-browser batch --bail < commands.json
undefined

Tabs & Frames

标签页与框架

bash
agent-browser tab                    # List tabs
agent-browser tab new https://...    # New tab with URL
agent-browser tab 2                  # Switch to tab 2
agent-browser tab close              # Close current tab
agent-browser frame "#my-iframe"     # Switch into iframe
agent-browser frame main             # Return to main frame
bash
agent-browser tab                    # 列出所有标签页
agent-browser tab new https://...    # 新建标签页并打开指定URL
agent-browser tab 2                  # 切换至第2个标签页
agent-browser tab close              # 关闭当前标签页
agent-browser frame "#my-iframe"     # 切换至指定iframe
agent-browser frame main             # 返回主框架

Cookies & Storage

Cookie与存储

bash
agent-browser cookies
agent-browser cookies set session_id "abc123"
agent-browser cookies clear

agent-browser storage local
agent-browser storage local set theme dark
agent-browser storage local clear
agent-browser storage session set cart '{"items":[]}'
bash
agent-browser cookies
agent-browser cookies set session_id "abc123"
agent-browser cookies clear

agent-browser storage local
agent-browser storage local set theme dark
agent-browser storage local clear
agent-browser storage session set cart '{"items":[]}'

Network

网络操作

bash
agent-browser network route "**/api/users" --body '{"users":[]}'  # Mock response
agent-browser network route "**/ads/**" --abort                    # Block requests
agent-browser network unroute                                       # Remove all routes
agent-browser network requests --filter api                        # View requests
agent-browser network har start
agent-browser network har stop recording.har
bash
agent-browser network route "**/api/users" --body '{"users":[]}'  # 模拟响应
agent-browser network route "**/ads/**" --abort                    # �拦请求
agent-browser network unroute                                       # 移除所有路由规则
agent-browser network requests --filter api                        # 查看请求
agent-browser network har start
agent-browser network har stop recording.har

Browser Settings

浏览器设置

bash
agent-browser set viewport 1280 800
agent-browser set viewport 375 812 2        # With device pixel ratio (retina)
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Custom":"value"}'
agent-browser set credentials admin secret
agent-browser set media dark
bash
agent-browser set viewport 1280 800
agent-browser set viewport 375 812 2        # 设置设备像素比(适配视网膜屏)
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Custom":"value"}'
agent-browser set credentials admin secret
agent-browser set media dark

Auth State

认证状态

bash
agent-browser state save ./auth.json    # Save cookies + localStorage
agent-browser state load ./auth.json    # Restore auth state
agent-browser state list                # List saved states
agent-browser state show auth.json      # Summary of saved state
bash
agent-browser state save ./auth.json    # 保存Cookie与localStorage
agent-browser state load ./auth.json    # 恢复认证状态
agent-browser state list                # 列出已保存的状态
agent-browser state show auth.json      # 查看已保存状态的摘要

Dialogs

对话框处理

bash
agent-browser dialog accept             # Accept alert/confirm/prompt
agent-browser dialog accept "My input"  # Accept prompt with text
agent-browser dialog dismiss
bash
agent-browser dialog accept             # 确认警告/确认/提示对话框
agent-browser dialog accept "My input"  # 确认提示对话框并输入内容
agent-browser dialog dismiss

Clipboard

剪贴板操作

bash
agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy           # Ctrl+C current selection
agent-browser clipboard paste          # Ctrl+V
bash
agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy           # 复制当前选中内容(Ctrl+C)
agent-browser clipboard paste          # 粘贴内容(Ctrl+V)

Diff & Visual Testing

差异对比与视觉测试

bash
agent-browser diff snapshot                                  # vs last snapshot
agent-browser diff snapshot --baseline before.txt            # vs saved file
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o diff.png
agent-browser diff url https://v1.example.com https://v2.example.com
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot
agent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"
bash
agent-browser diff snapshot                                  # 与上一次快照对比
agent-browser diff snapshot --baseline before.txt            # 与已保存文件对比
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o diff.png
agent-browser diff url https://v1.example.com https://v2.example.com
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot
agent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"

Debug & Profiling

调试与性能分析

bash
agent-browser trace start trace.zip
agent-browser trace stop
agent-browser profiler start
agent-browser profiler stop profile.json
agent-browser console                  # View console messages
agent-browser errors                   # View uncaught JS exceptions
agent-browser highlight "#button"      # Visually highlight element
agent-browser inspect                  # Open Chrome DevTools
agent-browser connect 9222             # Connect to existing browser via CDP port
bash
agent-browser trace start trace.zip
agent-browser trace stop
agent-browser profiler start
agent-browser profiler stop profile.json
agent-browser console                  # 查看控制台消息
agent-browser errors                   # 查看未捕获的JS异常
agent-browser highlight "#button"      # 高亮显示元素
agent-browser inspect                  # 打开Chrome DevTools
agent-browser connect 9222             # 通过CDP端口连接至已运行的浏览器

Common Patterns

常见使用场景

Login flow and save session

登录流程并保存会话

bash
#!/bin/bash
agent-browser open https://app.example.com/login
agent-browser fill "#email" "$LOGIN_EMAIL"
agent-browser fill "#password" "$LOGIN_PASSWORD"
agent-browser click "[type=submit]"
agent-browser wait --url "**/dashboard"
agent-browser state save ./session.json
bash
#!/bin/bash
agent-browser open https://app.example.com/login
agent-browser fill "#email" "$LOGIN_EMAIL"
agent-browser fill "#password" "$LOGIN_PASSWORD"
agent-browser click "[type=submit]"
agent-browser wait --url "**/dashboard"
agent-browser state save ./session.json

AI agent loop with snapshot-driven interaction

基于快照交互�AI Agent循环

bash
#!/bin/bash
agent-browser open https://app.example.com
agent-browser state load ./session.json
bash
#!/bin/bash
agent-browser open https://app.example.com
agent-browser state load ./session.json

Get snapshot, parse @refs, act

获取快照、解析@refs并执行操作

SNAPSHOT=$(agent-browser snapshot) echo "$SNAPSHOT"
SNAPSHOT=$(agent-browser snapshot) echo "$SNAPSHOT"

Agent determines @e5 is the search box

Agent判定@e5为搜索框

agent-browser fill @e5 "quarterly report" agent-browser press Enter agent-browser wait --load networkidle agent-browser snapshot agent-browser screenshot results.png
undefined
agent-browser fill @e5 "quarterly report" agent-browser press Enter agent-browser wait --load networkidle agent-browser snapshot agent-browser screenshot results.png
undefined

Batch commands from a script (JSON)

通过脚本执行批量命令(JSON格式)

bash
cat > commands.json << 'EOF'
[
  ["open", "https://news.ycombinator.com"],
  ["wait", "--load", "networkidle"],
  ["get", "title"],
  ["snapshot"],
  ["screenshot", "hn.png"]
]
EOF

agent-browser batch --json < commands.json
bash
cat > commands.json << 'EOF'
[
  ["open", "https://news.ycombinator.com"],
  ["wait", "--load", "networkidle"],
  ["get", "title"],
  ["snapshot"],
  ["screenshot", "hn.png"]
]
EOF

agent-browser batch --json < commands.json

Scrape with mocked network

模拟网络请求进行数据爬取

bash
agent-browser open https://api-heavy-app.example.com
agent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'
agent-browser snapshot
agent-browser network unroute
bash
agent-browser open https://api-heavy-app.example.com
agent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'
agent-browser snapshot
agent-browser network unroute

Full-page screenshot with annotations

带标注的整页截图

bash
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full --annotate annotated.png
bash
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full --annotate annotated.png

Connect to already-running Chrome

连接至已运行的Chrome浏览器

bash
undefined
bash
undefined

Start Chrome with remote debugging

启动Chrome并开启远程调试

google-chrome --remote-debugging-port=9222 &
agent-browser connect 9222 agent-browser open https://example.com agent-browser snapshot
undefined
google-chrome --remote-debugging-port=9222 &
agent-browser connect 9222 agent-browser open https://example.com agent-browser snapshot
undefined

Emulate mobile device

模拟移动设备

bash
agent-browser set device "iPhone 14"
agent-browser open https://example.com
agent-browser screenshot mobile.png
bash
agent-browser set device "iPhone 14"
agent-browser open https://example.com
agent-browser screenshot mobile.png

HAR recording for network analysis

录制HAR文件用于网络分析

bash
agent-browser open https://example.com
agent-browser network har start
agent-browser click "#load-data"
agent-browser wait --load networkidle
agent-browser network har stop session.har
bash
agent-browser open https://example.com
agent-browser network har start
agent-browser click "#load-data"
agent-browser wait --load networkidle
agent-browser network har stop session.har

Selector Reference

选择器参考

FormatExampleNotes
@ref
@e1
,
@e12
From
snapshot
output — preferred for AI
CSS
#id
,
.class
,
[attr=val]
Standard CSS selectors
Text
"Sign In"
Exact text match
XPath
//button[@type='submit']
Full XPath
格式示例说明
@ref
@e1
,
@e12
来自
snapshot
输出 —— 推荐AI使用
CSS
#id
,
.class
,
[attr=val]
标准CSS选择器
文本
"Sign In"
精确文本匹配
XPath
//button[@type='submit']
完整XPath

Troubleshooting

故障排除

Chrome not found

未找到Chrome浏览器

bash
agent-browser install              # Downloads Chrome for Testing
agent-browser install --with-deps  # Linux: also installs system libs
bash
agent-browser install              # 下载Chrome for Testing
agent-browser install --with-deps  # Linux系统:同时安装系统依赖库

Element not found / timing issues

元素未找到 / 时序问题

bash
agent-browser wait "#my-element"              # Wait for visibility first
agent-browser wait --load networkidle         # Wait for page to settle
agent-browser wait --fn "!!document.querySelector('#app')"
bash
agent-browser wait "#my-element"              # 先等待元素可见
agent-browser wait --load networkidle         # 等待页面加载稳定
agent-browser wait --fn "!!document.querySelector('#app')"

Selector issues — use snapshot refs instead

选择器问题 —— 使用快照引用替代

bash
undefined
bash
undefined

Instead of fragile CSS:

替代不稳定的CSS选择器:

agent-browser click ".btn.btn-primary.submit-form"
agent-browser click ".btn.btn-primary.submit-form"

Use snapshot refs:

使用快照引用:

agent-browser snapshot # Find @e7 = [button] "Submit" agent-browser click @e7
undefined
agent-browser snapshot # 找到@e7 = [button] "Submit" agent-browser click @e7
undefined

Debug what's on the page

调试页面内容

bash
agent-browser screenshot debug.png        # Visual check
agent-browser snapshot                    # Accessibility tree
agent-browser console                     # JS console output
agent-browser errors                      # Uncaught exceptions
agent-browser eval "document.readyState"
bash
agent-browser screenshot debug.png        # 可视化检查
agent-browser snapshot                    # 查看无障碍树
agent-browser console                     # 查看JS控制台输出
agent-browser errors                      # 查看未捕获的异常
agent-browser eval "document.readyState"

Auth issues between sessions

会话间认证问题

bash
agent-browser state save ./auth.json   # After successful login
agent-browser state load ./auth.json   # At start of next session
bash
agent-browser state save ./auth.json   # 登录成功后保存状态
agent-browser state load ./auth.json   # 下次会话开始时恢复状态

Handling alerts/dialogs

处理警告对话框

bash
undefined
bash
undefined

Set up handler BEFORE the action that triggers dialog

在触发对话框的操作前设置处理程序

agent-browser dialog accept agent-browser click "#delete-button"
undefined
agent-browser dialog accept agent-browser click "#delete-button"
undefined

Performance — use batch for multi-step workflows

性能优化 —— 多步骤工作流使用批量执行

bash
undefined
bash
undefined

Slow: one process per command

较慢:每个命令启动一个进程

agent-browser open https://example.com agent-browser fill "#q" "search" agent-browser click "#submit"
agent-browser open https://example.com agent-browser fill "#q" "search" agent-browser click "#submit"

Fast: single process, multiple commands

较快:单个进程执行多个命令

echo '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]'
| agent-browser batch --json
undefined
echo '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]'
| agent-browser batch --json
undefined