agent-browser-automation
agent-browser
Skill by ara.so — Daily 2026 Skills collection.
Installation
Recommended (npm global)
```bash
npm install -g agent-browser
agent-browser install  # Download Chrome for Testing (first time only)
```
macOS (Homebrew)
```bash
brew install agent-browser
agent-browser install
```
Rust / Cargo
```bash
cargo install agent-browser
agent-browser install
```
Local project dependency
```bash
npm install agent-browser
# Add to package.json scripts or invoke via npx
```
Linux (with system dependencies)
```bash
agent-browser install --with-deps
```
Quick Start
```bash
agent-browser open https://example.com
agent-browser snapshot                      # Accessibility tree with @refs (best for AI)
agent-browser click @e2                     # Click by ref from snapshot
agent-browser fill @e3 "hello@example.com"  # Fill by ref
agent-browser get text @e1                  # Get text content
agent-browser screenshot page.png
agent-browser close
```
Core Commands
Navigation
```bash
agent-browser open <url>   # Navigate (aliases: goto, navigate)
agent-browser get url      # Get current URL
agent-browser get title    # Get page title
agent-browser close        # Close browser (aliases: quit, exit)
```
Accessibility Snapshot (recommended for AI agents)
```bash
agent-browser snapshot     # Returns accessibility tree with @ref IDs
agent-browser snapshot -i  # Interactive / compact mode
```
Snapshot output includes `@eN` refs you can use directly:
```
@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"
```
Then act on them:
```bash
agent-browser fill @e2 "user@example.com"
agent-browser click @e1
```
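Since refs change from page to page, a script usually has to look one up before acting. The sketch below picks the first ref with a given role and accessible name out of snapshot text. It assumes lines shaped exactly like the sample above (`@e1 [button] "Submit"`); real snapshot output may be laid out differently, and `find_ref` is a helper name made up for this example, not part of the CLI.

```bash
# find_ref ROLE NAME
# Reads snapshot text on stdin and prints the first @ref whose role and quoted
# name match. ASSUMPTION: each element is one line like: @e1 [button] "Submit"
# Names containing backslashes are not handled by this sketch.
find_ref() {
  awk -v role="$1" -v name="$2" '
    $1 ~ /^@e[0-9]+$/ && $2 == ("[" role "]") && index($0, "\"" name "\"") > 0 {
      print $1
      exit
    }
  '
}

# Possible usage (needs a running browser session):
#   ref=$(agent-browser snapshot -i | find_ref button "Submit")
#   [ -n "$ref" ] && agent-browser click "$ref"
```

The function prints nothing when no line matches, so callers should check for an empty result before clicking.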
Interaction
```bash
agent-browser click <sel>                     # Click element
agent-browser dblclick <sel>                  # Double-click
agent-browser fill <sel> <text>               # Clear and fill input
agent-browser type <sel> <text>               # Type into element
agent-browser press <key>                     # Press key (Enter, Tab, Control+a)
agent-browser keyboard type <text>            # Type at current focus (real keystrokes)
agent-browser keyboard inserttext <text>      # Insert text without key events
agent-browser hover <sel>                     # Hover element
agent-browser select <sel> <value>            # Select dropdown option
agent-browser check <sel>                     # Check checkbox
agent-browser uncheck <sel>                   # Uncheck checkbox
agent-browser scroll down 500                 # Scroll (up/down/left/right, optional px)
agent-browser scroll down --selector "#feed"  # Scroll within element
agent-browser scrollintoview <sel>            # Scroll element into view
agent-browser drag <src> <target>             # Drag and drop
agent-browser upload <sel> /path/file.pdf     # Upload file
```
Screenshots & PDF
```bash
agent-browser screenshot                           # Save to temp dir, print path
agent-browser screenshot page.png                  # Save to path
agent-browser screenshot --full page.png           # Full-page screenshot
agent-browser screenshot --annotate                # Numbered element labels overlay
agent-browser screenshot --screenshot-dir ./shots  # Custom output directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf                       # Save page as PDF
```
Getting Element Info
```bash
agent-browser get text <sel>         # Text content
agent-browser get html <sel>         # innerHTML
agent-browser get value <sel>        # Input value
agent-browser get attr <sel> <attr>  # Attribute value
agent-browser get count <sel>        # Count matching elements
agent-browser get box <sel>          # Bounding box
agent-browser get styles <sel>       # Computed styles
agent-browser get cdp-url            # CDP WebSocket URL
```
State Checks
```bash
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
```
Semantic Locators (find)
```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." fill "rust"
agent-browser find testid "login-btn" click
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
agent-browser find role textbox fill "hello" --name "Username"
```
Actions: `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
Waiting
```bash
agent-browser wait "#modal"                         # Wait for element visible
agent-browser wait 2000                             # Wait N milliseconds
agent-browser wait --text "Welcome back"            # Wait for text
agent-browser wait --url "**/dashboard"             # Wait for URL pattern
agent-browser wait --load networkidle               # Wait for load state
agent-browser wait --fn "window.appReady === true"  # Wait for JS condition
agent-browser wait "#spinner" --state hidden        # Wait for element to disappear
```
Load states: `load`, `domcontentloaded`, `networkidle`
JavaScript Eval
```bash
agent-browser eval "document.title"
agent-browser eval "JSON.stringify(window.__STATE__)"
agent-browser eval -b "BASE64_ENCODED_JS"
echo "return document.body.innerHTML" | agent-browser eval --stdin
```
Batch Execution (efficient multi-step)
```bash
echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["fill", "@e2", "user@example.com"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json

# Stop on first failure
agent-browser batch --bail < commands.json
```
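Hand-written JSON breaks as soon as an argument contains a double quote or backslash. Here is a minimal sketch of building the batch payload from ordinary shell arguments instead. `json_escape`, `json_step`, and `json_batch` are hypothetical helper names, and only the array-of-arrays shape shown above is assumed.

```bash
# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Control characters such as newlines are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# json_step ARG...: print one command as a JSON array, e.g. ["fill","@e2","text"].
json_step() {
  out=""
  for arg in "$@"; do
    out="$out${out:+,}\"$(json_escape "$arg")\""
  done
  printf '[%s]' "$out"
}

# json_batch STEP...: wrap already-formatted steps in an outer JSON array.
json_batch() {
  out=""
  for step in "$@"; do
    out="$out${out:+,}$step"
  done
  printf '[%s]\n' "$out"
}

# Possible usage (needs agent-browser installed):
#   json_batch \
#     "$(json_step open https://example.com)" \
#     "$(json_step fill @e2 "$EMAIL")" \
#     "$(json_step click @e1)" | agent-browser batch --json
```

If `jq` is available, it is likely a more robust way to produce the same payload.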
Tabs & Frames
```bash
agent-browser tab                    # List tabs
agent-browser tab new https://...    # New tab with URL
agent-browser tab 2                  # Switch to tab 2
agent-browser tab close              # Close current tab
agent-browser frame "#my-iframe"     # Switch into iframe
agent-browser frame main             # Return to main frame
```
Cookies & Storage
```bash
agent-browser cookies
agent-browser cookies set session_id "abc123"
agent-browser cookies clear
agent-browser storage local
agent-browser storage local set theme dark
agent-browser storage local clear
agent-browser storage session set cart '{"items":[]}'
```
Network
```bash
agent-browser network route "**/api/users" --body '{"users":[]}'  # Mock response
agent-browser network route "**/ads/**" --abort                   # Block requests
agent-browser network unroute                                     # Remove all routes
agent-browser network requests --filter api                       # View requests
agent-browser network har start
agent-browser network har stop recording.har
```
Browser Settings
```bash
agent-browser set viewport 1280 800
agent-browser set viewport 375 812 2  # With device pixel ratio (retina)
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Custom":"value"}'
agent-browser set credentials admin secret
agent-browser set media dark
```
Auth State
```bash
agent-browser state save ./auth.json  # Save cookies + localStorage
agent-browser state load ./auth.json  # Restore auth state
agent-browser state list              # List saved states
agent-browser state show auth.json    # Summary of saved state
```
Dialogs
```bash
agent-browser dialog accept             # Accept alert/confirm/prompt
agent-browser dialog accept "My input"  # Accept prompt with text
agent-browser dialog dismiss
```
Clipboard
```bash
agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy   # Ctrl+C current selection
agent-browser clipboard paste  # Ctrl+V
```
Diff & Visual Testing
```bash
agent-browser diff snapshot                        # vs last snapshot
agent-browser diff snapshot --baseline before.txt  # vs saved file
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o diff.png
agent-browser diff url https://v1.example.com https://v2.example.com
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot
agent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"
```
Debug & Profiling
```bash
agent-browser trace start trace.zip
agent-browser trace stop
agent-browser profiler start
agent-browser profiler stop profile.json
agent-browser console               # View console messages
agent-browser errors                # View uncaught JS exceptions
agent-browser highlight "#button"   # Visually highlight element
agent-browser inspect               # Open Chrome DevTools
agent-browser connect 9222          # Connect to existing browser via CDP port
```
Common Patterns
Login flow and save session
```bash
#!/bin/bash
agent-browser open https://app.example.com/login
agent-browser fill "#email" "$LOGIN_EMAIL"
agent-browser fill "#password" "$LOGIN_PASSWORD"
agent-browser click "[type=submit]"
agent-browser wait --url "**/dashboard"
agent-browser state save ./session.json
```
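The login flow and `state load` can be combined so that a script logs in only when no saved state exists yet. The following is a sketch under stated assumptions: `ensure_session` and the `AB` override variable are hypothetical names, and the selectors and the `**/dashboard` pattern are simply the ones from the login script above, so they will differ per site.

```bash
# AB names the CLI binary. Making it overridable is an assumption of this
# sketch (useful for stubbing in tests); it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# ensure_session STATE_FILE LOGIN_URL
# Loads STATE_FILE if it exists; otherwise logs in with $LOGIN_EMAIL and
# $LOGIN_PASSWORD and saves the resulting state to STATE_FILE.
ensure_session() {
  state_file=$1
  login_url=$2
  if [ -f "$state_file" ]; then
    "$AB" state load "$state_file"
    return
  fi
  if [ -z "${LOGIN_EMAIL:-}" ] || [ -z "${LOGIN_PASSWORD:-}" ]; then
    echo "LOGIN_EMAIL and LOGIN_PASSWORD must be set" >&2
    return 1
  fi
  # Stop at the first failing step so a broken login is never saved.
  "$AB" open "$login_url" &&
    "$AB" fill "#email" "$LOGIN_EMAIL" &&
    "$AB" fill "#password" "$LOGIN_PASSWORD" &&
    "$AB" click "[type=submit]" &&
    "$AB" wait --url "**/dashboard" &&
    "$AB" state save "$state_file"
}

# Possible usage:
#   ensure_session ./session.json https://app.example.com/login || exit 1
```

A saved state can go stale when the server expires the session. In that case, deleting the state file forces a fresh login on the next run.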
AI agent loop with snapshot-driven interaction
```bash
#!/bin/bash
agent-browser open https://app.example.com
agent-browser state load ./session.json

# Get snapshot, parse @refs, act
SNAPSHOT=$(agent-browser snapshot)
echo "$SNAPSHOT"

# Agent determines @e5 is the search box
agent-browser fill @e5 "quarterly report"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot
agent-browser screenshot results.png
```
Batch commands from a script (JSON)
```bash
cat > commands.json << 'EOF'
[
  ["open", "https://news.ycombinator.com"],
  ["wait", "--load", "networkidle"],
  ["get", "title"],
  ["snapshot"],
  ["screenshot", "hn.png"]
]
EOF
agent-browser batch --json < commands.json
```
Scrape with mocked network
```bash
agent-browser open https://api-heavy-app.example.com
agent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'
agent-browser snapshot
agent-browser network unroute
```
Full-page screenshot with annotations
```bash
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full --annotate annotated.png
```
Connect to already-running Chrome
```bash
# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222 &
agent-browser connect 9222
agent-browser open https://example.com
agent-browser snapshot
```
Emulate mobile device
```bash
agent-browser set device "iPhone 14"
agent-browser open https://example.com
agent-browser screenshot mobile.png
```
HAR recording for network analysis
```bash
agent-browser open https://example.com
agent-browser network har start
agent-browser click "#load-data"
agent-browser wait --load networkidle
agent-browser network har stop session.har
```
Selector Reference
| Format | Example | Notes |
|---|---|---|
| Ref | `@e1` | From `snapshot` |
| CSS | | Standard CSS selectors |
| Text | | Exact text match |
| XPath | | Full XPath |
Troubleshooting
Chrome not found
```bash
agent-browser install              # Downloads Chrome for Testing
agent-browser install --with-deps  # Linux: also installs system libs
```
Element not found / timing issues
```bash
agent-browser wait "#my-element"       # Wait for visibility first
agent-browser wait --load networkidle  # Wait for page to settle
agent-browser wait --fn "!!document.querySelector('#app')"
```
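When a single wait is not enough, for example a click that fails intermittently while the page re-renders, a generic retry wrapper is one option. This is plain shell rather than a feature of agent-browser. `retry` and `RETRY_DELAY` are names invented for this sketch.

```bash
# retry ATTEMPTS COMMAND [ARG...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been used.
# Sleeps RETRY_DELAY seconds (default 1) between tries.
retry() {
  attempts=$1
  shift
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Possible usage (needs a running browser session):
#   retry 3 agent-browser click "#my-element"
```

Retries hide timing bugs rather than fix them, so an explicit `wait` on the right condition is usually the better first step.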
Selector issues: use snapshot refs instead
```bash
# Instead of fragile CSS:
agent-browser click ".btn.btn-primary.submit-form"

# Use snapshot refs:
agent-browser snapshot  # Find @e7 = [button] "Submit"
agent-browser click @e7
```
Debug what's on the page
```bash
agent-browser screenshot debug.png  # Visual check
agent-browser snapshot              # Accessibility tree
agent-browser console               # JS console output
agent-browser errors                # Uncaught exceptions
agent-browser eval "document.readyState"
```
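For failures in unattended runs, the same checks can be gathered into one directory for later inspection. This is a minimal sketch: `dump_debug`, the `AB` override variable, and the output file names are all assumptions made for illustration. It also assumes `snapshot`, `console`, and `errors` write their results to standard output.

```bash
# AB names the CLI binary; the override exists only so the sketch can be stubbed.
AB="${AB:-agent-browser}"

# dump_debug DIR
# Saves a screenshot plus snapshot, console, and error dumps into DIR.
dump_debug() {
  dir=$1
  mkdir -p "$dir" || return 1
  "$AB" screenshot "$dir/page.png"
  "$AB" snapshot > "$dir/snapshot.txt"
  "$AB" console > "$dir/console.txt"
  "$AB" errors > "$dir/errors.txt"
}

# Possible usage:
#   agent-browser click "#submit" || dump_debug "./debug-$(date +%s)"
```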
Auth issues between sessions
```bash
agent-browser state save ./auth.json  # After successful login
agent-browser state load ./auth.json  # At start of next session
```
Handling alerts/dialogs
```bash
# Set up handler BEFORE the action that triggers dialog
agent-browser dialog accept
agent-browser click "#delete-button"
```
Performance: use batch for multi-step workflows
```bash
# Slow: one process per command
agent-browser open https://example.com
agent-browser fill "#q" "search"
agent-browser click "#submit"

# Fast: single process, multiple commands
echo '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]' \
  | agent-browser batch --json
```