vercel-agent-browser

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Vercel Agent Browser Skill

Vercel Agent Browser Skill

Skill by ara.so — AI Agent Skills collection.
ara.so开发的Skill——AI Agent技能合集。

Overview

概述

agent-browser
is a fast native Rust CLI for browser automation designed specifically for AI agents. It provides an accessibility-first approach with semantic selectors and ref-based element targeting, making it ideal for LLM-driven web automation.
Key Features:
  • Native Rust performance with npm/Homebrew distribution
  • Accessibility tree snapshots with stable element refs (
    @e1
    ,
    @e2
    , etc.)
  • Semantic locators (role, text, label, placeholder)
  • AI chat mode for natural language control
  • Batch command execution
  • Network interception and HAR recording
  • Chrome DevTools Protocol (CDP) streaming
agent-browser
是一款专为AI Agent设计的快速原生Rust CLI浏览器自动化工具。它采用无障碍优先的方法,提供语义选择器和基于引用的元素定位,非常适合LLM驱动的Web自动化。
核心特性:
  • 原生Rust性能,支持npm/Homebrew分发
  • 带有稳定元素引用(
    @e1
    @e2
    等)的无障碍树快照
  • 语义定位符(角色、文本、标签、占位符)
  • 自然语言控制的AI聊天模式
  • 批量命令执行
  • 网络拦截与HAR录制
  • Chrome DevTools Protocol (CDP) 流处理

Installation

安装

Global (Recommended)

全局安装(推荐)

bash
npm install -g agent-browser
agent-browser install  # Downloads Chrome for Testing
bash
npm install -g agent-browser
agent-browser install  # 下载Chrome for Testing

Project Local

项目本地安装

bash
npm install agent-browser
npx agent-browser install
bash
npm install agent-browser
npx agent-browser install

Alternative Methods

其他安装方式

bash
undefined
bash
undefined

Homebrew (macOS)

Homebrew(macOS)

brew install agent-browser agent-browser install
brew install agent-browser agent-browser install

Cargo (Rust)

Cargo(Rust)

cargo install agent-browser agent-browser install
cargo install agent-browser agent-browser install

Linux with system dependencies

带系统依赖的Linux安装

agent-browser install --with-deps
undefined
agent-browser install --with-deps
undefined

Upgrading

升级

bash
agent-browser upgrade  # Auto-detects installation method
bash
agent-browser upgrade  # 自动检测安装方式

Core Concepts

核心概念

Element References (@refs)

元素引用(@refs)

The most powerful feature for AI agents is the accessibility snapshot with stable refs:
bash
undefined
对AI Agent而言最强大的功能是带有稳定引用的无障碍快照:
bash
undefined

Get accessibility tree with element refs

获取带元素引用的无障碍树

agent-browser open https://example.com agent-browser snapshot
agent-browser open https://example.com agent-browser snapshot

Output includes refs like:

输出包含如下引用:

@e1 heading "Example Domain"

@e1 heading "Example Domain"

@e2 link "More information..."

@e2 link "More information..."

@e3 textbox "Search" (placeholder)

@e3 textbox "Search" (placeholder)

Use refs directly in commands

直接在命令中使用引用

agent-browser click @e2 agent-browser fill @e3 "search query" agent-browser get text @e1
undefined
agent-browser click @e2 agent-browser fill @e3 "search query" agent-browser get text @e1
undefined

Traditional Selectors

传统选择器

Standard CSS selectors also work:
bash
agent-browser click "#submit-button"
agent-browser fill "input[name='email']" "user@example.com"
agent-browser get text ".header h1"
标准CSS选择器同样适用:
bash
agent-browser click "#submit-button"
agent-browser fill "input[name='email']" "user@example.com"
agent-browser get text ".header h1"

Semantic Locators

语义定位符

Find elements by their semantic meaning:
bash
undefined
通过元素的语义含义查找元素:
bash
undefined

By ARIA role

通过ARIA角色

agent-browser find role button click --name "Submit" agent-browser find role textbox fill "user@example.com" --name "Email"
agent-browser find role button click --name "Submit" agent-browser find role textbox fill "user@example.com" --name "Email"

By visible text

通过可见文本

agent-browser find text "Sign In" click agent-browser find text "Welcome" click --exact
agent-browser find text "Sign In" click agent-browser find text "Welcome" click --exact

By label

通过标签

agent-browser find label "Password" fill "secret123"
agent-browser find label "Password" fill "secret123"

By placeholder

通过占位符

agent-browser find placeholder "Search..." type "rust cli"
agent-browser find placeholder "Search..." type "rust cli"

By test ID

通过测试ID

agent-browser find testid "login-form" click
undefined
agent-browser find testid "login-form" click
undefined

Common Workflows

常见工作流

Basic Navigation and Interaction

基础导航与交互

bash
undefined
bash
undefined

Start browser and navigate

启动浏览器并导航

agent-browser open https://github.com/login
agent-browser open https://github.com/login

Get page structure

获取页面结构

agent-browser snapshot
agent-browser snapshot

Fill login form using refs from snapshot

使用快照中的引用填写登录表单

agent-browser fill @e5 "username" agent-browser fill @e6 "password" agent-browser click @e7
agent-browser fill @e5 "username" agent-browser fill @e6 "password" agent-browser click @e7

Or use semantic locators

或者使用语义定位符

agent-browser find label "Username" fill "username" agent-browser find label "Password" fill "password" agent-browser find role button click --name "Sign in"
agent-browser find label "Username" fill "username" agent-browser find label "Password" fill "password" agent-browser find role button click --name "Sign in"

Wait for navigation

等待导航完成

agent-browser wait --url "**/dashboard"
agent-browser wait --url "**/dashboard"

Take screenshot

截图

agent-browser screenshot success.png
undefined
agent-browser screenshot success.png
undefined

Form Automation

表单自动化

bash
agent-browser open https://forms.example.com
bash
agent-browser open https://forms.example.com

Fill text inputs

填写文本输入框

agent-browser fill "#name" "John Doe" agent-browser fill "#email" "john@example.com"
agent-browser fill "#name" "John Doe" agent-browser fill "#email" "john@example.com"

Select dropdown

选择下拉选项

agent-browser select "#country" "United States"
agent-browser select "#country" "United States"

Check checkboxes

勾选复选框

agent-browser check "#newsletter" agent-browser check "#terms"
agent-browser check "#newsletter" agent-browser check "#terms"

Upload files

上传文件

agent-browser upload "#resume" "/path/to/resume.pdf"
agent-browser upload "#resume" "/path/to/resume.pdf"

Submit

提交表单

agent-browser find role button click --name "Submit"
agent-browser find role button click --name "Submit"

Wait for success message

等待成功提示

agent-browser wait --text "Thank you"
undefined
agent-browser wait --text "Thank you"
undefined

Data Extraction

数据提取

bash
agent-browser open https://news.ycombinator.com
bash
agent-browser open https://news.ycombinator.com

Get page snapshot for structure

获取页面快照查看结构

agent-browser snapshot -i # Interactive mode shows full tree
agent-browser snapshot -i # 交互模式显示完整树结构

Extract specific content

提取特定内容

agent-browser get text ".titleline > a"
agent-browser get text ".titleline > a"

Get multiple elements with JavaScript

使用JavaScript获取多个元素

agent-browser eval "Array.from(document.querySelectorAll('.titleline > a')).map(a => a.textContent)"
agent-browser eval "Array.from(document.querySelectorAll('.titleline > a')).map(a => a.textContent)"

Get structured data as JSON

获取结构化JSON数据

agent-browser eval "JSON.stringify({ title: document.querySelector('title').textContent, links: Array.from(document.querySelectorAll('.titleline > a')).map(a => ({ text: a.textContent, url: a.href })) })"
undefined
agent-browser eval "JSON.stringify({ title: document.querySelector('title').textContent, links: Array.from(document.querySelectorAll('.titleline > a')).map(a => ({ text: a.textContent, url: a.href })) })"
undefined

Batch Execution

批量执行

Reduce overhead by running multiple commands in one invocation:
bash
undefined
通过一次调用运行多个命令以减少开销:
bash
undefined

Argument mode

参数模式

agent-browser batch
"open https://example.com"
"snapshot -i"
"click @e2"
"wait 1000"
"screenshot result.png"
agent-browser batch
"open https://example.com"
"snapshot -i"
"click @e2"
"wait 1000"
"screenshot result.png"

With --bail to stop on first error

使用--bail参数在首次出错时停止

agent-browser batch --bail
"open https://login.example.com"
"fill #username user@example.com"
"fill #password ${PASSWORD}"
"click #submit"
"wait --url '**/dashboard'"
agent-browser batch --bail
"open https://login.example.com"
"fill #username user@example.com"
"fill #password ${PASSWORD}"
"click #submit"
"wait --url '**/dashboard'"

JSON mode (piped from script)

JSON模式(从脚本管道输入)

echo '[ ["open", "https://example.com"], ["snapshot"], ["click", "@e1"], ["screenshot"] ]' | agent-browser batch --json
undefined
echo '[ ["open", "https://example.com"], ["snapshot"], ["click", "@e1"], ["screenshot"] ]' | agent-browser batch --json
undefined

AI Chat Mode

AI聊天模式

Natural language browser control:
bash
undefined
自然语言浏览器控制:
bash
undefined

Single-shot command

单次命令

agent-browser chat "go to github.com and search for rust browser automation"
agent-browser chat "go to github.com and search for rust browser automation"

Interactive REPL

交互式REPL

agent-browser chat
agent-browser chat

> navigate to example.com

> navigate to example.com

> click the first link

> click the first link

> take a screenshot

> take a screenshot

> exit

> exit

undefined
undefined

Network Interception

网络拦截

bash
undefined
bash
undefined

Block specific resources

拦截特定资源

agent-browser network route "**/.jpg" --abort agent-browser network route "https://ads.example.com/" --abort
agent-browser network route "**/.jpg" --abort agent-browser network route "https://ads.example.com/" --abort

Block resource types

拦截资源类型

agent-browser network route '' --abort --resource-type script agent-browser network route '' --abort --resource-type image,media
agent-browser network route '' --abort --resource-type script agent-browser network route '' --abort --resource-type image,media

Mock API responses

模拟API响应

agent-browser network route "https://api.example.com/user" --body '{ "status": "ok", "data": {"id": 1, "name": "Test User"} }'
agent-browser network route "https://api.example.com/user" --body '{ "status": "ok", "data": {"id": 1, "name": "Test User"} }'

View network requests

查看网络请求

agent-browser network requests agent-browser network requests --filter api agent-browser network requests --type fetch,xhr agent-browser network requests --method POST agent-browser network requests --status 200
agent-browser network requests agent-browser network requests --filter api agent-browser network requests --type fetch,xhr agent-browser network requests --method POST agent-browser network requests --status 200

View specific request detail

查看特定请求详情

agent-browser network request req_123abc
agent-browser network request req_123abc

HAR recording

HAR录制

agent-browser network har start
agent-browser network har start

... perform actions ...

... 执行操作 ...

agent-browser network har stop capture.har
undefined
agent-browser network har stop capture.har
undefined

Screenshot Strategies

截图策略

bash
undefined
bash
undefined

Basic screenshot (saves to temp if no path)

基础截图(无路径则保存到临时目录)

agent-browser screenshot
agent-browser screenshot

Save to specific path

保存到指定路径

agent-browser screenshot ./screenshots/page.png
agent-browser screenshot ./screenshots/page.png

Full page screenshot

全屏截图

agent-browser screenshot --full full-page.png
agent-browser screenshot --full full-page.png

Annotated with element labels

带元素标注的截图

agent-browser screenshot --annotate labeled.png
agent-browser screenshot --annotate labeled.png

Custom directory and format

自定义目录与格式

agent-browser screenshot --screenshot-dir ./shots --screenshot-format jpeg --screenshot-quality 80
agent-browser screenshot --screenshot-dir ./shots --screenshot-format jpeg --screenshot-quality 80

Element screenshot

元素截图

agent-browser eval "document.querySelector('#main').screenshot()" -b > element.png
undefined
agent-browser eval "document.querySelector('#main').screenshot()" -b > element.png
undefined

Multi-Tab Operations

多标签页操作

bash
undefined
bash
undefined

List tabs

列出标签页

agent-browser tab
agent-browser tab

Open new tab with label

打开带标签的新标签页

agent-browser tab new --label docs https://docs.example.com
agent-browser tab new --label docs https://docs.example.com

Open link in new tab

在新标签页打开链接

agent-browser click "#external-link" --new-tab
agent-browser click "#external-link" --new-tab

Switch to tab by label

通过标签切换标签页

agent-browser tab docs
agent-browser tab docs

Switch to tab by ID

通过ID切换标签页

agent-browser tab t2
agent-browser tab t2

Close tab

关闭标签页

agent-browser tab close docs agent-browser tab close t2
undefined
agent-browser tab close docs agent-browser tab close t2
undefined

Wait Strategies

等待策略

bash
undefined
bash
undefined

Wait for element to appear

等待元素出现

agent-browser wait "#result"
agent-browser wait "#result"

Wait for time (milliseconds)

等待指定时长(毫秒)

agent-browser wait 2000
agent-browser wait 2000

Wait for text (substring match)

等待文本出现(子串匹配)

agent-browser wait --text "Success"
agent-browser wait --text "Success"

Wait for URL pattern

等待URL匹配模式

agent-browser wait --url "**/dashboard"
agent-browser wait --url "**/dashboard"

Wait for network idle

等待网络空闲

agent-browser wait --load networkidle
agent-browser wait --load networkidle

Wait for JavaScript condition

等待JavaScript条件满足

agent-browser wait --fn "document.readyState === 'complete'" agent-browser wait --fn "window.dataLoaded === true"
agent-browser wait --fn "document.readyState === 'complete'" agent-browser wait --fn "window.dataLoaded === true"

Wait for element to disappear

等待元素消失

agent-browser wait --fn "!document.body.innerText.includes('Loading...')" agent-browser wait "#spinner" --state hidden
undefined
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" agent-browser wait "#spinner" --state hidden
undefined

Keyboard and Mouse Control

键盘与鼠标控制

bash
undefined
bash
undefined

Keyboard shortcuts

键盘快捷键

agent-browser press "Control+a" agent-browser press "Control+c" agent-browser press "Enter" agent-browser press "Tab"
agent-browser press "Control+a" agent-browser press "Control+c" agent-browser press "Enter" agent-browser press "Tab"

Type with real keystrokes (at current focus)

模拟真实按键输入(当前焦点处)

agent-browser keyboard type "Hello World"
agent-browser keyboard type "Hello World"

Insert text without key events

直接插入文本(无按键事件)

agent-browser keyboard inserttext "Pasted content"
agent-browser keyboard inserttext "Pasted content"

Hold and release keys

按住并释放按键

agent-browser keydown "Shift" agent-browser press "a" agent-browser keyup "Shift"
agent-browser keydown "Shift" agent-browser press "a" agent-browser keyup "Shift"

Mouse control

鼠标控制

agent-browser mouse move 100 200 agent-browser mouse down left agent-browser mouse up left agent-browser mouse wheel 100 0 # dy dx
agent-browser mouse move 100 200 agent-browser mouse down left agent-browser mouse up left agent-browser mouse wheel 100 0 # dy dx

Drag and drop

拖拽操作

agent-browser drag "#source" "#target"
undefined
agent-browser drag "#source" "#target"
undefined

Cookies and Storage

Cookie与存储

bash
undefined
bash
undefined

Cookies

Cookie管理

agent-browser cookies agent-browser cookies set "sessionId" "abc123" agent-browser cookies clear
agent-browser cookies agent-browser cookies set "sessionId" "abc123" agent-browser cookies clear

Import from cURL

从cURL导入Cookie

agent-browser cookies set --curl curl-cookies.txt
agent-browser cookies set --curl curl-cookies.txt

LocalStorage

LocalStorage管理

agent-browser storage local agent-browser storage local get "token" agent-browser storage local set "token" "eyJ..." agent-browser storage local clear
agent-browser storage local agent-browser storage local get "token" agent-browser storage local set "token" "eyJ..." agent-browser storage local clear

SessionStorage

SessionStorage管理

agent-browser storage session agent-browser storage session set "tempData" '{"key":"value"}'
undefined
agent-browser storage session agent-browser storage session set "tempData" '{"key":"value"}'
undefined

Browser Configuration

浏览器配置

bash
undefined
bash
undefined

Set viewport

设置视口

agent-browser set viewport 1920 1080 agent-browser set viewport 1920 1080 2 # Retina (2x scale)
agent-browser set viewport 1920 1080 agent-browser set viewport 1920 1080 2 # 视网膜屏(2倍缩放)

Emulate device

模拟设备

agent-browser set device "iPhone 14"
agent-browser set device "iPhone 14"

Geolocation

设置地理位置

agent-browser set geo 37.7749 -122.4194 # San Francisco
agent-browser set geo 37.7749 -122.4194 # 旧金山

Offline mode

离线模式

agent-browser set offline on agent-browser set offline off
agent-browser set offline on agent-browser set offline off

Custom headers

自定义请求头

agent-browser set headers '{"X-Custom-Header":"value"}'
agent-browser set headers '{"X-Custom-Header":"value"}'

HTTP auth

HTTP认证

agent-browser set credentials username password
agent-browser set credentials username password

Color scheme

配色方案

agent-browser set media dark agent-browser set media light
undefined
agent-browser set media dark agent-browser set media light
undefined

Advanced JavaScript Evaluation

高级JavaScript评估

bash
undefined
bash
undefined

Simple evaluation

简单评估

agent-browser eval "document.title"
agent-browser eval "document.title"

Complex extraction

复杂数据提取

agent-browser eval " JSON.stringify({ url: window.location.href, links: Array.from(document.querySelectorAll('a')).map(a => ({ text: a.textContent.trim(), href: a.href })).filter(l => l.text) }) "
agent-browser eval " JSON.stringify({ url: window.location.href, links: Array.from(document.querySelectorAll('a')).map(a => ({ text: a.textContent.trim(), href: a.href })).filter(l => l.text) }) "

Binary output (base64)

二进制输出(base64)

agent-browser eval "document.querySelector('canvas').toDataURL()" -b > canvas.png
agent-browser eval "document.querySelector('canvas').toDataURL()" -b > canvas.png

From stdin

从标准输入获取代码

echo "document.body.innerHTML" | agent-browser eval --stdin
undefined
echo "document.body.innerHTML" | agent-browser eval --stdin
undefined

Automation Script Example

自动化脚本示例

Create a reusable automation script:
bash
#!/bin/bash
创建可复用的自动化脚本:
bash
#!/bin/bash

login-and-extract.sh

login-and-extract.sh

set -e
SESSION_FILE=".browser-session"
set -e
SESSION_FILE=".browser-session"

Start browser

启动浏览器

Login

登录

agent-browser find label "Email" fill "${USER_EMAIL}" agent-browser find label "Password" fill "${USER_PASSWORD}" agent-browser find role button click --name "Sign in"
agent-browser find label "Email" fill "${USER_EMAIL}" agent-browser find label "Password" fill "${USER_PASSWORD}" agent-browser find role button click --name "Sign in"

Wait for dashboard

等待仪表盘加载

agent-browser wait --url "**/dashboard" agent-browser wait 1000
agent-browser wait --url "**/dashboard" agent-browser wait 1000

Extract data

提取数据

DATA=$(agent-browser eval " JSON.stringify({ user: document.querySelector('.user-name').textContent, notifications: Array.from(document.querySelectorAll('.notification')).map(n => n.textContent) }) ")
echo "$DATA" | jq .
DATA=$(agent-browser eval " JSON.stringify({ user: document.querySelector('.user-name').textContent, notifications: Array.from(document.querySelectorAll('.notification')).map(n => n.textContent) }) ")
echo "$DATA" | jq .

Screenshot

截图

agent-browser screenshot dashboard.png
agent-browser screenshot dashboard.png

Cleanup

清理会话

agent-browser close

Usage:

```bash
export USER_EMAIL="user@example.com"
export USER_PASSWORD="secret"
chmod +x login-and-extract.sh
./login-and-extract.sh
agent-browser close

使用方式:

```bash
export USER_EMAIL="user@example.com"
export USER_PASSWORD="secret"
chmod +x login-and-extract.sh
./login-and-extract.sh

Batch Automation Pattern

批量自动化模式

For complex multi-step workflows, use batch mode with a script:
javascript
#!/usr/bin/env node
// automation.js

const commands = [
  ["open", "https://example.com"],
  ["snapshot"],
  // Process snapshot, determine next steps...
  ["click", "@e5"],
  ["wait", "1000"],
  ["screenshot", "step1.png"],
  ["find", "role", "button", "click", "--name", "Next"],
  ["wait", "--url", "**/step2"],
  ["screenshot", "step2.png"]
];

console.log(JSON.stringify(commands));
Run with:
bash
node automation.js | agent-browser batch --json --bail
针对复杂的多步骤工作流,使用批量模式配合脚本:
javascript
#!/usr/bin/env node
// automation.js

const commands = [
  ["open", "https://example.com"],
  ["snapshot"],
  // 处理快照,确定下一步操作...
  ["click", "@e5"],
  ["wait", "1000"],
  ["screenshot", "step1.png"],
  ["find", "role", "button", "click", "--name", "Next"],
  ["wait", "--url", "**/step2"],
  ["screenshot", "step2.png"]
];

console.log(JSON.stringify(commands));
运行方式:
bash
node automation.js | agent-browser batch --json --bail

CDP Streaming for Real-Time Monitoring

CDP流处理用于实时监控

Connect to Chrome DevTools Protocol for event streaming:
bash
undefined
连接到Chrome DevTools Protocol进行事件流处理:
bash
undefined

Get CDP WebSocket URL

获取CDP WebSocket地址

CDP_URL=$(agent-browser get cdp-url) echo "CDP URL: $CDP_URL"
CDP_URL=$(agent-browser get cdp-url) echo "CDP URL: $CDP_URL"

Or connect directly to CDP port

或直接连接到CDP端口

agent-browser connect 9222
agent-browser connect 9222

Enable runtime streaming

启用运行时流处理

agent-browser stream enable --port 9223 agent-browser stream status
agent-browser stream enable --port 9223 agent-browser stream status

... perform actions, stream events to ws://localhost:9223 ...

... 执行操作,事件将流到ws://localhost:9223 ...

agent-browser stream disable
undefined
agent-browser stream disable
undefined

Troubleshooting

故障排除

Browser Not Found

浏览器未找到

bash
undefined
bash
undefined

Re-download Chrome for Testing

重新下载Chrome for Testing

agent-browser install
agent-browser install

Or specify custom Chrome path

或指定自定义Chrome路径

export CHROME_PATH=/path/to/chrome agent-browser open
undefined
export CHROME_PATH=/path/to/chrome agent-browser open
undefined

Elements Not Found

元素未找到

bash
undefined
bash
undefined

Use snapshot to see available refs

使用快照查看可用引用

agent-browser snapshot -i
agent-browser snapshot -i

Wait for element before interacting

等待元素加载后再交互

agent-browser wait "#dynamic-element" agent-browser click "#dynamic-element"
agent-browser wait "#dynamic-element" agent-browser click "#dynamic-element"

Use semantic selectors for robustness

使用语义选择器提高鲁棒性

agent-browser find role button click --name "Submit"
undefined
agent-browser find role button click --name "Submit"
undefined

Session Management

会话管理

bash
undefined
bash
undefined

List all active sessions

列出所有活跃会话

agent-browser tab
agent-browser tab

Close specific session

关闭特定会话

agent-browser close
agent-browser close

Close all sessions

关闭所有会话

agent-browser close --all
undefined
agent-browser close --all
undefined

Debugging

调试

bash
undefined
bash
undefined

Get CDP URL for DevTools connection

获取用于连接DevTools的CDP地址

agent-browser get cdp-url
agent-browser get cdp-url

View network activity

查看网络活动

agent-browser network requests
agent-browser network requests

Enable HAR recording

启用HAR录制

agent-browser network har start
agent-browser network har start

... actions ...

... 执行操作 ...

agent-browser network har stop debug.har
undefined
agent-browser network har stop debug.har
undefined

Clipboard Access Issues

剪贴板访问问题

On Linux, clipboard requires
xclip
or
xsel
:
bash
sudo apt-get install xclip
在Linux系统中,剪贴板功能需要
xclip
xsel
bash
sudo apt-get install xclip

or

sudo apt-get install xsel
undefined
sudo apt-get install xsel
undefined

Environment Variables

环境变量

bash
undefined
bash
undefined

Custom Chrome binary

自定义Chrome二进制文件路径

export CHROME_PATH=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser
export CHROME_PATH=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser

Custom user data directory

自定义用户数据目录

export CHROME_USER_DATA_DIR=~/.config/agent-browser
export CHROME_USER_DATA_DIR=~/.config/agent-browser

Headless mode (default)

无头模式(默认开启)

export CHROME_HEADLESS=true
export CHROME_HEADLESS=true

Show browser UI

显示浏览器界面

export CHROME_HEADLESS=false
undefined
export CHROME_HEADLESS=false
undefined

Best Practices for AI Agents

AI Agent最佳实践

  1. Use snapshots first: Always get
    agent-browser snapshot
    to understand page structure before acting
  2. Prefer refs over selectors: Use
    @e1
    style refs from snapshots for stability
  3. Use semantic locators:
    find role button --name "Submit"
    is more resilient than CSS selectors
  4. Wait strategically: Add
    wait
    commands for dynamic content before interacting
  5. Batch when possible: Use
    batch
    mode to reduce per-command overhead
  6. Handle errors: Use
    --bail
    in batch mode to stop on first error
  7. Clean up: Always
    close
    sessions when done
  8. Screenshot for verification: Take screenshots at key steps for debugging
  9. Use labels for tabs: Assign meaningful labels when opening multiple tabs
  1. 先使用快照:在执行操作前,始终运行
    agent-browser snapshot
    了解页面结构
  2. 优先使用引用而非选择器:使用快照中的
    @e1
    风格引用以保证稳定性
  3. 使用语义定位符
    find role button --name "Submit"
    比CSS选择器更具韧性
  4. 合理使用等待:在交互动态内容前添加
    wait
    命令
  5. 尽可能使用批量模式:使用
    batch
    模式减少单命令开销
  6. 处理错误:在批量模式中使用
    --bail
    参数在首次出错时停止
  7. 清理会话:完成操作后务必执行
    close
    关闭会话
  8. 截图验证:在关键步骤截图用于调试
  9. 为标签页添加标签:打开多个标签页时分配有意义的标签

Quick Reference

快速参考

bash
undefined
bash
undefined

Essential commands for AI agents

AI Agent核心命令

agent-browser open <url> # Start agent-browser snapshot # Get structure with refs agent-browser click @eN # Interact with ref agent-browser find role button click --name "Text" # Semantic agent-browser wait --text "Success" # Wait for content agent-browser screenshot result.png # Verify agent-browser get text @eN # Extract agent-browser eval "JS" # Complex extraction agent-browser batch "cmd1" "cmd2" # Multi-step agent-browser close # Cleanup
undefined
agent-browser open <url> # 启动浏览器 agent-browser snapshot # 获取带引用的页面结构 agent-browser click @eN # 通过引用交互元素 agent-browser find role button click --name "Text" # 语义定位交互 agent-browser wait --text "Success" # 等待内容出现 agent-browser screenshot result.png # 截图验证 agent-browser get text @eN # 提取文本 agent-browser eval "JS" # 复杂数据提取 agent-browser batch "cmd1" "cmd2" # 多步骤批量执行 agent-browser close # 清理会话
undefined