Vercel Agent Browser Skill

Skill by ara.so — AI Agent Skills collection.

由ara.so开发的Skill——AI Agent技能合集。

Overview

概述

agent-browser

is a fast native Rust CLI for browser automation designed specifically for AI agents. It provides an accessibility-first approach with semantic selectors and ref-based element targeting, making it ideal for LLM-driven web automation.

Key Features:

Native Rust performance with npm/Homebrew distribution
Accessibility tree snapshots with stable element refs (
```
@e1
```
,
```
@e2
```
, etc.)
Semantic locators (role, text, label, placeholder)
AI chat mode for natural language control
Batch command execution
Network interception and HAR recording
Chrome DevTools Protocol (CDP) streaming

agent-browser

是一款专为AI Agent设计的快速原生Rust CLI浏览器自动化工具。它采用无障碍优先的方法，提供语义选择器和基于引用的元素定位，非常适合LLM驱动的Web自动化。

核心特性：

原生Rust性能，支持npm/Homebrew分发
带有稳定元素引用（
```
@e1
```
、
```
@e2
```
等）的无障碍树快照
语义定位符（角色、文本、标签、占位符）
自然语言控制的AI聊天模式
批量命令执行
网络拦截与HAR录制
Chrome DevTools Protocol (CDP) 流处理

Installation

安装

Global (Recommended)

全局安装（推荐）

bash

npm install -g agent-browser
agent-browser install  # Downloads Chrome for Testing

bash

npm install -g agent-browser
agent-browser install  # 下载Chrome for Testing

Project Local

项目本地安装

bash

npm install agent-browser
npx agent-browser install

bash

npm install agent-browser
npx agent-browser install

Alternative Methods

其他安装方式

bash

undefined

bash

undefined

Homebrew (macOS)

Homebrew（macOS）

brew install agent-browser agent-browser install

Cargo (Rust)

Cargo（Rust）

cargo install agent-browser agent-browser install

Linux with system dependencies

带系统依赖的Linux安装

agent-browser install --with-deps

undefined

agent-browser install --with-deps

undefined

Upgrading

升级

bash

agent-browser upgrade  # Auto-detects installation method

bash

agent-browser upgrade  # 自动检测安装方式

Core Concepts

核心概念

Element References (@refs)

元素引用（@refs）

The most powerful feature for AI agents is the accessibility snapshot with stable refs:

bash

undefined

对AI Agent而言最强大的功能是带有稳定引用的无障碍快照：

bash

undefined

Get accessibility tree with element refs

获取带元素引用的无障碍树

agent-browser open https://example.com agent-browser snapshot

Output includes refs like:

输出包含如下引用：

@e1 heading "Example Domain"

@e2 link "More information..."

@e3 textbox "Search" (placeholder)

Use refs directly in commands

直接在命令中使用引用

agent-browser click @e2 agent-browser fill @e3 "search query" agent-browser get text @e1

undefined

agent-browser click @e2 agent-browser fill @e3 "search query" agent-browser get text @e1

undefined

Traditional Selectors

传统选择器

Standard CSS selectors also work:

bash

agent-browser click "#submit-button"
agent-browser fill "input[name='email']" "user@example.com"
agent-browser get text ".header h1"

标准CSS选择器同样适用：

bash

agent-browser click "#submit-button"
agent-browser fill "input[name='email']" "user@example.com"
agent-browser get text ".header h1"

Semantic Locators

语义定位符

Find elements by their semantic meaning:

bash

undefined

通过元素的语义含义查找元素：

bash

undefined

By ARIA role

通过ARIA角色

agent-browser find role button click --name "Submit" agent-browser find role textbox fill "user@example.com" --name "Email"

By visible text

通过可见文本

agent-browser find text "Sign In" click agent-browser find text "Welcome" click --exact

By label

通过标签

agent-browser find label "Password" fill "secret123"

By placeholder

通过占位符

agent-browser find placeholder "Search..." type "rust cli"

By test ID

通过测试ID

agent-browser find testid "login-form" click

undefined

agent-browser find testid "login-form" click

undefined

Common Workflows

常见工作流

Basic Navigation and Interaction

基础导航与交互

bash

undefined

bash

undefined

Start browser and navigate

启动浏览器并导航

agent-browser open https://github.com/login

Get page structure

获取页面结构

agent-browser snapshot

Fill login form using refs from snapshot

使用快照中的引用填写登录表单

agent-browser fill @e5 "username" agent-browser fill @e6 "password" agent-browser click @e7

Or use semantic locators

或者使用语义定位符

agent-browser find label "Username" fill "username" agent-browser find label "Password" fill "password" agent-browser find role button click --name "Sign in"

Wait for navigation

等待导航完成

agent-browser wait --url "**/dashboard"

Take screenshot

截图

agent-browser screenshot success.png

undefined

agent-browser screenshot success.png

undefined

Form Automation

表单自动化

bash

agent-browser open https://forms.example.com

bash

agent-browser open https://forms.example.com

Fill text inputs

填写文本输入框

agent-browser fill "#name" "John Doe" agent-browser fill "#email" "john@example.com"

Select dropdown

选择下拉选项

agent-browser select "#country" "United States"

Check checkboxes

勾选复选框

agent-browser check "#newsletter" agent-browser check "#terms"

Upload files

上传文件

agent-browser upload "#resume" "/path/to/resume.pdf"

Submit

提交表单

agent-browser find role button click --name "Submit"

Wait for success message

等待成功提示

agent-browser wait --text "Thank you"

undefined

agent-browser wait --text "Thank you"

undefined

Data Extraction

数据提取

bash

agent-browser open https://news.ycombinator.com

bash

agent-browser open https://news.ycombinator.com

Get page snapshot for structure

获取页面快照查看结构

agent-browser snapshot -i # Interactive mode shows full tree

agent-browser snapshot -i # 交互模式显示完整树结构

Extract specific content

提取特定内容

agent-browser get text ".titleline > a"

Get multiple elements with JavaScript

使用JavaScript获取多个元素

agent-browser eval "Array.from(document.querySelectorAll('.titleline > a')).map(a => a.textContent)"

Get structured data as JSON

获取结构化JSON数据

agent-browser eval "JSON.stringify({ title: document.querySelector('title').textContent, links: Array.from(document.querySelectorAll('.titleline > a')).map(a => ({ text: a.textContent, url: a.href })) })"

undefined

agent-browser eval "JSON.stringify({ title: document.querySelector('title').textContent, links: Array.from(document.querySelectorAll('.titleline > a')).map(a => ({ text: a.textContent, url: a.href })) })"

undefined

Batch Execution

批量执行

Reduce overhead by running multiple commands in one invocation:

bash

undefined

通过一次调用运行多个命令以减少开销：

bash

undefined

Argument mode

参数模式

agent-browser batch
"open https://example.com"
"snapshot -i"
"click @e2"
"wait 1000"
"screenshot result.png"

With --bail to stop on first error

使用--bail参数在首次出错时停止

agent-browser batch --bail
"open https://login.example.com"
"fill #username user@example.com"
"fill #password ${PASSWORD}"
"click #submit"
"wait --url '**/dashboard'"

JSON mode (piped from script)

JSON模式（从脚本管道输入）

echo '[ ["open", "https://example.com"], ["snapshot"], ["click", "@e1"], ["screenshot"] ]' | agent-browser batch --json

undefined

echo '[ ["open", "https://example.com"], ["snapshot"], ["click", "@e1"], ["screenshot"] ]' | agent-browser batch --json

undefined

AI Chat Mode

AI聊天模式

Natural language browser control:

bash

undefined

自然语言浏览器控制：

bash

undefined

Single-shot command

单次命令

agent-browser chat "go to github.com and search for rust browser automation"

Interactive REPL

交互式REPL

agent-browser chat

> navigate to example.com

> click the first link

> take a screenshot

> exit

undefined

undefined

Network Interception

网络拦截

bash

undefined

bash

undefined

Block specific resources

拦截特定资源

agent-browser network route "**/.jpg" --abort agent-browser network route "https://ads.example.com/" --abort

Block resource types

拦截资源类型

agent-browser network route '' --abort --resource-type script agent-browser network route '' --abort --resource-type image,media

Mock API responses

模拟API响应

agent-browser network route "https://api.example.com/user" --body '{ "status": "ok", "data": {"id": 1, "name": "Test User"} }'

View network requests

查看网络请求

agent-browser network requests agent-browser network requests --filter api agent-browser network requests --type fetch,xhr agent-browser network requests --method POST agent-browser network requests --status 200

View specific request detail

查看特定请求详情

agent-browser network request req_123abc

HAR recording

HAR录制

agent-browser network har start

... perform actions ...

... 执行操作 ...

agent-browser network har stop capture.har

undefined

agent-browser network har stop capture.har

undefined

Screenshot Strategies

截图策略

bash

undefined

bash

undefined

Basic screenshot (saves to temp if no path)

基础截图（无路径则保存到临时目录）

agent-browser screenshot

Save to specific path

保存到指定路径

agent-browser screenshot ./screenshots/page.png

Full page screenshot

全屏截图

agent-browser screenshot --full full-page.png

Annotated with element labels

带元素标注的截图

agent-browser screenshot --annotate labeled.png

Custom directory and format

自定义目录与格式

agent-browser screenshot --screenshot-dir ./shots --screenshot-format jpeg --screenshot-quality 80

Element screenshot

元素截图

agent-browser eval "document.querySelector('#main').screenshot()" -b > element.png

undefined

agent-browser eval "document.querySelector('#main').screenshot()" -b > element.png

undefined

Multi-Tab Operations

多标签页操作

bash

undefined

bash

undefined

List tabs

列出标签页

agent-browser tab

Open new tab with label

打开带标签的新标签页

agent-browser tab new --label docs https://docs.example.com

Open link in new tab

在新标签页打开链接

agent-browser click "#external-link" --new-tab

Switch to tab by label

通过标签切换标签页

agent-browser tab docs

Switch to tab by ID

通过ID切换标签页

agent-browser tab t2

Close tab

关闭标签页

agent-browser tab close docs agent-browser tab close t2

undefined

agent-browser tab close docs agent-browser tab close t2

undefined

Wait Strategies

等待策略

bash

undefined

bash

undefined

Wait for element to appear

等待元素出现

agent-browser wait "#result"

Wait for time (milliseconds)

等待指定时长（毫秒）

agent-browser wait 2000

Wait for text (substring match)

等待文本出现（子串匹配）

agent-browser wait --text "Success"

Wait for URL pattern

等待URL匹配模式

agent-browser wait --url "**/dashboard"

Wait for network idle

等待网络空闲

agent-browser wait --load networkidle

Wait for JavaScript condition

等待JavaScript条件满足

agent-browser wait --fn "document.readyState === 'complete'" agent-browser wait --fn "window.dataLoaded === true"

Wait for element to disappear

等待元素消失

agent-browser wait --fn "!document.body.innerText.includes('Loading...')" agent-browser wait "#spinner" --state hidden

undefined

agent-browser wait --fn "!document.body.innerText.includes('Loading...')" agent-browser wait "#spinner" --state hidden

undefined

Keyboard and Mouse Control

键盘与鼠标控制

bash

undefined

bash

undefined

Keyboard shortcuts

键盘快捷键

agent-browser press "Control+a" agent-browser press "Control+c" agent-browser press "Enter" agent-browser press "Tab"

Type with real keystrokes (at current focus)

模拟真实按键输入（当前焦点处）

agent-browser keyboard type "Hello World"

Insert text without key events

直接插入文本（无按键事件）

agent-browser keyboard inserttext "Pasted content"

Hold and release keys

按住并释放按键

agent-browser keydown "Shift" agent-browser press "a" agent-browser keyup "Shift"

Mouse control

鼠标控制

agent-browser mouse move 100 200 agent-browser mouse down left agent-browser mouse up left agent-browser mouse wheel 100 0 # dy dx

Drag and drop

拖拽操作

agent-browser drag "#source" "#target"

undefined

agent-browser drag "#source" "#target"

undefined

Cookies and Storage

Cookie与存储

bash

undefined

bash

undefined

Cookies

Cookie管理

agent-browser cookies agent-browser cookies set "sessionId" "abc123" agent-browser cookies clear

Import from cURL

从cURL导入Cookie

agent-browser cookies set --curl curl-cookies.txt

LocalStorage

LocalStorage管理

agent-browser storage local agent-browser storage local get "token" agent-browser storage local set "token" "eyJ..." agent-browser storage local clear

SessionStorage

SessionStorage管理

agent-browser storage session agent-browser storage session set "tempData" '{"key":"value"}'

undefined

agent-browser storage session agent-browser storage session set "tempData" '{"key":"value"}'

undefined

Browser Configuration

浏览器配置

bash

undefined

bash

undefined

Set viewport

设置视口

agent-browser set viewport 1920 1080 agent-browser set viewport 1920 1080 2 # Retina (2x scale)

agent-browser set viewport 1920 1080 agent-browser set viewport 1920 1080 2 # 视网膜屏（2倍缩放）

Emulate device

模拟设备

agent-browser set device "iPhone 14"

Geolocation

设置地理位置

agent-browser set geo 37.7749 -122.4194 # San Francisco

agent-browser set geo 37.7749 -122.4194 # 旧金山

Offline mode

离线模式

agent-browser set offline on agent-browser set offline off

Custom headers

自定义请求头

agent-browser set headers '{"X-Custom-Header":"value"}'

HTTP auth

HTTP认证

agent-browser set credentials username password

Color scheme

配色方案

agent-browser set media dark agent-browser set media light

undefined

agent-browser set media dark agent-browser set media light

undefined

Advanced JavaScript Evaluation

高级JavaScript评估

bash

undefined

bash

undefined

Simple evaluation

简单评估

agent-browser eval "document.title"

Complex extraction

复杂数据提取

agent-browser eval " JSON.stringify({ url: window.location.href, links: Array.from(document.querySelectorAll('a')).map(a => ({ text: a.textContent.trim(), href: a.href })).filter(l => l.text) }) "

Binary output (base64)

二进制输出（base64）

agent-browser eval "document.querySelector('canvas').toDataURL()" -b > canvas.png

From stdin

从标准输入获取代码

echo "document.body.innerHTML" | agent-browser eval --stdin

undefined

echo "document.body.innerHTML" | agent-browser eval --stdin

undefined

Automation Script Example

自动化脚本示例

Create a reusable automation script:

bash

#!/bin/bash

创建可复用的自动化脚本：

bash

#!/bin/bash

login-and-extract.sh

set -e

SESSION_FILE=".browser-session"

set -e

SESSION_FILE=".browser-session"

Start browser

启动浏览器

agent-browser open https://app.example.com/login

Login

Wait for dashboard

等待仪表盘加载

agent-browser wait --url "**/dashboard" agent-browser wait 1000

Extract data

提取数据

DATA=$(agent-browser eval " JSON.stringify({ user: document.querySelector('.user-name').textContent, notifications: Array.from(document.querySelectorAll('.notification')).map(n => n.textContent) }) ")

echo "$DATA" | jq .

DATA=$(agent-browser eval " JSON.stringify({ user: document.querySelector('.user-name').textContent, notifications: Array.from(document.querySelectorAll('.notification')).map(n => n.textContent) }) ")

echo "$DATA" | jq .

Screenshot

截图

agent-browser screenshot dashboard.png

Cleanup

清理会话

agent-browser close


Usage:

```bash
export USER_EMAIL="user@example.com"
export USER_PASSWORD="secret"
chmod +x login-and-extract.sh
./login-and-extract.sh

agent-browser close


使用方式：

```bash
export USER_EMAIL="user@example.com"
export USER_PASSWORD="secret"
chmod +x login-and-extract.sh
./login-and-extract.sh

Batch Automation Pattern

批量自动化模式

For complex multi-step workflows, use batch mode with a script:

javascript

#!/usr/bin/env node
// automation.js

const commands = [
  ["open", "https://example.com"],
  ["snapshot"],
  // Process snapshot, determine next steps...
  ["click", "@e5"],
  ["wait", "1000"],
  ["screenshot", "step1.png"],
  ["find", "role", "button", "click", "--name", "Next"],
  ["wait", "--url", "**/step2"],
  ["screenshot", "step2.png"]
];

console.log(JSON.stringify(commands));

Run with:

bash

node automation.js | agent-browser batch --json --bail

针对复杂的多步骤工作流，使用批量模式配合脚本：

javascript

#!/usr/bin/env node
// automation.js

const commands = [
  ["open", "https://example.com"],
  ["snapshot"],
  // 处理快照，确定下一步操作...
  ["click", "@e5"],
  ["wait", "1000"],
  ["screenshot", "step1.png"],
  ["find", "role", "button", "click", "--name", "Next"],
  ["wait", "--url", "**/step2"],
  ["screenshot", "step2.png"]
];

console.log(JSON.stringify(commands));

运行方式：

bash

node automation.js | agent-browser batch --json --bail

CDP Streaming for Real-Time Monitoring

CDP流处理用于实时监控

Connect to Chrome DevTools Protocol for event streaming:

bash

undefined

连接到Chrome DevTools Protocol进行事件流处理：

bash

undefined

Get CDP WebSocket URL

获取CDP WebSocket地址

CDP_URL=$(agent-browser get cdp-url) echo "CDP URL: $CDP_URL"

Or connect directly to CDP port

或直接连接到CDP端口

agent-browser connect 9222

Enable runtime streaming

启用运行时流处理

agent-browser stream enable --port 9223 agent-browser stream status

... perform actions, stream events to ws://localhost:9223 ...

... 执行操作，事件将流到ws://localhost:9223 ...

agent-browser stream disable

undefined

agent-browser stream disable

undefined

Troubleshooting

故障排除

Browser Not Found

浏览器未找到

bash

undefined

bash

undefined

Re-download Chrome for Testing

重新下载Chrome for Testing

agent-browser install

Or specify custom Chrome path

或指定自定义Chrome路径

export CHROME_PATH=/path/to/chrome agent-browser open

undefined

export CHROME_PATH=/path/to/chrome agent-browser open

undefined

Elements Not Found

元素未找到

bash

undefined

bash

undefined

Use snapshot to see available refs

使用快照查看可用引用

agent-browser snapshot -i

Wait for element before interacting

等待元素加载后再交互

agent-browser wait "#dynamic-element" agent-browser click "#dynamic-element"

Use semantic selectors for robustness

使用语义选择器提高鲁棒性

agent-browser find role button click --name "Submit"

undefined

agent-browser find role button click --name "Submit"

undefined

Session Management

会话管理

bash

undefined

bash

undefined

List all active sessions

列出所有活跃会话

agent-browser tab

Close specific session

关闭特定会话

agent-browser close

Close all sessions

关闭所有会话

agent-browser close --all

undefined

agent-browser close --all

undefined

Debugging

调试

bash

undefined

bash

undefined

Get CDP URL for DevTools connection

获取用于连接DevTools的CDP地址

agent-browser get cdp-url

View network activity

查看网络活动

agent-browser network requests

Enable HAR recording

启用HAR录制

agent-browser network har start

... actions ...

... 执行操作 ...

agent-browser network har stop debug.har

undefined

agent-browser network har stop debug.har

undefined

Clipboard Access Issues

剪贴板访问问题

On Linux, clipboard requires

xclip

or

xsel

:

bash

sudo apt-get install xclip

在Linux系统中，剪贴板功能需要

xclip

或

xsel

：

bash

sudo apt-get install xclip

or

或

sudo apt-get install xsel

undefined

sudo apt-get install xsel

undefined

Environment Variables

环境变量

bash

undefined

bash

undefined

Custom Chrome binary

自定义Chrome二进制文件路径

export CHROME_PATH=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser

Custom user data directory

自定义用户数据目录

export CHROME_USER_DATA_DIR=~/.config/agent-browser

Headless mode (default)

无头模式（默认开启）

export CHROME_HEADLESS=true

Show browser UI

显示浏览器界面

export CHROME_HEADLESS=false

undefined

export CHROME_HEADLESS=false

undefined

Best Practices for AI Agents

AI Agent最佳实践

Use snapshots first: Always get
```
agent-browser snapshot
```
to understand page structure before acting
Prefer refs over selectors: Use
```
@e1
```
style refs from snapshots for stability
Use semantic locators:
```
find role button --name "Submit"
```
is more resilient than CSS selectors
Wait strategically: Add
```
wait
```
commands for dynamic content before interacting
Batch when possible: Use
```
batch
```
mode to reduce per-command overhead
Handle errors: Use
```
--bail
```
in batch mode to stop on first error
Clean up: Always
```
close
```
sessions when done
Screenshot for verification: Take screenshots at key steps for debugging
Use labels for tabs: Assign meaningful labels when opening multiple tabs

先使用快照：在执行操作前，始终运行
```
agent-browser snapshot
```
了解页面结构
优先使用引用而非选择器：使用快照中的
```
@e1
```
风格引用以保证稳定性
使用语义定位符：
```
find role button --name "Submit"
```
比CSS选择器更具韧性
合理使用等待：在交互动态内容前添加
```
wait
```
命令
尽可能使用批量模式：使用
```
batch
```
模式减少单命令开销
处理错误：在批量模式中使用
```
--bail
```
参数在首次出错时停止
清理会话：完成操作后务必执行
```
close
```
关闭会话
截图验证：在关键步骤截图用于调试
为标签页添加标签：打开多个标签页时分配有意义的标签

Quick Reference

快速参考

bash

undefined

bash

undefined

Essential commands for AI agents

AI Agent核心命令

agent-browser open <url> # Start agent-browser snapshot # Get structure with refs agent-browser click @eN # Interact with ref agent-browser find role button click --name "Text" # Semantic agent-browser wait --text "Success" # Wait for content agent-browser screenshot result.png # Verify agent-browser get text @eN # Extract agent-browser eval "JS" # Complex extraction agent-browser batch "cmd1" "cmd2" # Multi-step agent-browser close # Cleanup

undefined

agent-browser open <url> # 启动浏览器 agent-browser snapshot # 获取带引用的页面结构 agent-browser click @eN # 通过引用交互元素 agent-browser find role button click --name "Text" # 语义定位交互 agent-browser wait --text "Success" # 等待内容出现 agent-browser screenshot result.png # 截图验证 agent-browser get text @eN # 提取文本 agent-browser eval "JS" # 复杂数据提取 agent-browser batch "cmd1" "cmd2" # 多步骤批量执行 agent-browser close # 清理会话

undefined

vercel-agent-browser

Original

Translation

Vercel Agent Browser Skill

Vercel Agent Browser Skill

Overview

概述

Installation

安装

Global (Recommended)

全局安装（推荐）

Project Local

项目本地安装

Alternative Methods

其他安装方式

Homebrew (macOS)

Homebrew（macOS）

Cargo (Rust)

Cargo（Rust）

Linux with system dependencies

带系统依赖的Linux安装

Upgrading

升级

Core Concepts

核心概念

Element References (@refs)

元素引用（@refs）

Get accessibility tree with element refs

获取带元素引用的无障碍树

Output includes refs like:

输出包含如下引用：

@e1 heading "Example Domain"

@e1 heading "Example Domain"

@e2 link "More information..."

@e2 link "More information..."

@e3 textbox "Search" (placeholder)

@e3 textbox "Search" (placeholder)

Use refs directly in commands

直接在命令中使用引用

Traditional Selectors

传统选择器

Semantic Locators

语义定位符

By ARIA role

通过ARIA角色

By visible text

通过可见文本

By label

通过标签

By placeholder

通过占位符

By test ID

通过测试ID

Common Workflows

常见工作流

Basic Navigation and Interaction

基础导航与交互

Start browser and navigate

启动浏览器并导航

Get page structure

获取页面结构

Fill login form using refs from snapshot

使用快照中的引用填写登录表单

Or use semantic locators

或者使用语义定位符

Wait for navigation

等待导航完成

Take screenshot

截图

Form Automation

表单自动化

Fill text inputs

填写文本输入框

Select dropdown

选择下拉选项

Check checkboxes

勾选复选框

Upload files

上传文件

Submit