Vercel Agent Browser Skill

Skill by ara.so — AI Agent Skills collection.

Overview

`agent-browser` is a fast native Rust CLI for browser automation designed specifically for AI agents. It provides an accessibility-first approach with semantic selectors and ref-based element targeting, making it ideal for LLM-driven web automation.

Key Features:

Native Rust performance with npm/Homebrew distribution
Accessibility tree snapshots with stable element refs (`@e1`, `@e2`, etc.)
Semantic locators (role, text, label, placeholder)
AI chat mode for natural language control
Batch command execution
Network interception and HAR recording
Chrome DevTools Protocol (CDP) streaming

Installation

Global (Recommended)

bash

npm install -g agent-browser
agent-browser install  # Downloads Chrome for Testing

Project Local

bash

npm install agent-browser
npx agent-browser install

Alternative Methods

bash

# Homebrew (macOS)
brew install agent-browser
agent-browser install

# Cargo (Rust)
cargo install agent-browser
agent-browser install

# Linux with system dependencies
agent-browser install --with-deps

Upgrading

bash

agent-browser upgrade  # Auto-detects installation method

Core Concepts

Element References (@refs)

The most powerful feature for AI agents is the accessibility snapshot with stable refs:

bash

# Get accessibility tree with element refs
agent-browser open https://example.com
agent-browser snapshot

# Output includes refs like:
# @e1 heading "Example Domain"
# @e2 link "More information..."
# @e3 textbox "Search" (placeholder)

# Use refs directly in commands
agent-browser click @e2
agent-browser fill @e3 "search query"
agent-browser get text @e1
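
When a script, rather than a model, has to pick a ref out of a snapshot, a small text filter is often enough. The sketch below assumes the one-element-per-line layout shown in the comments above (`@eN role "name"`); the actual snapshot output may be formatted differently depending on the version, and `ref_for` is a hypothetical helper, not part of agent-browser.

```bash
# ref_for ROLE NAME: read snapshot text on stdin and print the ref of the
# first line whose role equals ROLE and whose text contains "NAME" in quotes.
# Assumes lines shaped like: @e2 link "More information..."
ref_for() {
  awk -v role="$1" -v name="\"$2\"" '
    $2 == role && index($0, name) > 0 { print $1; exit }
  '
}

# Possible usage (requires agent-browser and an open page):
#   ref=$(agent-browser snapshot | ref_for link "More information...")
#   agent-browser click "$ref"
```

If no line matches, the function prints nothing, so callers should check for an empty result before acting on it.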

Traditional Selectors

Standard CSS selectors also work:

bash

agent-browser click "#submit-button"
agent-browser fill "input[name='email']" "user@example.com"
agent-browser get text ".header h1"

Semantic Locators

Find elements by their semantic meaning:

bash

# By ARIA role
agent-browser find role button click --name "Submit"
agent-browser find role textbox fill "user@example.com" --name "Email"

# By visible text
agent-browser find text "Sign In" click
agent-browser find text "Welcome" click --exact

# By label
agent-browser find label "Password" fill "secret123"

# By placeholder
agent-browser find placeholder "Search..." type "rust cli"

# By test ID
agent-browser find testid "login-form" click

Common Workflows

Basic Navigation and Interaction

bash

# Start browser and navigate
agent-browser open https://github.com/login

# Get page structure
agent-browser snapshot

# Fill login form using refs from snapshot
agent-browser fill @e5 "username"
agent-browser fill @e6 "password"
agent-browser click @e7

# Or use semantic locators
agent-browser find label "Username" fill "username"
agent-browser find label "Password" fill "password"
agent-browser find role button click --name "Sign in"

# Wait for navigation
agent-browser wait --url "**/dashboard"

# Take screenshot
agent-browser screenshot success.png

Form Automation

bash

agent-browser open https://forms.example.com

# Fill text inputs
agent-browser fill "#name" "John Doe"
agent-browser fill "#email" "john@example.com"

# Select dropdown
agent-browser select "#country" "United States"

# Check checkboxes
agent-browser check "#newsletter"
agent-browser check "#terms"

# Upload files
agent-browser upload "#resume" "/path/to/resume.pdf"

# Submit
agent-browser find role button click --name "Submit"

# Wait for success message
agent-browser wait --text "Thank you"
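
For longer forms, the repeated `fill` calls above can be driven from data. This is a minimal sketch, assuming tab-separated selector/value pairs on stdin; `fill_form` is a hypothetical helper, and the command to run is passed as arguments so that it can be swapped for `echo` as a dry run.

```bash
# fill_form CMD...: read "selector<TAB>value" lines from stdin and run
#   CMD... fill <selector> <value>
# for each one. Values may contain spaces but not tabs.
fill_form() {
  tab=$(printf '\t')
  # The "|| [ -n ... ]" part handles a final line without a trailing newline
  while IFS="$tab" read -r selector value || [ -n "$selector" ]; do
    [ -n "$selector" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the remaining input lines
    "$@" fill "$selector" "$value" </dev/null
  done
}

# Possible usage:
#   fill_form echo < fields.tsv            # dry run, prints the commands
#   fill_form agent-browser < fields.tsv   # real run
```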

Data Extraction

bash

agent-browser open https://news.ycombinator.com

# Get page snapshot for structure
agent-browser snapshot -i  # -i (interactive) limits the tree to interactive elements

# Extract specific content
agent-browser get text ".titleline > a"

# Get multiple elements with JavaScript
agent-browser eval "Array.from(document.querySelectorAll('.titleline > a')).map(a => a.textContent)"

# Get structured data as JSON
agent-browser eval "JSON.stringify({
  title: document.querySelector('title').textContent,
  links: Array.from(document.querySelectorAll('.titleline > a')).map(a => ({
    text: a.textContent,
    url: a.href
  }))
})"

Batch Execution

Reduce overhead by running multiple commands in one invocation:

bash

# Argument mode
agent-browser batch \
  "open https://example.com" \
  "snapshot -i" \
  "click @e2" \
  "wait 1000" \
  "screenshot result.png"

# With --bail to stop on first error
agent-browser batch --bail \
  "open https://login.example.com" \
  "fill #username user@example.com" \
  "fill #password ${PASSWORD}" \
  "click #submit" \
  "wait --url '**/dashboard'"

# JSON mode (piped from script)
echo '[
  ["open", "https://example.com"],
  ["snapshot"],
  ["click", "@e1"],
  ["screenshot"]
]' | agent-browser batch --json
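
In argument mode each command is a single string, so a value that contains spaces or quotes (a password, for example) can be ambiguous. JSON mode passes every argument separately. The sketch below builds the JSON payload from shell arguments; `json_cmd` and `json_batch` are hypothetical helpers, and the escaping covers only backslashes and double quotes, not newlines or other control characters.

```bash
# json_cmd ARG...: print the arguments as one JSON array of strings.
json_cmd() {
  cmd_out='['
  cmd_sep=''
  for arg in "$@"; do
    # Escape backslashes first, then double quotes
    esc=$(printf '%s' "$arg" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    cmd_out="$cmd_out$cmd_sep\"$esc\""
    cmd_sep=','
  done
  printf '%s]' "$cmd_out"
}

# json_batch ITEM...: join already-encoded command arrays into one JSON array.
json_batch() {
  batch_out='['
  batch_sep=''
  for item in "$@"; do
    batch_out="$batch_out$batch_sep$item"
    batch_sep=','
  done
  printf '%s]\n' "$batch_out"
}

# Possible usage:
#   json_batch \
#     "$(json_cmd open https://login.example.com)" \
#     "$(json_cmd fill '#password' "$PASSWORD")" \
#     | agent-browser batch --json --bail
```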

AI Chat Mode

Natural language browser control:

bash

# Single-shot command
agent-browser chat "go to github.com and search for rust browser automation"

# Interactive REPL
agent-browser chat
# > navigate to example.com
# > click the first link
# > take a screenshot
# > exit

Network Interception

bash

# Block specific resources
agent-browser network route "**/*.jpg" --abort
agent-browser network route "https://ads.example.com/*" --abort

# Block resource types
agent-browser network route '*' --abort --resource-type script
agent-browser network route '*' --abort --resource-type image,media

# Mock API responses
agent-browser network route "https://api.example.com/user" --body '{
  "status": "ok",
  "data": {"id": 1, "name": "Test User"}
}'

# View network requests
agent-browser network requests
agent-browser network requests --filter api
agent-browser network requests --type fetch,xhr
agent-browser network requests --method POST
agent-browser network requests --status 200

# View specific request detail
agent-browser network request req_123abc

# HAR recording
agent-browser network har start
# ... perform actions ...
agent-browser network har stop capture.har

Screenshot Strategies

bash

# Basic screenshot (saves to temp if no path)
agent-browser screenshot

# Save to specific path
agent-browser screenshot ./screenshots/page.png

# Full page screenshot
agent-browser screenshot --full full-page.png

# Annotated with element labels
agent-browser screenshot --annotate labeled.png

# Custom directory and format
agent-browser screenshot --screenshot-dir ./shots --screenshot-format jpeg --screenshot-quality 80

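# Element screenshot: DOM elements have no screenshot() method, so scroll the
# element into view with eval and capture the viewport instead
agent-browser eval "document.querySelector('#main').scrollIntoView()"
agent-browser screenshot element.png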

Multi-Tab Operations

bash

# List tabs
agent-browser tab

# Open new tab with label
agent-browser tab new --label docs https://docs.example.com

# Open link in new tab
agent-browser click "#external-link" --new-tab

# Switch to tab by label
agent-browser tab docs

# Switch to tab by ID
agent-browser tab t2

# Close tab
agent-browser tab close docs
agent-browser tab close t2

Wait Strategies

bash

# Wait for element to appear
agent-browser wait "#result"

# Wait for time (milliseconds)
agent-browser wait 2000

# Wait for text (substring match)
agent-browser wait --text "Success"

# Wait for URL pattern
agent-browser wait --url "**/dashboard"

# Wait for network idle
agent-browser wait --load networkidle

# Wait for JavaScript condition
agent-browser wait --fn "document.readyState === 'complete'"
agent-browser wait --fn "window.dataLoaded === true"

# Wait for element to disappear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
agent-browser wait "#spinner" --state hidden
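
On slow or flaky pages a single wait may not be enough. A generic retry wrapper is one way to handle that; this sketch assumes that agent-browser exits with a non-zero status when a wait fails or times out, and `retry` is a hypothetical helper.

```bash
# retry MAX_ATTEMPTS DELAY_SECONDS CMD...: run CMD until it succeeds,
# at most MAX_ATTEMPTS times, sleeping DELAY_SECONDS between attempts.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible usage:
#   retry 3 2 agent-browser wait --text "Success"
```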

Keyboard and Mouse Control

bash

# Keyboard shortcuts
agent-browser press "Control+a"
agent-browser press "Control+c"
agent-browser press "Enter"
agent-browser press "Tab"

# Type with real keystrokes (at current focus)
agent-browser keyboard type "Hello World"

# Insert text without key events
agent-browser keyboard inserttext "Pasted content"

# Hold and release keys
agent-browser keydown "Shift"
agent-browser press "a"
agent-browser keyup "Shift"

# Mouse control
agent-browser mouse move 100 200
agent-browser mouse down left
agent-browser mouse up left
agent-browser mouse wheel 100 0  # dy dx

# Drag and drop
agent-browser drag "#source" "#target"

Cookies and Storage

bash

# Cookies
agent-browser cookies
agent-browser cookies set "sessionId" "abc123"
agent-browser cookies clear

# Import from cURL
agent-browser cookies set --curl curl-cookies.txt

# LocalStorage
agent-browser storage local
agent-browser storage local get "token"
agent-browser storage local set "token" "eyJ..."
agent-browser storage local clear

# SessionStorage
agent-browser storage session
agent-browser storage session set "tempData" '{"key":"value"}'

Browser Configuration

bash

# Set viewport
agent-browser set viewport 1920 1080
agent-browser set viewport 1920 1080 2  # Retina (2x scale)

# Emulate device
agent-browser set device "iPhone 14"

# Geolocation
agent-browser set geo 37.7749 -122.4194  # San Francisco

# Offline mode
agent-browser set offline on
agent-browser set offline off

# Custom headers
agent-browser set headers '{"X-Custom-Header":"value"}'

# HTTP auth
agent-browser set credentials username password

# Color scheme
agent-browser set media dark
agent-browser set media light

Advanced JavaScript Evaluation

bash

# Simple evaluation
agent-browser eval "document.title"

# Complex extraction
agent-browser eval "
  JSON.stringify({
    url: window.location.href,
    links: Array.from(document.querySelectorAll('a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href
    })).filter(l => l.text)
  })
"

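# Canvas as PNG: toDataURL() returns a "data:image/png;base64,..." string,
# so keep only the base64 part and decode it (tr strips quotes if eval prints any)
agent-browser eval "document.querySelector('canvas').toDataURL().split(',')[1]" | tr -d '"' | base64 --decode > canvas.png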

# From stdin
echo "document.body.innerHTML" | agent-browser eval --stdin

Automation Script Example

Create a reusable automation script:

bash

#!/bin/bash
# login-and-extract.sh

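set -euo pipefail

# Fail early if the credentials are not provided
: "${USER_EMAIL:?USER_EMAIL must be set}"
: "${USER_PASSWORD:?USER_PASSWORD must be set}"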

# Start browser
agent-browser open https://app.example.com/login

# Login
agent-browser find label "Email" fill "${USER_EMAIL}"
agent-browser find label "Password" fill "${USER_PASSWORD}"
agent-browser find role button click --name "Sign in"

# Wait for dashboard
agent-browser wait --url "**/dashboard"
agent-browser wait 1000

# Extract data
DATA=$(agent-browser eval "
  JSON.stringify({
    user: document.querySelector('.user-name').textContent,
    notifications: Array.from(document.querySelectorAll('.notification')).map(n => n.textContent)
  })
")

echo "$DATA" | jq .

# Screenshot
agent-browser screenshot dashboard.png

# Cleanup
agent-browser close

Usage:

bash

export USER_EMAIL="user@example.com"
export USER_PASSWORD="secret"
chmod +x login-and-extract.sh
./login-and-extract.sh
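
Because the script exits on the first failing command, a failed step would skip the final `agent-browser close` and leave the browser running. One way to guarantee cleanup is a small wrapper; `with_browser` is a hypothetical helper, sketched here under the assumption that `agent-browser close` is safe to call even when nothing is open.

```bash
# with_browser CMD...: run CMD, then close the browser whether or not CMD
# succeeded, and return CMD's exit status.
with_browser() {
  status=0
  "$@" || status=$?
  # Ignore errors from close itself (for example, no browser is running)
  agent-browser close >/dev/null 2>&1 || true
  return "$status"
}

# Possible usage:
#   with_browser ./login-and-extract.sh
```

A `trap ... EXIT` inside the script achieves the same effect; the wrapper form keeps the cleanup logic outside the script being run.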

Batch Automation Pattern

For complex multi-step workflows, use batch mode with a script:

javascript

#!/usr/bin/env node
// automation.js

const commands = [
  ["open", "https://example.com"],
  ["snapshot"],
  // Process snapshot, determine next steps...
  ["click", "@e5"],
  ["wait", "1000"],
  ["screenshot", "step1.png"],
  ["find", "role", "button", "click", "--name", "Next"],
  ["wait", "--url", "**/step2"],
  ["screenshot", "step2.png"]
];

console.log(JSON.stringify(commands));

Run with:

bash

node automation.js | agent-browser batch --json --bail

CDP Streaming for Real-Time Monitoring

Connect to Chrome DevTools Protocol for event streaming:

bash

# Get CDP WebSocket URL
CDP_URL=$(agent-browser get cdp-url)
echo "CDP URL: $CDP_URL"

# Or connect directly to CDP port
agent-browser connect 9222

# Enable runtime streaming
agent-browser stream enable --port 9223
agent-browser stream status
# ... perform actions, stream events to ws://localhost:9223 ...
agent-browser stream disable

Troubleshooting

Browser Not Found

bash

# Re-download Chrome for Testing
agent-browser install

# Or specify custom Chrome path
export CHROME_PATH=/path/to/chrome
agent-browser open https://example.com
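
When the browser location differs between machines, `CHROME_PATH` can be set from a list of candidates. This is a sketch only: `first_executable` is a hypothetical helper, and the paths in the usage comment are common install locations given as assumptions, not paths that agent-browser itself checks.

```bash
# first_executable PATH...: print the first argument that is an executable
# file and return 0; return 1 if none qualifies.
first_executable() {
  for candidate in "$@"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Possible usage:
#   CHROME_PATH=$(first_executable \
#     /usr/bin/google-chrome \
#     /usr/bin/chromium \
#     "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome") \
#     && export CHROME_PATH
```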

Elements Not Found

bash

# Use snapshot to see available refs
agent-browser snapshot -i

# Wait for element before interacting
agent-browser wait "#dynamic-element"
agent-browser click "#dynamic-element"

# Use semantic selectors for robustness
agent-browser find role button click --name "Submit"

Session Management

bash

# List open tabs in the current session
agent-browser tab

# Close the current browser session
agent-browser close

# Close all sessions
agent-browser close --all

Debugging

bash

# Get CDP URL for DevTools connection
agent-browser get cdp-url

# View network activity
agent-browser network requests

# Enable HAR recording
agent-browser network har start
# ... actions ...
agent-browser network har stop debug.har

Clipboard Access Issues

On Linux, clipboard access requires `xclip` or `xsel`:

bash

sudo apt-get install xclip
# or
sudo apt-get install xsel

Environment Variables

bash

# Custom Chrome binary
export CHROME_PATH="/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"

# Custom user data directory
export CHROME_USER_DATA_DIR=~/.config/agent-browser

# Headless mode (default)
export CHROME_HEADLESS=true

# Show browser UI
export CHROME_HEADLESS=false

Best Practices for AI Agents

Use snapshots first: Always run `agent-browser snapshot` to understand the page structure before acting
Prefer refs over selectors: Use `@e1`-style refs from snapshots for stability
Use semantic locators: `find role button --name "Submit"` is more resilient than CSS selectors
Wait strategically: Add `wait` commands for dynamic content before interacting
Batch when possible: Use `batch` mode to reduce per-command overhead
Handle errors: Use `--bail` in batch mode to stop on the first error
Clean up: Always `close` sessions when done
Screenshot for verification: Take screenshots at key steps for debugging
Use labels for tabs: Assign meaningful labels when opening multiple tabs

Quick Reference

bash

# Essential commands for AI agents
agent-browser open <url>              # Start
agent-browser snapshot                # Get structure with refs
agent-browser click @eN               # Interact with ref
agent-browser find role button click --name "Text"  # Semantic
agent-browser wait --text "Success"   # Wait for content
agent-browser screenshot result.png   # Verify
agent-browser get text @eN            # Extract
agent-browser eval "JS"               # Complex extraction
agent-browser batch "cmd1" "cmd2"     # Multi-step
agent-browser close                   # Cleanup
