Agent Browser Skill

Fast browser automation using accessibility tree snapshots with refs for deterministic element selection.

Why Use This Over the Built-in Browser Tool

Use agent-browser when:
  • Automating multi-step workflows
  • Need deterministic element selection
  • Performance is critical
  • Working with complex SPAs
  • Need session isolation
Use built-in browser tool when:
  • Need screenshots/PDFs for analysis
  • Visual inspection required
  • Browser extension integration needed

Core Workflow

```bash
# 1. Navigate and snapshot
agent-browser open https://example.com
agent-browser snapshot -i --json

# 2. Parse refs from JSON, then interact
agent-browser click @e2
agent-browser fill @e3 "text"

# 3. Re-snapshot after page changes
agent-browser snapshot -i --json
```
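
The snapshot, interact, re-snapshot loop can be wrapped in a small shell function. This is a sketch under stated assumptions: `ab_act` is a hypothetical helper (not part of the agent-browser CLI) that only chains commands shown above. It re-snapshots after every interaction on the assumption that refs from an earlier snapshot may no longer match a changed page.

```bash
# ab_act: hypothetical helper, not part of the agent-browser CLI.
# Runs one interaction, then prints a fresh interactive JSON snapshot
# so the next ref is always picked from up-to-date output.
ab_act() {
  # Forward the interaction verbatim, e.g. ab_act click @e2
  agent-browser "$@" || return 1
  # Re-snapshot: refs from the previous snapshot may be stale now.
  agent-browser snapshot -i --json
}
```

For example, `ab_act fill @e3 "text" > snapshot.json` fills the field and leaves the new refs in `snapshot.json` for the next step. If the interaction fails, no snapshot is taken and the function returns 1.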

Key Commands

Navigation

```bash
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
```

Snapshot (Always use -i --json)

```bash
agent-browser snapshot -i --json                # Interactive elements, JSON output
agent-browser snapshot -i -c -d 5 --json        # Also compact output, depth limit 5
agent-browser snapshot -s "#main" -i --json     # Scope to the #main selector
```

Interactions (Ref-based)

```bash
agent-browser click @e2
agent-browser fill @e3 "text"      # Clear the field, then fill it
agent-browser type @e3 "text"      # Type into the element
agent-browser hover @e4
agent-browser check @e5            # Check a checkbox
agent-browser uncheck @e5          # Uncheck a checkbox
agent-browser select @e6 "value"   # Select a dropdown option
agent-browser press "Enter"        # Press a key
agent-browser scroll down 500      # Scroll down by 500 px
agent-browser drag @e7 @e8         # Drag @e7 onto @e8
```
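
Refs from one snapshot can be combined into small, reusable steps. The function below is a hypothetical helper, not part of the CLI: it fills any number of ref/value pairs with `fill` and then clicks a submit ref, using only the interaction commands listed above. It assumes all refs come from the same, still-current snapshot.

```bash
# fill_and_submit: hypothetical helper, not part of the agent-browser CLI.
# Usage: fill_and_submit <submit-ref> <ref> <value> [<ref> <value> ...]
fill_and_submit() {
  # Need a submit ref plus at least one complete ref/value pair.
  if [ $# -lt 3 ] || [ $(( ($# - 1) % 2 )) -ne 0 ]; then
    echo "usage: fill_and_submit <submit-ref> <ref> <value> [<ref> <value> ...]" >&2
    return 2
  fi
  submit_ref=$1
  shift
  # Fill each field in order; stop at the first failure.
  while [ $# -gt 0 ]; do
    agent-browser fill "$1" "$2" || return 1
    shift 2
  done
  agent-browser click "$submit_ref"
}
```

For example, `fill_and_submit @e4 @e2 "user@example.com" @e3 "secret"` fills @e2 and @e3, then clicks @e4 (all refs illustrative).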

Get Information

```bash
agent-browser get text @e1 --json
agent-browser get html @e2 --json
agent-browser get value @e3 --json
agent-browser get attr @e4 "href" --json
agent-browser get title --json
agent-browser get url --json
agent-browser get count ".item" --json
```
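
To pull text and links out of several elements at once, a loop over refs is enough. `extract_links` is a hypothetical helper, not part of the CLI; it assumes each ref points at an element that has an `href` (such as a search-result link) and simply prints the JSON from each `get` call in order.

```bash
# extract_links: hypothetical helper, not part of the agent-browser CLI.
# For each ref, prints the element's text and its href attribute
# as JSON, using the `get text` and `get attr` commands above.
extract_links() {
  for ref in "$@"; do
    agent-browser get text "$ref" --json || return 1
    agent-browser get attr "$ref" "href" --json || return 1
  done
}
```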

Check State

```bash
agent-browser is visible @e2 --json
agent-browser is enabled @e3 --json
agent-browser is checked @e4 --json
```

Wait

```bash
agent-browser wait @e2                          # Wait for the element
agent-browser wait 1000                         # Wait 1000 ms
agent-browser wait --text "Welcome"             # Wait for text to appear
agent-browser wait --url "**/dashboard"         # Wait for a matching URL
agent-browser wait --load networkidle           # Wait for the network to go idle
agent-browser wait --fn "window.ready === true" # Wait for a JS condition to be true
```
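
Opening a page, waiting for the network to settle, and then snapshotting is a common combination of the Navigation, Wait and Snapshot commands. `open_and_settle` is a hypothetical helper sketching that sequence; it adds nothing beyond the commands shown above.

```bash
# open_and_settle: hypothetical helper, not part of the agent-browser CLI.
# Opens a URL, waits for network idle, then prints an interactive
# JSON snapshot of the settled page.
open_and_settle() {
  agent-browser open "$1" || return 1
  agent-browser wait --load networkidle || return 1
  agent-browser snapshot -i --json
}
```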

Sessions (Isolated Browsers)

```bash
agent-browser --session admin open site.com
agent-browser --session user open site.com
agent-browser session list

# Or via env: AGENT_BROWSER_SESSION=admin agent-browser ...
```
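
When a script drives several sessions, repeating `--session <name>` on every line is error-prone. `ab_in` is a hypothetical helper, not part of the CLI, that prefixes one command with `--session`. The `AGENT_BROWSER_SESSION` environment variable shown above would work just as well in its place.

```bash
# ab_in: hypothetical helper, not part of the agent-browser CLI.
# Usage: ab_in <session> <agent-browser args...>
# e.g.   ab_in admin snapshot -i --json
ab_in() {
  if [ $# -lt 2 ]; then
    echo "usage: ab_in <session> <agent-browser args...>" >&2
    return 2
  fi
  session=$1
  shift
  agent-browser --session "$session" "$@"
}
```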

State Persistence

```bash
agent-browser state save auth.json        # Save cookies/storage
agent-browser state load auth.json        # Load saved state (skips the login flow)
```
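
`state save` and `state load` make a log-in-once pattern possible. The sketch below is hypothetical: `restore_or_login` is not a CLI command, and the login steps are a caller-supplied shell function because they depend on the site. It assumes a page on the target site is already open, matching the open-then-load order used in the Multi-Session example.

```bash
# restore_or_login: hypothetical helper, not part of the agent-browser CLI.
# Usage: restore_or_login <state-file> <login-function>
# Loads saved cookies/storage if the file exists; otherwise runs the
# caller's login function and saves the resulting state for next time.
restore_or_login() {
  state_file=$1
  login_fn=$2
  if [ -f "$state_file" ]; then
    agent-browser state load "$state_file"
  else
    "$login_fn" || return 1
    agent-browser state save "$state_file"
  fi
}
```

For example, `restore_or_login auth.json do_login`, where `do_login` is your own function that fills and submits the login form. The saved file holds cookies and storage, so treat it like a credential.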

Screenshots & PDFs

```bash
agent-browser screenshot page.png
agent-browser screenshot --full page.png   # Full scrollable page
agent-browser pdf page.pdf                 # Save the page as a PDF
```
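
When debugging a multi-step run, it can help to capture both artifacts for a step under one name. `capture_artifacts` is a hypothetical helper, not part of the CLI; it creates the target directory if needed and then calls the two commands above.

```bash
# capture_artifacts: hypothetical helper, not part of the agent-browser CLI.
# Saves <prefix>.png (full page) and <prefix>.pdf for the current page,
# e.g. capture_artifacts out/checkout-step-2
capture_artifacts() {
  prefix=$1
  # Create the target directory if the prefix includes one.
  mkdir -p "$(dirname "$prefix")" || return 1
  agent-browser screenshot --full "$prefix.png" || return 1
  agent-browser pdf "$prefix.pdf"
}
```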

Network Control

```bash
agent-browser network route "**/ads/*" --abort           # Block matching requests
agent-browser network route "**/api/*" --body '{"x":1}'  # Mock matching responses
agent-browser network requests --filter api              # View requests matching "api"
```
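
Route rules are most useful when they are in place before the page loads. `open_with_mocks` is a hypothetical helper that registers the two example routes above and then navigates. It assumes routes registered before `open` apply to the requests made during that page load, which is worth confirming against your agent-browser version.

```bash
# open_with_mocks: hypothetical helper, not part of the agent-browser CLI.
# Blocks ad requests and mocks API responses, then opens the URL.
# Assumption: routes added before `open` apply to that navigation.
open_with_mocks() {
  agent-browser network route "**/ads/*" --abort || return 1
  agent-browser network route "**/api/*" --body '{"x":1}' || return 1
  agent-browser open "$1"
}
```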

Cookies & Storage

```bash
agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set a cookie
agent-browser storage local key           # Get a localStorage value
agent-browser storage local set key val   # Set a localStorage value
```
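
A common use of `storage local set` is seeding a value the app reads at startup and then reloading. `seed_local_storage` is a hypothetical helper, and the key and value are supplied by the caller. Because localStorage is scoped per origin, it assumes the browser is already on the target site.

```bash
# seed_local_storage: hypothetical helper, not part of the agent-browser CLI.
# Usage: seed_local_storage <key> <value>
# Sets the value on the current page's origin, reloads so the app
# picks it up, then prints the stored value as a check.
seed_local_storage() {
  agent-browser storage local set "$1" "$2" || return 1
  agent-browser reload || return 1
  agent-browser storage local "$1"
}
```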

Tabs & Frames

```bash
agent-browser tab new https://example.com # Open a new tab
agent-browser tab 2                       # Switch to tab 2
agent-browser frame @e5                   # Switch to the iframe at @e5
agent-browser frame main                  # Back to the main frame
```
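
After `frame @e5`, later commands target the iframe until `frame main` is called, so it is easy to forget to switch back. `in_frame` is a hypothetical helper, not part of the CLI, that runs one command inside an iframe, always returns to the main frame, and passes along the command's exit status.

```bash
# in_frame: hypothetical helper, not part of the agent-browser CLI.
# Usage: in_frame <iframe-ref> <agent-browser args...>
in_frame() {
  if [ $# -lt 2 ]; then
    echo "usage: in_frame <iframe-ref> <agent-browser args...>" >&2
    return 2
  fi
  frame_ref=$1
  shift
  agent-browser frame "$frame_ref" || return 1
  rc=0
  agent-browser "$@" || rc=$?
  # Switch back even if the command failed, so later commands
  # do not run inside the iframe by accident.
  agent-browser frame main || return 1
  return "$rc"
}
```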

Snapshot Output Format

```json
{
  "success": true,
  "data": {
    "snapshot": "...",
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
```

Best Practices

  1. Always use the `-i` flag: focus on interactive elements.
  2. Always use `--json`: the output is easier to parse.
  3. Wait for stability: `agent-browser wait --load networkidle`.
  4. Save auth state: skip login flows with `state save`/`state load`.
  5. Use sessions: isolate different browser contexts.
  6. Use `--headed` for debugging: see what's happening in a visible browser.

Example: Search and Extract

```bash
agent-browser open https://www.google.com
agent-browser snapshot -i --json

# AI identifies the search box as @e1
agent-browser fill @e1 "AI agents"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i --json

# AI identifies the result refs
agent-browser get text @e3 --json
agent-browser get attr @e4 "href" --json
```

Example: Multi-Session Testing

```bash
# Admin session
agent-browser --session admin open app.com
agent-browser --session admin state load admin-auth.json
agent-browser --session admin snapshot -i --json

# User session (simultaneous)
agent-browser --session user open app.com
agent-browser --session user state load user-auth.json
agent-browser --session user snapshot -i --json
```
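
The two sessions in this example are meant to run simultaneously. One way to do that from a shell is to drive each session in a background job and wait for both. This is a sketch: `run_session` and `run_both_sessions` are hypothetical helpers, and it assumes separate named sessions can be driven from separate processes at the same time. Each session's snapshot is written to `<session>-snapshot.json` in the current directory.

```bash
# run_session: hypothetical helper, not part of the agent-browser CLI.
# Runs the example's three steps for one session, in order.
run_session() {
  session=$1
  state_file=$2
  agent-browser --session "$session" open app.com &&
    agent-browser --session "$session" state load "$state_file" &&
    agent-browser --session "$session" snapshot -i --json > "$session-snapshot.json"
}

# run_both_sessions: starts both sessions as background jobs.
run_both_sessions() {
  run_session admin admin-auth.json &
  admin_pid=$!
  run_session user user-auth.json &
  user_pid=$!
  rc=0
  # Wait for both jobs; report failure if either one failed.
  wait "$admin_pid" || rc=1
  wait "$user_pid" || rc=1
  return "$rc"
}
```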

Installation

```bash
npm install -g agent-browser
agent-browser install                     # Download Chromium
agent-browser install --with-deps         # Linux: also install system dependencies
```
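
In scripts or CI, installation can be made idempotent. `ensure_agent_browser` is a hypothetical helper: it runs `npm install -g agent-browser` only when the command is not already on `PATH`, assuming npm's global bin directory is on `PATH` afterwards. It then runs `agent-browser install` with any extra arguments, so `ensure_agent_browser --with-deps` covers the Linux case.

```bash
# ensure_agent_browser: hypothetical helper, not part of the agent-browser CLI.
ensure_agent_browser() {
  # Install the CLI globally only when it is not already on PATH.
  if ! command -v agent-browser >/dev/null 2>&1; then
    npm install -g agent-browser || return 1
  fi
  # Download Chromium; extra args (e.g. --with-deps) are passed through.
  agent-browser install "$@"
}
```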

Credits

Skill created by Yossi Elkrief (@MaTriXy)
agent-browser CLI by Vercel Labs