agent-browser

Browser Automation with agent-browser

Quick start

```bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
```

Core workflow

  1. Navigate: `agent-browser open <url>`
  2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
  3. Interact using refs from the snapshot
  4. Re-snapshot after navigation or significant DOM changes, because refs from an earlier snapshot may no longer point at the same elements
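
Step 3 usually means pulling a ref out of the snapshot text. Below is a minimal POSIX shell sketch of that lookup. It assumes `snapshot -i` prints one element per line in the `role "name" [ref=eN]` shape seen in the form example; that format is an assumption, so check it against your version. `ref_for` is a hypothetical helper, not part of agent-browser.

```bash
# ref_for ROLE NAME: read snapshot text on stdin and print the @ref of the
# first element whose role and accessible name match, e.g. @e3.
# Assumes one element per line, formatted like: - button "Submit" [ref=e3]
# Prints nothing if no element matches.
ref_for() {
  grep -F "$1 \"$2\"" |
    sed -n 's/.*\[ref=\(e[0-9]*\)].*/@\1/p' |
    head -n 1
}

# Usage (hypothetical page):
#   ref=$(agent-browser snapshot -i | ref_for button "Submit")
#   [ -n "$ref" ] && agent-browser click "$ref"
```

Looking refs up by role and name, instead of hard-coding `@e3`, keeps a script working if a later snapshot numbers the elements differently.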

Commands

Navigation

```bash
agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser
```

Snapshot (page analysis)

```bash
agent-browser snapshot        # Full accessibility tree
agent-browser snapshot -i     # Interactive elements only (recommended)
agent-browser snapshot -c     # Compact output
agent-browser snapshot -d 3   # Limit depth to 3
```

Interactions (use @refs from snapshot)

```bash
agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key
agent-browser press Control+a     # Key combination
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown option
agent-browser scroll down 500     # Scroll down 500 px
agent-browser scrollintoview @e1  # Scroll element into view
```
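
Because `fill` clears the field before typing, filling a form is a sequence of `fill` calls, one per ref. The sketch below takes ref/value pairs. `fill_fields` and the `AB` variable are hypothetical and belong only to this sketch. `AB` names the CLI, so a stub such as `echo` can stand in for a dry run.

```bash
# AB names the CLI to call and defaults to agent-browser.
# (Hypothetical override used only by this sketch, e.g. AB=echo for a dry run.)
AB=${AB:-agent-browser}

# fill_fields REF VALUE [REF VALUE ...]
# Runs "fill REF VALUE" for each pair and stops at the first failure.
fill_fields() {
  while [ "$#" -ge 2 ]; do
    "$AB" fill "$1" "$2" || return 1
    shift 2
  done
  # A leftover argument means a ref was given without a value.
  [ "$#" -eq 0 ]
}

# Usage (refs taken from a fresh `snapshot -i`):
#   fill_fields @e1 "user@example.com" @e2 "password123" && "$AB" press Enter
```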

Get information

```bash
agent-browser get text @e1        # Get element text
agent-browser get value @e1       # Get input value
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
```
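
The `get` commands are handy as assertions between steps. The sketch below assumes `get title` and `get url` print the bare value on stdout. `expect_get` and `AB` are hypothetical names.

```bash
AB=${AB:-agent-browser}   # hypothetical override; defaults to the real CLI

# expect_get FIELD SUBSTRING: run "get FIELD" (e.g. title or url) and
# succeed only if the printed value contains SUBSTRING.
expect_get() {
  value=$("$AB" get "$1") || return 1
  case $value in
    *"$2"*) return 0 ;;
  esac
  echo "expected $1 to contain '$2', got '$value'" >&2
  return 1
}

# Usage:
#   expect_get url "/dashboard" && expect_get title "Dashboard"
```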

Screenshots

```bash
agent-browser screenshot          # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full   # Full page
```
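
Saving screenshots under predictable, timestamped names makes runs easier to compare. The sketch below is built on `screenshot path.png`. `shot`, `SHOT_DIR`, and `AB` are hypothetical, and the naming scheme is just one choice.

```bash
AB=${AB:-agent-browser}   # hypothetical override; defaults to the real CLI

# shot NAME: save a screenshot under $SHOT_DIR (default: shots) with a
# timestamped file name and print the saved path.
shot() {
  dir=${SHOT_DIR:-shots}
  mkdir -p "$dir" || return 1
  file="$dir/$1-$(date +%Y%m%d-%H%M%S).png"
  "$AB" screenshot "$file" >/dev/null || return 1
  printf '%s\n' "$file"
}

# Usage: capture the page before a risky step.
#   before=$(shot before-submit)
```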

Wait

```bash
agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --load networkidle      # Wait for network idle
```
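
The waits above cover elements, text, fixed delays, and load states. Some conditions they do not express directly, such as landing on a particular URL after a redirect. For those, one option is to poll `get url` with short `wait` pauses. The sketch below assumes `get url` prints the current URL on stdout. `wait_for_url` and `AB` are hypothetical, and agent-browser may offer a more direct way that this page does not list.

```bash
AB=${AB:-agent-browser}   # hypothetical override; defaults to the real CLI

# wait_for_url SUBSTRING [TRIES]: poll "get url" until it contains
# SUBSTRING, pausing with "wait 500" (500 ms) after each miss.
# Gives up after TRIES checks (default 20, about 10 s of pauses).
wait_for_url() {
  tries=${2:-20}
  while [ "$tries" -gt 0 ]; do
    case $("$AB" get url) in
      *"$1"*) return 0 ;;
    esac
    "$AB" wait 500 || return 1
    tries=$((tries - 1))
  done
  return 1
}

# Usage:
#   agent-browser click @e3 && wait_for_url "/dashboard"
```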

Semantic locators (alternative to refs)

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
```
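
Semantic locators need no prior snapshot, which suits short, stable flows. The sketch below is a sign-in helper built only from the `find` forms above. `sign_in` and `AB` are hypothetical. The labels `Email` and `Password` and the button name `Sign In` are assumptions about the target page.

```bash
AB=${AB:-agent-browser}   # hypothetical override; defaults to the real CLI

# sign_in EMAIL PASSWORD: fill and submit a sign-in form located by its
# visible labels and button name, so no snapshot or refs are needed.
# The labels and button name below are assumptions about the page.
sign_in() {
  "$AB" find label "Email" fill "$1" &&
    "$AB" find label "Password" fill "$2" &&
    "$AB" find role button click --name "Sign In"
}

# Usage:
#   sign_in "user@test.com" "s3cret" && agent-browser wait --load networkidle
```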

Example: Form submission

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
```

Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

```bash
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
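
The refs in this example are valid only for the snapshot that produced them. A more defensive version of the same flow checks that `@e1`, `@e2`, and `@e3` still label the expected fields before typing anything. The sketch assumes each snapshot line looks like `- textbox "Email" [ref=e1]`. `has_ref`, `submit_form`, and `AB` are hypothetical names.

```bash
AB=${AB:-agent-browser}   # hypothetical override; defaults to the real CLI

# has_ref SNAPSHOT REF ROLE NAME: succeed if the snapshot line carrying
# REF (e.g. @e1) also shows ROLE "NAME".
has_ref() {
  printf '%s\n' "$1" | grep -F "[ref=${2#@}]" | grep -qF "$3 \"$4\""
}

# submit_form URL EMAIL PASSWORD: the form example, with a layout guard.
submit_form() {
  "$AB" open "$1" || return 1
  snap=$("$AB" snapshot -i) || return 1
  # Act only if the refs still point at the fields the example expects.
  has_ref "$snap" @e1 textbox Email &&
    has_ref "$snap" @e2 textbox Password &&
    has_ref "$snap" @e3 button Submit || {
    echo "unexpected form layout; inspect the snapshot and update the refs" >&2
    return 1
  }
  "$AB" fill @e1 "$2" &&
    "$AB" fill @e2 "$3" &&
    "$AB" click @e3 &&
    "$AB" wait --load networkidle &&
    "$AB" snapshot -i # check the result
}

# Usage:
#   submit_form https://example.com/form "user@example.com" "password123"
```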