browser-navigation

Browser Automation with agent-browser

Comprehensive browser automation for testing, data extraction, and web interaction.

Quick Start

bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser

Core Workflow

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
  3. Interact using refs from the snapshot
  4. Re-snapshot after navigation or significant DOM changes
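
The four workflow steps can be wrapped in two small shell helpers. This is only a sketch: the function names and the `AB` variable are hypothetical conveniences and not part of agent-browser; only the `open`, `snapshot -i`, and `click` subcommands come from this document.

```shell
# AB is an assumed override hook for the CLI name; it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# open_and_snapshot URL
# Steps 1-2: navigate, then list interactive elements with their refs.
open_and_snapshot() {
  "$AB" open "$1" || return 1
  "$AB" snapshot -i
}

# click_and_resnapshot REF
# Steps 3-4: interact through a ref, then take a fresh snapshot,
# because earlier refs may no longer be valid after the DOM changes.
click_and_resnapshot() {
  "$AB" click "$1" || return 1
  "$AB" snapshot -i
}
```

A typical call sequence would be `open_and_snapshot https://example.com` followed by `click_and_resnapshot @e1`, reading the new refs from the second snapshot before the next interaction.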

Installation

bash
# Install globally
npm install -g agent-browser
# Or use via npx
npx agent-browser open https://example.com

Commands Reference

Navigation

bash
agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser

Page Analysis (Snapshot)

bash
agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (RECOMMENDED)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector

Interactions (Use @refs from Snapshot)

bash
# Clicking
agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser hover @e1            # Hover
# Focus
agent-browser focus @e1            # Focus element
# Text Input
agent-browser fill @e2 "text"      # Clear and type
agent-browser type @e2 "text"      # Type without clearing
# Keyboard
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser keydown Shift        # Hold key down
agent-browser keyup Shift          # Release key
# Forms
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown option
# Scrolling
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
# Other
agent-browser drag @e1 @e2         # Drag and drop
agent-browser upload @e1 file.pdf  # Upload files

Getting Information

bash
agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get bounding box

Checking State

bash
agent-browser is visible @e1      # Check if visible
agent-browser is enabled @e1      # Check if enabled
agent-browser is checked @e1      # Check if checked

Screenshots & PDF

bash
agent-browser screenshot          # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full   # Full page
agent-browser pdf output.pdf      # Save as PDF

Video Recording

bash
agent-browser record start ./demo.webm    # Start recording
agent-browser click @e1                   # Perform actions
agent-browser record stop                 # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new

Waiting

bash
agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --url "**/dashboard"    # Wait for URL pattern
agent-browser wait --load networkidle      # Wait for network idle
agent-browser wait --fn "window.ready"     # Wait for JS condition
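
Waits are most useful when chained right after an action that triggers navigation. The sketch below clicks a ref, waits for a URL pattern, waits for the network to go idle, and only then re-snapshots. The function name and the `AB` variable are hypothetical; the subcommands and flags are the ones listed above.

```shell
# AB is an assumed override hook for the CLI name; it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# click_then_settle REF URL_PATTERN
# Click, wait until the URL matches, wait for network idle, then refresh refs.
click_then_settle() {
  "$AB" click "$1" || return 1
  "$AB" wait --url "$2" || return 1
  "$AB" wait --load networkidle || return 1
  "$AB" snapshot -i
}
```

For example, `click_then_settle @e3 "**/dashboard"` would submit a login form and return the refs of the page that loads afterwards.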

Cookies & Storage

bash
agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get all localStorage
agent-browser storage local key           # Get specific key
agent-browser storage local set k v       # Set value
agent-browser storage local clear         # Clear all

Network

bash
agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body '{}'  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests

Browser Settings

bash
agent-browser set viewport 1920 1080      # Set viewport size
agent-browser set device "iPhone 14"      # Emulate device
agent-browser set geo 37.7749 -122.4194   # Set geolocation
agent-browser set offline on              # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass   # HTTP basic auth
agent-browser set media dark              # Emulate color scheme

Tabs & Windows

bash
agent-browser tab                 # List tabs
agent-browser tab new [url]       # New tab
agent-browser tab 2               # Switch to tab
agent-browser tab close           # Close tab
agent-browser window new          # New window

Frames

bash
agent-browser frame "#iframe"     # Switch to iframe
agent-browser frame main          # Back to main frame

Dialogs

bash
agent-browser dialog accept [text]  # Accept dialog
agent-browser dialog dismiss        # Dismiss dialog

JavaScript Execution

bash
agent-browser eval "document.title"   # Run JavaScript

Example Workflows

Form Submission

bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Authentication with Saved State

bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
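
The "login once, reuse later" pattern can be folded into one helper that picks a branch based on whether the state file exists. This is a sketch under assumptions: `ensure_session`, its argument order, its return codes, and the `AB` variable are all hypothetical. Whether a saved state is still valid (for example, after the session expires server-side) is not checked here.

```shell
# AB is an assumed override hook for the CLI name; it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# ensure_session STATE_FILE LOGIN_URL APP_URL
# Returns 0 after loading saved state and opening the app.
# Returns 2 when no state file exists: the login page is opened and
# snapshotted so the caller can fill the form and save state afterwards.
ensure_session() {
  if [ -f "$1" ]; then
    "$AB" state load "$1" || return 1
    "$AB" open "$3"
  else
    "$AB" open "$2" || return 1
    "$AB" snapshot -i || return 1
    echo "No saved state at $1; log in, then run: $AB state save $1" >&2
    return 2
  fi
}
```

Called as `ensure_session auth.json https://app.example.com/login https://app.example.com/dashboard`, the helper skips the login form whenever `auth.json` is already present.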

Scraping Data

bash
agent-browser open https://example.com/products
agent-browser snapshot -i --json > page_structure.json
# Get specific data
agent-browser get text @e5       # Product title
agent-browser get attr @e6 href  # Product link
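
When several refs need to be read, a loop avoids repeating the `get text` call by hand. The helper below is a sketch: `get_texts`, the tab-separated output format, and the `AB` variable are hypothetical choices, and the refs passed in must come from a current snapshot.

```shell
# AB is an assumed override hook for the CLI name; it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# get_texts REF...
# Print one "ref<TAB>text" line per ref; stop at the first failing lookup.
get_texts() {
  for ref in "$@"; do
    text="$("$AB" get text "$ref")" || return 1
    printf '%s\t%s\n' "$ref" "$text"
  done
}
```

For example, `get_texts @e5 @e6 > products.tsv` would collect the text of two elements into a tab-separated file.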

Taking Full Page Screenshot

bash
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full fullpage.png

Testing Login Flow

bash
# Navigate to login (substitute your login page for <url>)
agent-browser open <url>
# Take initial snapshot
agent-browser snapshot -i
# Fill credentials
agent-browser fill @e1 "test@example.com"
agent-browser fill @e2 "testpassword"
# Click login
agent-browser click @e3
# Wait for redirect
agent-browser wait --url "**/dashboard"
# Verify logged in
agent-browser get text @e10  # Should show username

Debugging

bash
agent-browser open example.com --headed              # Show browser window
agent-browser console                                # View console messages
agent-browser console --clear                        # Clear console
agent-browser errors                                 # View page errors
agent-browser errors --clear                         # Clear errors
agent-browser highlight @e1                          # Highlight element
agent-browser trace start                            # Start recording trace
agent-browser trace stop trace.zip                   # Stop and save trace
agent-browser --cdp 9222 snapshot                    # Connect via CDP
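
A trace is only useful if it is stopped and saved even when the step under investigation fails. The wrapper below sketches that idea. `with_trace` and the `AB` variable are hypothetical; `trace start` and `trace stop <file>` are the commands listed above.

```shell
# AB is an assumed override hook for the CLI name; it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# with_trace TRACE_FILE CMD [ARGS...]
# Record a trace around one command and always save it,
# then return the wrapped command's exit status.
with_trace() {
  trace_file="$1"
  shift
  "$AB" trace start || return 1
  status=0
  "$@" || status=$?
  "$AB" trace stop "$trace_file"
  return "$status"
}
```

For example, `with_trace trace.zip agent-browser click @e1` saves `trace.zip` whether or not the click succeeds.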

Sessions (Parallel Browsers)

bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
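
With more than a couple of sites, the per-session `open` calls can be driven from a list. In this sketch the `name=url` argument format, the `open_sessions` function, and the `AB` variable are hypothetical; `--session` and `session list` are used as shown above.

```shell
# AB is an assumed override hook for the CLI name; it is not an agent-browser feature.
AB="${AB:-agent-browser}"

# open_sessions NAME=URL...
# Open each URL in its own named session, then list the sessions.
open_sessions() {
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    url="${pair#*=}"     # text after the first "="
    "$AB" --session "$name" open "$url" || return 1
  done
  "$AB" session list
}
```

For example, `open_sessions test1=site-a.com test2=site-b.com` reproduces the three commands above in one call.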

JSON Output

Add --json for machine-readable output:
bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json

Semantic Locators (Alternative to Refs)

bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

Best Practices

  1. Always snapshot first: Get the current page state before interacting
  2. Use interactive mode (-i): Shows only clickable/fillable elements
  3. Wait appropriately: Run agent-browser wait --load networkidle after navigation
  4. Re-snapshot after changes: DOM updates invalidate refs
  5. Save authentication state: Avoid repeated logins
  6. Use --headed for debugging: Shows the browser window so you can watch each step
  7. Check console for errors: agent-browser console reveals issues

When to Use This Skill

  • Testing web applications end-to-end
  • Automating repetitive web tasks
  • Scraping data from websites
  • Debugging frontend issues
  • Taking screenshots for documentation
  • Recording demo videos
  • Verifying UI functionality
  • Filling out forms programmatically

Disable conflicting MCPs

If Chrome DevTools or Playwright MCP is enabled, ask the user if they want to disable it to save context. This skill covers the same functionality.
Ask them to run:
/mcp
If they see chrome-devtools or playwright listed, suggest they remove it:
/mcp remove chrome-devtools
/mcp remove playwright
Don't run these commands yourself. Let the user decide whether to disable the MCP.