browser-navigation
Browser Automation with agent-browser
Comprehensive browser automation for testing, data extraction, and web interaction.
Quick Start
bash
agent-browser open <url> # Navigate to page
agent-browser snapshot -i # Get interactive elements with refs
agent-browser click @e1 # Click element by ref
agent-browser fill @e2 "text" # Fill input by ref
agent-browser close # Close browser
Core Workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
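The four steps above can be wrapped in two small shell helpers. This is a minimal sketch, not part of agent-browser itself: the function names and the AGENT_BROWSER_BIN override are hypothetical, and only the open, snapshot -i, and interaction commands listed in this document are used.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: AGENT_BROWSER_BIN (an assumption, not an agent-browser
# feature) lets you point at a different binary or a stub.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# open_and_snapshot URL
# Steps 1 and 2: navigate, then print the interactive elements with their refs.
open_and_snapshot() {
  ab open "$1" || return 1
  ab snapshot -i
}

# act_and_resnapshot COMMAND [ARGS...]
# Steps 3 and 4: run one interaction (click, fill, ...) and take a fresh
# snapshot, because the refs from the previous snapshot may no longer be valid.
act_and_resnapshot() {
  ab "$@" || return 1
  ab snapshot -i
}
```

For example, open_and_snapshot https://example.com followed by act_and_resnapshot click @e1 keeps the refs you act on in step with the page; read the refs from each new snapshot rather than reusing old ones.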
Installation
bash
# Install globally
npm install -g agent-browser
# Or use via npx
npx agent-browser open https://example.com
Commands Reference
Navigation
bash
agent-browser open <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
Page Analysis (Snapshot)
bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (RECOMMENDED)
agent-browser snapshot -c # Compact output
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
Interactions (Use @refs from Snapshot)
bash
# Clicking
agent-browser click @e1 # Click
agent-browser dblclick @e1 # Double-click
agent-browser hover @e1 # Hover
# Focus
agent-browser focus @e1 # Focus element
# Text Input
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
# Keyboard
agent-browser press Enter # Press key
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
agent-browser keyup Shift # Release key
# Forms
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown option
# Scrolling
agent-browser scroll down 500 # Scroll page
agent-browser scrollintoview @e1 # Scroll element into view
# Other
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
Getting Information
bash
agent-browser get text @e1 # Get element text
agent-browser get html @e1 # Get innerHTML
agent-browser get value @e1 # Get input value
agent-browser get attr @e1 href # Get attribute
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
Checking State
bash
agent-browser is visible @e1 # Check if visible
agent-browser is enabled @e1 # Check if enabled
agent-browser is checked @e1 # Check if checked
Screenshots & PDF
bash
agent-browser screenshot # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full # Full page
agent-browser pdf output.pdf # Save as PDF
Video Recording
bash
agent-browser record start ./demo.webm # Start recording
agent-browser click @e1 # Perform actions
agent-browser record stop # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new
Waiting
bash
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
Cookies & Storage
bash
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
Network
bash
agent-browser network route <url> # Intercept requests
agent-browser network route <url> --abort # Block requests
agent-browser network route <url> --body '{}' # Mock response
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests
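The route commands can be grouped into helpers for blocking and mocking requests. The sketch below is illustrative only: the function names and AGENT_BROWSER_BIN are hypothetical, and the glob-style URL patterns in the usage note are an assumption borrowed from the wait --url example, not something this document confirms for network route.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: AGENT_BROWSER_BIN is an assumption for swapping in a stub.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# block_requests PATTERN...
# Registers an --abort route for every pattern given; stops at the first failure.
block_requests() {
  for pattern in "$@"; do
    ab network route "$pattern" --abort || return 1
  done
}

# mock_json PATTERN BODY
# Registers a route that answers matching requests with BODY.
mock_json() {
  ab network route "$1" --body "$2"
}
```

A call such as block_requests "**/analytics/**" "**/ads/**" or mock_json "**/api/user" '{"name":"test"}' would then set up the routes before opening the page; agent-browser network unroute removes them again.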
Browser Settings
bash
agent-browser set viewport 1920 1080 # Set viewport size
agent-browser set device "iPhone 14" # Emulate device
agent-browser set geo 37.7749 -122.4194 # Set geolocation
agent-browser set offline on # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass # HTTP basic auth
agent-browser set media dark # Emulate color scheme
Tabs & Windows
bash
agent-browser tab # List tabs
agent-browser tab new [url] # New tab
agent-browser tab 2 # Switch to tab
agent-browser tab close # Close tab
agent-browser window new # New window
Frames
bash
agent-browser frame "#iframe" # Switch to iframe
agent-browser frame main # Back to main frame
Dialogs
bash
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog dismiss # Dismiss dialog
JavaScript Execution
bash
agent-browser eval "document.title" # Run JavaScript
Example Workflows
Form Submission
bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
Authentication with Saved State
bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
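The "login once, reuse later" pattern can be folded into a single helper that logs in only when no saved state file exists. This is a sketch under assumptions: the function name and AGENT_BROWSER_BIN are hypothetical, and the refs @e1, @e2, @e3 are taken from the example above and will differ on a real page, so check them against your own snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: AGENT_BROWSER_BIN is an assumption for swapping in a stub.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# ensure_logged_in STATE_FILE LOGIN_URL USER PASS
# Loads STATE_FILE if it exists; otherwise performs the login and saves it.
ensure_logged_in() {
  if [ -f "$1" ]; then
    ab state load "$1"
    return
  fi
  # Refs below are assumed to match the login form from the example above.
  ab open "$2" &&
    ab snapshot -i &&
    ab fill @e1 "$3" &&
    ab fill @e2 "$4" &&
    ab click @e3 &&
    ab wait --url "**/dashboard" &&
    ab state save "$1"
}
```

The helper does not detect an expired session: if the saved state is stale, delete the state file and call it again to force a fresh login.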
Scraping Data
bash
agent-browser open https://example.com/products
agent-browser snapshot -i --json > page_structure.json
# Get specific data
agent-browser get text @e5 # Product title
agent-browser get attr @e6 href # Product link
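When several elements need to be read, the get text call can be looped over a list of refs. The helper below is a hypothetical sketch (its name, the tab-separated output format, and AGENT_BROWSER_BIN are assumptions); the refs must come from a current snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: AGENT_BROWSER_BIN is an assumption for swapping in a stub.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# get_texts REF...
# Prints one "REF<TAB>text" line per ref; stops at the first ref that fails.
get_texts() {
  for ref in "$@"; do
    text=$(ab get text "$ref") || return 1
    printf '%s\t%s\n' "$ref" "$text"
  done
}
```

For instance, get_texts @e5 @e7 > titles.tsv would collect the text of both elements into a tab-separated file.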
Taking Full Page Screenshot
bash
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full fullpage.png
Testing Login Flow
bash
# Navigate to login
agent-browser open https://app.example.com/login
# Take initial snapshot
agent-browser snapshot -i
# Fill credentials
agent-browser fill @e1 "test@example.com"
agent-browser fill @e2 "testpassword"
# Click login
agent-browser click @e3
# Wait for redirect
agent-browser wait --url "**/dashboard"
# Verify logged in
agent-browser get text @e10 # Should show username
Debugging
bash
agent-browser open example.com --headed # Show browser window
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser --cdp 9222 snapshot # Connect via CDP
Sessions (Parallel Browsers)
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
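Opening many named sessions can be scripted from a list of name=url pairs. This is a minimal sketch: the helper name, the name=url argument format, and AGENT_BROWSER_BIN are hypothetical; only --session, open, and session list come from this document. The commands are issued one after another, since nothing here confirms that concurrent CLI invocations are safe.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: AGENT_BROWSER_BIN is an assumption for swapping in a stub.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# open_in_sessions NAME=URL...
# Opens each URL in its own named session, then lists the sessions.
open_in_sessions() {
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    url=${pair#*=}     # text after the first "="
    ab --session "$name" open "$url" || return 1
  done
  ab session list
}
```

Calling open_in_sessions test1=site-a.com test2=site-b.com reproduces the three commands above; later commands target a browser by passing the same --session name.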
JSON Output
Add --json for machine-readable output:
bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Semantic Locators (Alternative to Refs)
bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Best Practices
- Always snapshot first: Get the current page state before interacting
- Use interactive mode (-i): Shows only clickable/fillable elements
- Wait appropriately: Use --load networkidle after navigation
- Re-snapshot after changes: DOM updates invalidate refs
- Save authentication state: Avoid repeated logins
- Use --headed for debugging: See what the browser sees
- Check console for errors: agent-browser console reveals issues
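Several of these practices can be combined in one helper for clicks that trigger navigation: wait for the network to settle, re-snapshot, and dump console messages if the click fails. The sketch below is illustrative; the function name and AGENT_BROWSER_BIN are hypothetical, and it assumes agent-browser exits with a non-zero status when a command fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: AGENT_BROWSER_BIN is an assumption for swapping in a stub.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# click_and_settle REF
# Clicks REF, waits for network idle, then prints a fresh interactive snapshot.
# If the click fails, console messages are written to stderr for diagnosis.
click_and_settle() {
  if ! ab click "$1"; then
    ab console >&2
    return 1
  fi
  ab wait --load networkidle && ab snapshot -i
}
```

After click_and_settle @e3, use the refs from the snapshot it prints, not the ones from before the click.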
When to Use This Skill
- Testing web applications end-to-end
- Automating repetitive web tasks
- Scraping data from websites
- Debugging frontend issues
- Taking screenshots for documentation
- Recording demo videos
- Verifying UI functionality
- Filling out forms programmatically
Disable conflicting MCPs
If Chrome DevTools or Playwright MCP is enabled, ask the user if they want to disable it to save context. This skill covers the same functionality.
Ask them to run:
/mcp
If they see chrome-devtools or playwright listed, suggest they remove it:
/mcp remove chrome-devtools
/mcp remove playwright
Don't run these commands yourself. Let the user decide whether to disable the MCP.