agent-browser-cli Control
Skill by ara.so — Devtools Skills collection.
agent-browser-cli is a browser control CLI built in Rust that connects to a real Chrome session via a Chrome extension. It enables AI agents to scan tabs, execute JavaScript, manage cookies, capture screenshots, and perform CDP operations while preserving the user's login state and session.
Key difference from Selenium/Playwright: this tool attaches to an existing, non-headless Chrome session, so all of the user's logins and cookies are preserved. That makes it well suited to AI agents that need to work with authenticated web sessions.
Installation
Via npm (recommended)
bash
npm install -g @sleepinsummer/agent-browser-cli
Via source
bash
git clone https://github.com/sleepinginsummer/agent-browser-cli.git
cd agent-browser-cli
cargo build --release
# Binary will be at ./target/release/agent-browser-cli
Chrome Extension Setup
Critical: The Chrome extension must be loaded for the CLI to work.
- Download chrome-extensions.zip from the latest release
- Extract the zip file
- Open Chrome and navigate to chrome://extensions
- Enable "Developer mode"
- Click "Load unpacked extension"
- Select the extracted tmwd_cdp_bridge directory
- Ensure at least one normal web page tab is open (not about:blank or chrome://)
The extension will show a small indicator on the right side of the page when connected.
Architecture
agent-browser-cli runs as a daemon service that:
- Listens on port 18765 for Chrome extension WebSocket connections
- Listens on port 18767 for CLI HTTP API calls
- Maintains a persistent browser connection to avoid re-initialization overhead
Performance reference (with daemon running):
- Simple page read / JS execution: 40-120ms
- DOM query operations: 270-360ms
- Page monitoring with change summary: 720-880ms
Core Commands
Daemon Management
bash
# Start daemon (auto-starts if not running)
agent-browser-cli start
# Stop daemon
agent-browser-cli stop
# Restart daemon
agent-browser-cli restart
# Check daemon status
agent-browser-cli status
Tab Management
bash
# List all tabs
agent-browser-cli tabs
# Switch to tab by index
agent-browser-cli switch 0
# Open new URL
agent-browser-cli open https://example.com
# Close current tab
agent-browser-cli close
# Get active tab info
agent-browser-cli active
Page Content Extraction
bash
# Scan current page (simplified HTML)
agent-browser-cli scan
# Get text-only content (faster)
agent-browser-cli scan --text-only
# Get full page DOM
agent-browser-cli scan --full
# Scan specific tab
agent-browser-cli scan --tab 0
# Get page with screenshots of visible elements
agent-browser-cli scan --screenshot
JavaScript Execution
bash
# Execute JavaScript and return result
agent-browser-cli exec 'return document.title'
# Execute with page monitoring (detects changes)
agent-browser-cli exec --monitor 'document.querySelector("#search").value = "test"'
# Multi-line JS execution
agent-browser-cli exec 'const btn = document.querySelector("button");
btn.click();
return "clicked";'
# Execute in specific tab
agent-browser-cli exec --tab 0 'return window.location.href'
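Inline snippets become awkward once the JavaScript itself contains single quotes. One possible workaround, sketched here as an assumption rather than a documented feature of the CLI, is to read the snippet from a quoted heredoc and pass it to exec as a single argument. The names ABC_BIN, JS_SNIPPET, and run_js are hypothetical; only the exec subcommand comes from the commands listed above.

```shell
# ABC_BIN is a hypothetical override so the sketch can be pointed at a stub.
ABC_BIN="${ABC_BIN:-agent-browser-cli}"

# The quoted 'EOF' delimiter stops the shell from expanding $, backticks,
# or quotes inside the snippet, so the JavaScript arrives unchanged.
JS_SNIPPET=$(cat <<'EOF'
const btn = document.querySelector('button[type="submit"]');
return btn ? btn.textContent.trim() : 'no submit button';
EOF
)

# Pass the whole snippet to exec as one argument.
run_js() {
  "$ABC_BIN" exec "$1"
}
```

With the daemon and extension running, `run_js "$JS_SNIPPET"` would send the snippet exactly as written in the heredoc.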
Screenshots
bash
# Capture full page screenshot
agent-browser-cli screenshot
# Screenshot specific tab
agent-browser-cli screenshot --tab 0
# Output to specific file
agent-browser-cli screenshot --output /path/to/screenshot.png
Cookie Management
bash
# Get all cookies for current page
agent-browser-cli cookies
# Get cookies for specific domain
agent-browser-cli cookies --domain example.com
# Set cookie
agent-browser-cli cookies --set 'name=value; domain=.example.com; path=/'
Configuration
bash
# Change extension WebSocket port (default: 18765)
agent-browser-cli set-extension-port 18766
# View current config
cat ~/.agent-browser-cli/config.json
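When a script needs the configured port (for example, to confirm it matches the value entered in the extension), it can read it from the config file instead of hard-coding it. The following is a minimal sketch that assumes the flat `{"extension_port": N}` layout shown in this document and falls back to the default 18765; the function name abc_extension_port is hypothetical.

```shell
# Hypothetical helper: print the extension port from a config file,
# or the documented default 18765 if the file or key is missing.
abc_extension_port() {
  config_file="${1:-$HOME/.agent-browser-cli/config.json}"
  port=""
  if [ -r "$config_file" ]; then
    # Pull out the digits that follow the "extension_port" key.
    port=$(sed -n 's/.*"extension_port"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config_file" | head -n 1)
  fi
  printf '%s\n' "${port:-18765}"
}
```

Calling `abc_extension_port` with no argument reads the default config path; passing a path reads that file instead. A JSON parser would be more robust than sed if the config grows beyond this one key.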
Real-World Examples
Example 1: Search Automation
bash
# Open search engine
agent-browser-cli open "https://www.google.com"
# Wait a moment for page load, then input search query
sleep 1
agent-browser-cli exec 'document.querySelector("textarea[name=q]").value = "rust programming"'
# Submit search
agent-browser-cli exec --monitor 'document.querySelector("textarea[name=q]").form.submit()'
# Extract search results
agent-browser-cli scan --text-only
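The fixed `sleep 1` in this example can be too short on a slow page and wasteful on a fast one. A possible alternative is to poll until the page reports that the element exists. The sketch below is a generic retry helper; poll_until and its arguments are hypothetical names, and it assumes that a string returned from the JavaScript appears verbatim somewhere in the CLI output.

```shell
# Hypothetical helper: run a command repeatedly until its output contains a marker.
# Usage: poll_until MARKER MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
poll_until() {
  marker=$1
  max_tries=$2
  delay=$3
  shift 3
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # A failing command simply counts as "not ready yet".
    output=$("$@" 2>/dev/null) || output=""
    case $output in
      *"$marker"*)
        printf '%s\n' "$output"
        return 0
        ;;
    esac
    # Only wait if another attempt is coming.
    if [ "$try" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    try=$((try + 1))
  done
  return 1
}
```

In the search example this might look like `poll_until READY 20 1 agent-browser-cli exec 'return document.querySelector("textarea[name=q]") ? "READY" : "WAITING"'`. A distinctive marker such as READY is used because a generic word like true would also match the `"ok": true` field of every successful response.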
Example 2: Form Filling
bash
# Navigate to form page
agent-browser-cli open "https://example.com/contact"
# Fill form fields
agent-browser-cli exec '
const form = {
  name: document.querySelector("#name"),
  email: document.querySelector("#email"),
  message: document.querySelector("#message")
};
form.name.value = "AI Agent";
form.email.value = "agent@example.com";
form.message.value = "Hello from agent-browser-cli";
return "Form filled";
'
# Submit and monitor changes
agent-browser-cli exec --monitor 'document.querySelector("form").submit()'
Example 3: Multi-Tab Workflow
bash
# Get list of all tabs
TABS=$(agent-browser-cli tabs)
echo "$TABS"
# Switch to first tab
agent-browser-cli switch 0
# Get current page title
agent-browser-cli exec 'return document.title'
# Open new tab
agent-browser-cli open "https://example.com"
# Work with new tab
agent-browser-cli scan --text-only
Example 4: Data Extraction with Cookies
bash
# Navigate to authenticated page
agent-browser-cli open "https://example.com/dashboard"
# Extract cookies (user is already logged in)
COOKIES=$(agent-browser-cli cookies --domain example.com)
echo "$COOKIES"
# Extract protected content
agent-browser-cli scan --text-only
# Take screenshot of dashboard
agent-browser-cli screenshot --output dashboard.png
Example 5: Page Monitoring
bash
# Execute action and monitor DOM changes
RESULT=$(agent-browser-cli exec --monitor '
document.querySelector("#load-more").click();
return "Clicked load more";
')
# The result includes a change summary
echo "$RESULT"
# Re-scan to get updated content
agent-browser-cli scan --text-only
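Before re-scanning, a script may want to confirm that the captured response reports success. The sketch below checks the ok field and extracts the error string described in the Response Format section. The helper names abc_ok and abc_error are hypothetical, and the grep/sed matching is a rough approximation that assumes the documented top-level layout; a real JSON parser such as jq is more robust.

```shell
# Hypothetical helper: succeed only if the response on stdin has "ok": true.
abc_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: print the "error" string from a failed response, if any.
abc_error() {
  sed -n 's/.*"error"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For the example above, `printf '%s' "$RESULT" | abc_ok || printf '%s' "$RESULT" | abc_error >&2` would print the error message and let the caller decide whether to skip the re-scan.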
Response Format
All commands return JSON:
json
{
  "ok": true,
  "result": {
    "status": "success",
    "data": "...",
    "metadata": {}
  }
}
Error format:
json
{
  "ok": false,
  "error": "Error message"
}
Common Patterns
Wait for Element
bash
agent-browser-cli exec '
return new Promise((resolve) => {
  const deadline = Date.now() + 5000; // stop polling after 5 seconds so the command cannot hang
  const check = () => {
    const el = document.querySelector("#target");
    if (el) resolve("Found");
    else if (Date.now() > deadline) resolve("Timed out");
    else setTimeout(check, 100);
  };
  check();
});
'
Extract Structured Data
bash
agent-browser-cli exec '
return Array.from(document.querySelectorAll(".item")).map(item => ({
  title: item.querySelector(".title")?.textContent,
  link: item.querySelector("a")?.href,
  price: item.querySelector(".price")?.textContent
}));
'
File Upload
bash
agent-browser-cli exec '
const input = document.querySelector("input[type=file]");
const dt = new DataTransfer();
dt.items.add(new File(["content"], "test.txt"));
input.files = dt.files;
input.dispatchEvent(new Event("change", { bubbles: true }));
return "File set";
'
Dropdown Selection
bash
agent-browser-cli exec '
const select = document.querySelector("select#country");
select.value = "US";
select.dispatchEvent(new Event("change", { bubbles: true }));
return select.value;
'
Troubleshooting
Extension not connecting
- Verify extension is loaded in chrome://extensions
- Check extension port matches config: cat ~/.agent-browser-cli/config.json
- Ensure at least one normal web page is open (not chrome:// or about:)
- Restart daemon: agent-browser-cli restart
Command timeout
- Check daemon status: agent-browser-cli status
- Verify Chrome is running and extension is active
- Try simpler command first: agent-browser-cli tabs
- Check logs: tail -f ~/.agent-browser-cli.log
Port conflicts
bash
# Change extension port
agent-browser-cli set-extension-port 18766
# Update extension popup to match new port
# Restart daemon
agent-browser-cli restart
WSL 2 connectivity issues
For WSL 2 users on Windows 11 22H2+:
- Enable networkingMode=mirrored in .wslconfig
- Restart WSL
- Ensure Chrome extension can connect to localhost:18765
Slow performance
- Use the --text-only flag for faster content extraction
- Avoid --screenshot unless needed
- Use --monitor only when detecting page changes is necessary
- Consider increasing system resources if Chrome is sluggish
Advanced: CDP Operations
The underlying Chrome extension uses the Chrome DevTools Protocol (CDP). While most operations are abstracted by the CLI, you can execute raw CDP commands via JavaScript:
bash
agent-browser-cli exec '
// chrome.tabs.getCurrent() returns a Promise, so look up the active tab and await it first
const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
return await chrome.debugger.sendCommand({ tabId: tab.id }, "Network.getCookies", {});
'
Environment Variables
The CLI uses a config file instead of environment variables:
Config location: ~/.agent-browser-cli/config.json
json
{
  "extension_port": 18765
}
Logs
Daemon logs are written to: ~/.agent-browser-cli.log
Monitor in real-time:
bash
tail -f ~/.agent-browser-cli.log
Integration with AI Agents
When called by AI coding agents, a typical workflow is:
- Check tab state: agent-browser-cli tabs
- Navigate or switch: agent-browser-cli open <url> or agent-browser-cli switch <index>
- Extract content: agent-browser-cli scan --text-only
- Perform action: agent-browser-cli exec --monitor '<js code>'
- Verify result: agent-browser-cli scan or agent-browser-cli screenshot
The tool is designed for high-frequency calls with minimal overhead (~40-120ms for simple operations).
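The workflow above can be sketched as a fail-fast shell function that labels each step's output and stops at the first failure. This is an illustration only: ABC_BIN, run_step, and agent_workflow are hypothetical names, and the sketch assumes the CLI exits with a non-zero status when a command fails, which this document does not confirm. It also omits any wait for page load after open.

```shell
# ABC_BIN is a hypothetical override so the sketch can be pointed at a stub.
ABC_BIN="${ABC_BIN:-agent-browser-cli}"

# Print a header for the step, run the command, and report a failure on stderr.
run_step() {
  label=$1
  shift
  printf '== %s ==\n' "$label"
  "$@" || { printf 'step failed: %s\n' "$label" >&2; return 1; }
}

# Usage: agent_workflow URL JS_CODE
# Assumption: a failed CLI command returns a non-zero exit status.
agent_workflow() {
  url=$1
  js=$2
  run_step tabs   "$ABC_BIN" tabs                 || return 1
  run_step open   "$ABC_BIN" open "$url"          || return 1
  run_step before "$ABC_BIN" scan --text-only     || return 1
  run_step action "$ABC_BIN" exec --monitor "$js" || return 1
  run_step after  "$ABC_BIN" scan --text-only
}
```

A call such as `agent_workflow "https://example.com" 'return document.title'` would print the output of each step under its header, which gives the calling agent the page content before and after the action in one pass.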