agent-browser-cli-control

agent-browser-cli Control

Skill by ara.so — Devtools Skills collection.
agent-browser-cli is a browser control CLI built in Rust that connects to a real Chrome session via a Chrome extension. It enables AI agents to scan tabs, execute JavaScript, manage cookies, capture screenshots, and perform CDP operations while preserving the user's login state and session.
Key difference from Selenium/Playwright: this tool attaches to an existing, non-headless Chrome session, so the user's login states and cookies are preserved. That makes it a good fit for AI agents that need to work with authenticated web sessions.

Installation

Via npm (recommended)

bash
npm install -g @sleepinsummer/agent-browser-cli

Via source

bash
git clone https://github.com/sleepinginsummer/agent-browser-cli.git
cd agent-browser-cli
cargo build --release
# Binary will be at ./target/release/agent-browser-cli

Chrome Extension Setup

Critical: The Chrome extension must be loaded for the CLI to work.
  1. Download chrome-extensions.zip from the latest release
  2. Extract the zip file
  3. Open Chrome and navigate to chrome://extensions
  4. Enable "Developer mode"
  5. Click "Load unpacked"
  6. Select the extracted tmwd_cdp_bridge directory
  7. Ensure at least one normal web page tab is open (not about:blank or chrome://)
The extension shows a small indicator on the right side of the page when connected.

Architecture

agent-browser-cli runs as a daemon service that:
  • Listens on port 18765 for Chrome extension WebSocket connections
  • Listens on port 18767 for CLI HTTP API calls
  • Maintains a persistent browser connection to avoid re-initialization overhead
Performance reference (with daemon running):
  • Simple page read / JS execution: 40-120ms
  • DOM query operations: 270-360ms
  • Page monitoring with change summary: 720-880ms

Core Commands

Daemon Management

bash
# Start daemon (auto-starts if not running)
agent-browser-cli start

# Stop daemon
agent-browser-cli stop

# Restart daemon
agent-browser-cli restart

# Check daemon status
agent-browser-cli status

Tab Management

bash
# List all tabs
agent-browser-cli tabs

# Switch to tab by index
agent-browser-cli switch 0

# Open new URL
agent-browser-cli open https://example.com

# Close current tab
agent-browser-cli close

# Get active tab info
agent-browser-cli active

Page Content Extraction

bash
# Scan current page (simplified HTML)
agent-browser-cli scan

# Get text-only content (faster)
agent-browser-cli scan --text-only

# Get full page DOM
agent-browser-cli scan --full

# Scan specific tab
agent-browser-cli scan --tab 0

# Get page with screenshots of visible elements
agent-browser-cli scan --screenshot

JavaScript Execution

bash
# Execute JavaScript and return result
agent-browser-cli exec 'return document.title'

# Execute with page monitoring (detects changes)
agent-browser-cli exec --monitor 'document.querySelector("#search").value = "test"'

# Multi-line JS execution
agent-browser-cli exec '
const btn = document.querySelector("button");
btn.click();
return "clicked";
'

# Execute in specific tab
agent-browser-cli exec --tab 0 'return window.location.href'

Screenshots

bash
# Capture full page screenshot
agent-browser-cli screenshot

# Screenshot specific tab
agent-browser-cli screenshot --tab 0

# Output to specific file
agent-browser-cli screenshot --output /path/to/screenshot.png

Cookie Management

bash
# Get all cookies for current page
agent-browser-cli cookies

# Get cookies for specific domain
agent-browser-cli cookies --domain example.com

# Set cookie
agent-browser-cli cookies --set 'name=value; domain=.example.com; path=/'
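
When cookie values come from variables, it can help to assemble the `--set` argument in one place so the quoting stays consistent. The sketch below only builds a string in the `name=value; domain=...; path=...` form shown above; `cookie_spec` is a hypothetical helper name, not something the CLI provides.

```shell
# cookie_spec NAME VALUE DOMAIN [PATH]: print a cookie string in the
# 'name=value; domain=...; path=...' form. PATH defaults to "/".
# Hypothetical helper, not part of agent-browser-cli.
cookie_spec() {
  printf '%s=%s; domain=%s; path=%s\n' "$1" "$2" "$3" "${4:-/}"
}
```

It could then be used as `agent-browser-cli cookies --set "$(cookie_spec name value .example.com)"`.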

Configuration

bash
# Change extension WebSocket port (default: 18765)
agent-browser-cli set-extension-port 18766

# View current config
cat ~/.agent-browser-cli/config.json

Real-World Examples

Example 1: Search Automation

bash
# Open search engine
agent-browser-cli open "https://www.google.com"

# Wait a moment for page load, then input search query
sleep 1
agent-browser-cli exec 'document.querySelector("textarea[name=q]").value = "rust programming"'

# Submit search
agent-browser-cli exec --monitor 'document.querySelector("textarea[name=q]").form.submit()'

# Extract search results
agent-browser-cli scan --text-only
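
A fixed `sleep 1` assumes the page is ready within one second. One alternative is to poll a readiness check a bounded number of times. The sketch below is a generic retry loop; `retry_until` is a hypothetical helper name, and it assumes the command you pass exits with status 0 only once the page is ready (the CLI's exit codes are not described here, so the check itself is left to the caller).

```shell
# retry_until MAX DELAY CMD...: run CMD up to MAX times, sleeping DELAY
# seconds between attempts; succeed as soon as CMD exits with status 0.
# Hypothetical helper, not part of agent-browser-cli.
retry_until() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Skip the sleep after the final failed attempt
    if [ "$attempt" -le "$max" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry_until 10 1 page_ready` would try a `page_ready` function of your own up to ten times, one second apart.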

Example 2: Form Filling

bash
# Navigate to form page
agent-browser-cli open "https://example.com/contact"

# Fill form fields
agent-browser-cli exec '
const form = {
  name: document.querySelector("#name"),
  email: document.querySelector("#email"),
  message: document.querySelector("#message")
};
form.name.value = "AI Agent";
form.email.value = "agent@example.com";
form.message.value = "Hello from agent-browser-cli";
return "Form filled";
'

# Submit and monitor changes
agent-browser-cli exec --monitor 'document.querySelector("form").submit()'

Example 3: Multi-Tab Workflow

bash
# Get list of all tabs
TABS=$(agent-browser-cli tabs)
echo "$TABS"

# Switch to first tab
agent-browser-cli switch 0

# Get current page title
agent-browser-cli exec 'return document.title'

# Open new tab
agent-browser-cli open "https://example.com"

# Work with new tab
agent-browser-cli scan --text-only

Example 4: Data Extraction with Cookies

bash
# Navigate to authenticated page
agent-browser-cli open "https://example.com/dashboard"

# Extract cookies (user is already logged in)
COOKIES=$(agent-browser-cli cookies --domain example.com)
echo "$COOKIES"

# Extract protected content
agent-browser-cli scan --text-only

# Take screenshot of dashboard
agent-browser-cli screenshot --output dashboard.png

Example 5: Page Monitoring

bash
# Execute action and monitor DOM changes
RESULT=$(agent-browser-cli exec --monitor '
document.querySelector("#load-more").click();
return "Clicked load more";
')

# The result includes a change summary
echo "$RESULT"

# Re-scan to get updated content
agent-browser-cli scan --text-only

Response Format

All commands return JSON:
json
{
  "ok": true,
  "result": {
    "status": "success",
    "data": "...",
    "metadata": {}
  }
}
Error format:
json
{
  "ok": false,
  "error": "Error message"
}
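
Since every response carries a top-level `ok` field, a calling script can branch on it before using the output. The sketch below is one minimal way to do that with `grep`. The name `response_ok` is hypothetical, and the pattern match is a shortcut that assumes `"ok"` does not also appear as a nested key; a real JSON parser would be more robust.

```shell
# response_ok: read a JSON response on stdin and succeed only if it
# contains "ok": true. Hypothetical helper, not part of agent-browser-cli.
response_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}
```

A script might use it as `agent-browser-cli tabs | response_ok || echo "command failed"`.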

Common Patterns

Wait for Element

bash
agent-browser-cli exec '
return new Promise((resolve, reject) => {
  const deadline = Date.now() + 10000; // give up after 10 seconds
  const check = () => {
    const el = document.querySelector("#target");
    if (el) resolve("Found");
    else if (Date.now() > deadline) reject(new Error("Timed out waiting for #target"));
    else setTimeout(check, 100);
  };
  check();
});
'

Extract Structured Data

bash
agent-browser-cli exec '
return Array.from(document.querySelectorAll(".item")).map(item => ({
  title: item.querySelector(".title")?.textContent,
  link: item.querySelector("a")?.href,
  price: item.querySelector(".price")?.textContent
}));
'

File Upload

bash
agent-browser-cli exec '
const input = document.querySelector("input[type=file]");
const dt = new DataTransfer();
dt.items.add(new File(["content"], "test.txt"));
input.files = dt.files;
input.dispatchEvent(new Event("change", { bubbles: true }));
return "File set";
'

Dropdown Selection

bash
agent-browser-cli exec '
const select = document.querySelector("select#country");
select.value = "US";
select.dispatchEvent(new Event("change", { bubbles: true }));
return select.value;
'

Troubleshooting

Extension not connecting

  1. Verify the extension is loaded in chrome://extensions
  2. Check that the extension port matches the config: cat ~/.agent-browser-cli/config.json
  3. Ensure at least one normal web page is open (not chrome:// or about:)
  4. Restart the daemon: agent-browser-cli restart

Command timeout

  1. Check daemon status: agent-browser-cli status
  2. Verify Chrome is running and the extension is active
  3. Try a simpler command first: agent-browser-cli tabs
  4. Check logs: tail -f ~/.agent-browser-cli.log
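
The checks above can be chained so that a script reports the first one that fails. This is only a sketch: `diagnose` and the `CLI` variable are hypothetical names, and it assumes each command exits with a non-zero status on failure, which is not confirmed here.

```shell
# Command to invoke; override CLI to point at a different binary.
CLI=${CLI:-agent-browser-cli}

# diagnose: run the status and tabs checks in order and report the
# first one that fails. Assumes non-zero exit status means failure.
diagnose() {
  if ! "$CLI" status >/dev/null 2>&1; then
    echo "daemon is not responding; try: $CLI restart"
    return 1
  fi
  if ! "$CLI" tabs >/dev/null 2>&1; then
    echo "daemon is up but tabs failed; check Chrome and the extension"
    return 2
  fi
  echo "status and tabs both succeeded"
}
```

If both checks pass and commands still time out, the log file is the next place to look.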

Port conflicts

bash
# Change extension port
agent-browser-cli set-extension-port 18766

# Update the port in the extension popup to match the new port (manual step in Chrome)

# Restart daemon
agent-browser-cli restart

WSL 2 connectivity issues

For WSL 2 users on Windows 11 22H2+:
  1. Enable networkingMode=mirrored under the [wsl2] section of .wslconfig
  2. Restart WSL (for example, run wsl --shutdown from Windows, then reopen the distribution)
  3. Ensure the Chrome extension can connect to localhost:18765

Slow performance

  • Use the --text-only flag for faster content extraction
  • Avoid --screenshot unless needed
  • Use --monitor only when detecting page changes is necessary
  • Consider increasing system resources if Chrome is sluggish

Advanced: CDP Operations

The underlying Chrome extension uses the Chrome DevTools Protocol (CDP). Most operations are abstracted by the CLI, but raw CDP commands can be sent via JavaScript. The example below assumes the script runs where the extension's chrome.* APIs are available and that the debugger is already attached to the tab:
bash
agent-browser-cli exec '
// chrome.tabs.query resolves to an array of tabs; take the active one
const [tab] = await chrome.tabs.query({ active: true, lastFocusedWindow: true });
return await chrome.debugger.sendCommand({ tabId: tab.id }, "Network.getCookies", {});
'

Environment Variables

The CLI uses a config file instead of environment variables.
Config location: ~/.agent-browser-cli/config.json
json
{
  "extension_port": 18765
}
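
A script that needs the configured port, for instance to confirm it matches the value entered in the extension popup, can read it from this file. The sketch below uses `sed` and assumes the key appears as `"extension_port": <number>` on a single line, as in the sample above; `read_extension_port` is a hypothetical helper name.

```shell
# read_extension_port [FILE]: print the extension_port value from a
# config file. Defaults to ~/.agent-browser-cli/config.json.
# Hypothetical helper, not part of agent-browser-cli.
read_extension_port() {
  config_file=${1:-"$HOME/.agent-browser-cli/config.json"}
  sed -n 's/.*"extension_port"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config_file"
}
```

Calling `read_extension_port` with no argument reads the default config path.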

Logs

Daemon logs are written to ~/.agent-browser-cli.log.
Monitor in real time:
bash
tail -f ~/.agent-browser-cli.log

Integration with AI Agents

When called by AI coding agents, a typical workflow is:
  1. Check tab state: agent-browser-cli tabs
  2. Navigate or switch: agent-browser-cli open <url> or agent-browser-cli switch <index>
  3. Extract content: agent-browser-cli scan --text-only
  4. Perform action: agent-browser-cli exec --monitor '<js code>'
  5. Verify result: agent-browser-cli scan or agent-browser-cli screenshot
The tool is designed for high-frequency calls with minimal overhead (~40-120ms for simple operations).
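
The five steps can be wrapped in a single shell function so an agent runs one inspect-act-verify cycle per call. This is a sketch under assumptions: `agent_step` and the `CLI` variable are hypothetical names, the navigate step always uses `open`, and the chain relies on each command exiting non-zero on failure, which is not confirmed here.

```shell
# Command to invoke; override CLI to point at a build from source.
# The CLI variable and agent_step function are hypothetical names.
CLI=${CLI:-agent-browser-cli}

# agent_step URL JS: check tabs, open URL, read the page, run JS with
# monitoring, then re-scan. Stops at the first command that fails.
agent_step() {
  url=$1
  js=$2
  "$CLI" tabs &&
    "$CLI" open "$url" &&
    "$CLI" scan --text-only &&
    "$CLI" exec --monitor "$js" &&
    "$CLI" scan
}
```

For example, `agent_step "https://example.com" 'return document.title'` would run the whole cycle against one page.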