qiaomu-opencli-browser
OpenCLI Browser: Browser Automation for AI Agents
Control Chrome step-by-step via CLI. Reuses existing login sessions — no passwords needed.
Prerequisites
```bash
opencli doctor    # Verify extension + daemon connectivity
```

Requires: Chrome running + OpenCLI Browser Bridge extension installed.
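A small guard can make the prerequisite check explicit before any browser command runs. The sketch below assumes `opencli doctor` exits with a non-zero status when the extension or daemon is unreachable (the source does not state its exit codes); `preflight` is a hypothetical helper name, not part of the CLI.

```bash
# Hypothetical helper: run the connectivity check once before a session.
# Assumption: `opencli doctor` returns non-zero when the check fails.
preflight() {
  if opencli doctor; then
    return 0
  fi
  echo "opencli doctor failed: check that Chrome is running and the OpenCLI Browser Bridge extension is installed" >&2
  return 1
}
```

Typical use would be `preflight && opencli browser open https://example.com && opencli browser state`, so nothing is opened when the check fails.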
Critical Rules
- ALWAYS use `state` to inspect the page, NEVER use `screenshot`: `state` returns structured DOM with `[N]` element indices, is instant and costs zero tokens. `screenshot` requires vision processing and is slow. Only use `screenshot` when the user explicitly asks to save a visual.
- ALWAYS use `click`/`type`/`select` for interaction, NEVER use `eval` to click or type: `eval "el.click()"` bypasses scrollIntoView and the CDP click pipeline, causing failures on off-screen elements. Use `state` to find the `[N]` index, then `click <N>`.
- Verify inputs with `get value`, not screenshots: after `type`, run `get value <index>` to confirm.
- Run `state` after every page change: after `open`, `click` (on links), or `scroll`, always run `state` to see the new elements and their indices. Never guess indices.
- Chain commands aggressively with `&&`: combine `open + state`, multiple `type` calls, and `type + get value` into single `&&` chains. Each tool call has overhead; chaining cuts it.
- `eval` is read-only: use `eval` ONLY for data extraction (`JSON.stringify(...)`), never for clicking, typing, or navigating. Always wrap in an IIFE to avoid variable conflicts: `eval "(function(){ const x = ...; return JSON.stringify(x); })()"`.
- Minimize total tool calls: plan your sequence before acting. A good task completion uses 3-5 tool calls, not 15-20. Combine `open + state` as one call. Combine `type + type + click` as one call. Only run `state` separately when you need to discover new indices.
- Prefer `network` to discover APIs: most sites have JSON APIs. API-based adapters are more reliable than DOM scraping.
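The "verify inputs with `get value`" rule can be folded into one reusable step. This is a minimal sketch, assuming `get value` prints the raw field value on stdout with no surrounding quotes; `type_and_verify` is a hypothetical helper name.

```bash
# Hypothetical helper: type into element [N], then confirm the field holds the text.
# Assumption: `get value` prints the raw field value on stdout.
type_and_verify() {
  index="$1"
  text="$2"
  opencli browser type "$index" "$text" || return 1
  actual="$(opencli browser get value "$index")" || return 1
  if [ "$actual" != "$text" ]; then
    echo "index $index holds '$actual', expected '$text'" >&2
    return 1
  fi
}
```

Because the helper returns non-zero on a mismatch, it can sit inside a longer `&&` chain and stop the chain before a submit button is clicked.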
Command Cost Guide
| Cost | Commands | When to use |
|---|---|---|
| Free & instant | `state`, `get *`, `eval` | Default: use these |
| Free but changes page | `open`, `back`, `click`, `type`, `select` | Interaction: run `state` afterwards |
| Expensive (vision tokens) | `screenshot` | ONLY when user needs a saved image |
Action Chaining Rules
Commands can be chained with `&&`. The browser persists via daemon, so chaining is safe.

Always chain when possible. Fewer tool calls = faster completion:

```bash
# GOOD: open + inspect in one call (saves 1 round trip)
opencli browser open https://example.com && opencli browser state

# GOOD: fill form in one call (saves 2 round trips)
opencli browser type 3 "hello" && opencli browser type 4 "world" && opencli browser click 7

# GOOD: type + verify in one call
opencli browser type 5 "test@example.com" && opencli browser get value 5

# GOOD: click + wait + state in one call (for page-changing clicks)
opencli browser click 12 && opencli browser wait time 1 && opencli browser state

# BAD: separate calls for each action (wasteful)
opencli browser type 3 "hello"    # Don't do this
opencli browser type 4 "world"    # when you can chain
opencli browser click 7           # all three together
```

**Page-changing commands go last among index-based steps** in a chain, because any later command that takes an index would see stale indices. Only `wait` and `state` should follow them, as in the examples above:
- `open <url>`, `back`, `click <link/button that navigates>`

**Rule**: Chain when you already know the indices. Run `state` separately when you need to discover indices first.
Core Workflow
- Navigate: `opencli browser open <url>`
- Inspect: `opencli browser state` → elements with `[N]` indices
- Interact: use indices with `click`, `type`, `select`, `keys`
- Wait (if needed): `opencli browser wait selector ".loaded"` or `wait text "Success"`
- Verify: `opencli browser state` or `opencli browser get value <N>`
- Repeat: browser stays open between commands
- Save: write a TS adapter to `~/.opencli/clis/<site>/<command>.ts`
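The navigate, inspect, interact, wait, and verify steps above fit into two tool calls: one to discover indices and one to act on them. The sketch below uses only commands listed in this document; `discover` and `fill_and_submit` are hypothetical helper names, and the indices passed to the second helper are placeholders that must come from the `state` output of the first call.

```bash
# Call 1: navigate and inspect together to learn the [N] indices.
discover() {
  opencli browser open "$1" && opencli browser state
}

# Call 2: act with indices read from the `state` output of call 1.
# Arguments: input index, text to type, button index (all placeholders).
fill_and_submit() {
  opencli browser type "$1" "$2" \
    && opencli browser get value "$1" \
    && opencli browser click "$3" \
    && opencli browser wait time 1 \
    && opencli browser state
}
```

For example, `discover https://example.com` followed by `fill_and_submit 3 "hello" 7` once the output shows that `[3]` is the input and `[7]` is the button. The page-changing `click` comes after every index-based step, and the final `state` refreshes the indices for whatever follows.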
Commands
Navigation

```bash
opencli browser open <url>                 # Open URL (page-changing)
opencli browser back                       # Go back (page-changing)
opencli browser scroll down                # Scroll (up/down, --amount N)
opencli browser scroll up --amount 1000
```
Inspect (free & instant)

```bash
opencli browser state                      # Structured DOM with [N] indices: PRIMARY tool
opencli browser screenshot [path.png]      # Save visual to file: ONLY for user deliverables
```
Get (free & instant)

```bash
opencli browser get title                  # Page title
opencli browser get url                    # Current URL
opencli browser get text <index>           # Element text content
opencli browser get value <index>          # Input/textarea value (use to verify after type)
opencli browser get html                   # Full page HTML
opencli browser get html --selector "h1"   # Scoped HTML
opencli browser get attributes <index>     # Element attributes
```
Interact

```bash
opencli browser click <index>              # Click element [N]
opencli browser type <index> "text"        # Type into element [N]
opencli browser select <index> "option"    # Select dropdown
opencli browser keys "Enter"               # Press key (Enter, Escape, Tab, Control+a)
```
Wait

Three variants. Use the right one for the situation:

```bash
opencli browser wait time 3                              # Wait N seconds (fixed delay)
opencli browser wait selector ".loaded"                  # Wait until element appears in DOM
opencli browser wait selector ".spinner" --timeout 5000  # With timeout (default 30s)
opencli browser wait text "Success"                      # Wait until text appears on page
```

When to wait: After `open` on SPAs, after `click` that triggers async loading, before `eval` on dynamically rendered content.
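The "wait before `eval`" advice can be expressed as a single chained step. This is a sketch under stated assumptions: `wait_then_extract` is a hypothetical helper name, the `--timeout 5000` value is copied from the listing above, and the helper assumes `wait selector` exits non-zero when the timeout expires, which the source does not confirm.

```bash
# Hypothetical helper: block until a selector exists, then run a read-only eval.
# $1 = CSS selector to wait for, $2 = JavaScript expression that returns data.
# Assumption: `wait selector` fails (non-zero exit) if the element never appears.
wait_then_extract() {
  opencli browser wait selector "$1" --timeout 5000 \
    && opencli browser eval "$2"
}
```

A call such as `wait_then_extract ".loaded" "document.title"` then costs one tool call instead of two, and the `eval` is skipped when the wait does not succeed.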
Extract (free & instant, read-only)

Use `eval` ONLY for reading data. Never use it to click, type, or navigate.

```bash
opencli browser eval "document.title"
opencli browser eval "JSON.stringify([...document.querySelectorAll('h2')].map(e => e.textContent))"

# IMPORTANT: wrap complex logic in IIFE to avoid "already declared" errors
opencli browser eval "(function(){ const items = [...document.querySelectorAll('.item')]; return JSON.stringify(items.map(e => e.textContent)); })()"
```

**Selector safety**: Always use fallback selectors, because `querySelector` returns `null` on miss:

```bash
# BAD: crashes if selector misses
opencli browser eval "document.querySelector('.title').textContent"

# GOOD: fallback with || or ?.
opencli browser eval "(document.querySelector('.title') || document.querySelector('h1') || {textContent:''}).textContent"
opencli browser eval "document.querySelector('.title')?.textContent ?? 'not found'"
```
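The IIFE and selector-safety advice can be combined in one wrapper. The sketch below relies on `querySelectorAll`, which returns an empty list rather than `null` when nothing matches, so a missed selector yields `[]` instead of an error. `extract_texts` is a hypothetical helper name, and the selector is pasted into the script unescaped, so it must not contain single quotes.

```bash
# Hypothetical helper: return the trimmed text of every match as a JSON array.
# $1 = CSS selector; it must not contain single quotes, because it is pasted
# into a single-quoted JavaScript string below.
extract_texts() {
  opencli browser eval "(function(){ const nodes = [...document.querySelectorAll('$1')]; return JSON.stringify(nodes.map(function(e){ return (e.textContent || '').trim(); })); })()"
}
```

For instance, `extract_texts '.titleline a'` runs a single read-only `eval`, and the IIFE keeps `nodes` out of the page's global scope between calls.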
Network (API Discovery)

```bash
opencli browser network                    # Show captured API requests (auto-captured since open)
opencli browser network --detail 3         # Show full response body of request #3
opencli browser network --all              # Include static resources
```
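Since requests are captured from the moment a page is opened, API discovery can be a two-call routine. This is a rough sketch using only the commands listed above; `capture_api_calls` and `show_response` are hypothetical helper names, and the 2-second pause is an arbitrary assumption about how long the page needs to fire its requests.

```bash
# Hypothetical helper: open a page, give it a moment to issue requests,
# then list what was captured.
# Assumption: 2 seconds is enough for the page's initial API calls.
capture_api_calls() {
  opencli browser open "$1" \
    && opencli browser wait time 2 \
    && opencli browser network
}

# Hypothetical helper: show the full response body of request number $1
# from the listing printed by capture_api_calls.
show_response() {
  opencli browser network --detail "$1"
}
```

A JSON endpoint found this way is a candidate for an API-based adapter instead of DOM scraping.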
Sedimentation (Save as CLI)

```bash
opencli browser init hn/top                # Generate adapter scaffold at ~/.opencli/clis/hn/top.ts
opencli browser verify hn/top              # Test the adapter (adds --limit 3 only if `limit` arg is defined)
```

- `init` auto-detects the domain from the active browser session (no need to specify it)
- `init` creates the file and populates `site`, `name`, `domain`, and `columns` from the current page
- `verify` runs the adapter end-to-end and prints output; if no `limit` arg exists in the adapter, it won't pass `--limit 3`
Session
```bash
opencli browser close                      # Close automation window
```
Example: Extract HN Stories

```bash
opencli browser open https://news.ycombinator.com
opencli browser state                      # See [1] a "Story 1", [2] a "Story 2"...
opencli browser eval "JSON.stringify([...document.querySelectorAll('.titleline a')].slice(0,5).map(a => ({title: a.textContent, url: a.href})))"
opencli browser close
```
Example: Fill a Form

```bash
opencli browser open https://httpbin.org/forms/post
opencli browser state                      # See [3] input "Customer Name", [4] input "Telephone"
opencli browser type 3 "OpenCLI" && opencli browser type 4 "555-0100"
opencli browser get value 3                # Verify: "OpenCLI"
opencli browser close
```
Saving as Reusable CLI: Complete Workflow

Step-by-step sedimentation flow:

```bash
# 1. Explore the website
opencli browser open https://news.ycombinator.com
opencli browser state                      # Understand DOM structure

# 2. Discover APIs (crucial for high-quality adapters)
opencli browser eval "fetch('/api/...').then(r=>r.json())"   # Trigger API calls
opencli browser network                    # See captured API requests
opencli browser network --detail 0         # Inspect response body

# 3. Generate scaffold
opencli browser init hn/top                # Creates ~/.opencli/clis/hn/top.ts

# 4. Edit the adapter (fill in func logic)
#    - If API found: use fetch() directly (Strategy.PUBLIC or COOKIE)
#    - If no API: use page.evaluate() for DOM extraction (Strategy.UI)

# 5. Verify
opencli browser verify hn/top              # Runs the adapter and shows output

# 6. If verify fails, edit and retry

# 7. Close when done
opencli browser close
```
Example adapter:

```typescript
// ~/.opencli/clis/hn/top.ts
import { cli, Strategy } from '@jackwener/opencli/registry';

cli({
  site: 'hn',
  name: 'top',
  description: 'Top Hacker News stories',
  domain: 'news.ycombinator.com',
  strategy: Strategy.PUBLIC,
  browser: false,
  args: [{ name: 'limit', type: 'int', default: 5 }],
  columns: ['rank', 'title', 'score', 'url'],
  func: async (_page, kwargs) => {
    // Clamp the requested limit to the range 1..50
    const limit = Math.min(Math.max(1, kwargs.limit ?? 5), 50);
    const resp = await fetch('https://hacker-news.firebaseio.com/v0/topstories.json');
    const ids = await resp.json();
    // Fetch the selected stories in parallel and map each one to the declared columns
    return Promise.all(
      ids.slice(0, limit).map(async (id: number, i: number) => {
        const item = await (await fetch(`https://hacker-news.firebaseio.com/v0/item/${id}.json`)).json();
        return { rank: i + 1, title: item.title, score: item.score, url: item.url ?? '' };
      })
    );
  },
});
```

Save to `~/.opencli/clis/<site>/<command>.ts` → immediately available as `opencli <site> <command>`.
Strategy Guide

| Strategy | When | browser: |
|---|---|---|
| `Strategy.PUBLIC` | Public API, no auth | `false` |
| `Strategy.COOKIE` | Needs login cookies | `true` |
| `Strategy.UI` | Direct DOM interaction | `true` |

Always prefer API over UI: if you discovered an API during browsing, use `fetch()` directly.
Tips
- Always `state` first: never guess element indices, always inspect first
- Sessions persist: browser stays open between commands, no need to re-open
- Use `eval` for data extraction: `eval "JSON.stringify(...)"` is faster than multiple `get` calls
- Use `network` to find APIs: JSON APIs are more reliable than DOM scraping
- Alias: `opencli op` is shorthand for `opencli browser`
Common Pitfalls
- `form.submit()` fails in automation: Don't use `form.submit()` or `eval` to submit forms. Navigate directly to the search URL instead:

  ```bash
  # BAD: form.submit() often silently fails
  opencli browser eval "document.querySelector('form').submit()"
  # GOOD: construct the URL and navigate
  opencli browser open "https://github.com/search?q=opencli&type=repositories"
  ```

- GitHub DOM changes frequently: Prefer `data-testid` attributes when available; they are more stable than class names or tag structure.
- SPA pages need `wait` before extraction: After `open` or `click` on single-page apps, the DOM isn't ready immediately. Always `wait selector` or `wait text` before `eval`.
- Use `state` before clicking: Run `opencli browser state` to inspect available interactive elements and their indices. Never guess indices from memory.
- `evaluate` runs in browser context: `page.evaluate()` in adapters executes inside the browser. Node.js APIs (`fs`, `path`, `process`) are NOT available. Use `fetch()` for network calls, DOM APIs for page data.
- Backticks in `page.evaluate` break JSON storage: When writing adapters that will be stored/transported as JSON, avoid template literals inside `page.evaluate`. Use string concatenation or function-style evaluate:

  ```typescript
  // BAD: template literal backticks break when adapter is in JSON
  page.evaluate(`document.querySelector("${selector}")`)
  // GOOD: function-style evaluate
  page.evaluate((sel) => document.querySelector(sel), selector)
  ```
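Navigating to a constructed search URL, as the first pitfall recommends, needs the query to be percent-encoded when it contains spaces or `&`. The following is a minimal sketch in plain shell; `urlencode` and `open_search` are hypothetical helper names, the encoder handles ASCII input only, and the GitHub search URL shape is taken from the example in the pitfall.

```bash
# Percent-encode a string for use in a query parameter.
# Assumption: ASCII input only; multi-byte characters are not handled.
urlencode() {
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character itself
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                 # unreserved: copy as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;         # everything else: %XX
    esac
    s="$rest"
  done
  printf '%s' "$out"
}

# Hypothetical helper: open a GitHub repository search without touching the form.
open_search() {
  opencli browser open "https://github.com/search?q=$(urlencode "$1")&type=repositories"
}
```

With this, `open_search "open cli"` navigates to a URL whose query is `q=open%20cli`, and no `form.submit()` call is involved.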
Troubleshooting
| Error | Fix |
|---|---|
| "Browser not connected" | Run `opencli doctor` |
| "attach failed: chrome-extension://" | Disable 1Password temporarily |
| Element not found | Run `opencli browser state` again and use an index from the fresh output |
| Stale indices after page change | Run `opencli browser state` again |