open-browser-use

Open Browser Use

Overview

Open Browser Use connects an MV3 Chrome extension, a local native messaging host, a CLI, SDKs, and an optional stdio MCP server so agents can automate a real Chrome profile. It is not Codex.app-specific; adapt the commands, MCP config, and SDK examples to the agent runtime you are operating in.

Core Workflow

  1. Check setup with `open-browser-use ping` or `obu ping`. If it fails because setup is missing, read references/installation.md.
  2. Choose a unique browser session id for the current agent task before opening or claiming tabs. Prefer the surrounding runtime's conversation/session id when available; otherwise create a short unique id such as `obu-<task-slug>-<timestamp>`. Reuse that same id for every Open Browser Use command in this task.
  3. Name the current browser task group before opening or claiming tabs. Use a short task label followed by ` - OBU`; if no better task label is available, use `Task - OBU`.
  4. Use the CLI for simple inspection or one-shot actions: `info`, `tabs`, `user-tabs`, `history`, `open-tab`, `navigate`, `cdp`, and `call`.
  5. If the surrounding agent runtime supports local MCP servers, configure `obu mcp` and call the exposed browser tools directly. Read references/sdk-and-protocol.md.
  6. Use the JavaScript, Python, or Go SDK for multi-step workflows, event subscriptions, or when the surrounding agent runtime already runs code. Read references/sdk-and-protocol.md.
  7. Before ending browser work, release or keep session tabs with `open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '<json-array>'`, the MCP `finalize_tabs` tool, or the SDK `finalizeTabs` / `finalize_tabs` / `FinalizeTabs` method.
  8. If communication fails after setup, read references/troubleshooting.md.
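
Steps 2 and 3 can be sketched as two small shell helpers. This is a minimal sketch, not part of the CLI: the function names `obu_session_id` and `obu_group_name` are hypothetical, and the slug rules (lowercase, dashes for anything that is not a letter or digit) are an assumption about what makes a reasonable `<task-slug>`.

```sh
# Hypothetical helper: build an id of the form obu-<task-slug>-<timestamp>.
obu_session_id() {
  # Lowercase the label and collapse runs of other characters into one dash.
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//')
  printf 'obu-%s-%s\n' "${slug:-task}" "$(date +%Y%m%d%H%M%S)"
}

# Hypothetical helper: build the task group name "<short task> - OBU",
# falling back to "Task - OBU" when no label is given.
obu_group_name() {
  printf '%s - OBU\n' "${1:-Task}"
}

# Compute both once per task and reuse them for every command.
OBU_SESSION_ID=$(obu_session_id "Docs scan")
export OBU_SESSION_ID
OBU_GROUP_NAME=$(obu_group_name "Docs scan")
echo "$OBU_SESSION_ID"
echo "$OBU_GROUP_NAME"
```

When the runtime already provides a conversation or session id, prefer deriving `OBU_SESSION_ID` from that id instead of a timestamp.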

Operating Rules

  • Treat the browser as the user's real Chrome profile. Do not inspect cookies, passwords, session stores, or unrelated browser data.
  • Ask the user before installing the extension, opening Chrome for them, enabling extension permissions, uploading local files, reading/writing clipboard data, submitting forms, purchasing, deleting, sending, or making other externally visible changes.
  • Do not assume Codex.app helpers, Node REPL globals, or a bundled plugin UI are available. Use the installed `open-browser-use` / `obu` CLI or the published SDKs.
  • Do not guess tab ids. List tabs first, then use ids returned by `tabs`, `user-tabs`, `open-tab`, or SDK calls.
  • Prefer `claim-tab` / `claimUserTab` for existing user tabs. Claiming should be based on the current `user-tabs` result and visible evidence such as URL, title, recency, or group.
  • Use `--socket` only when the user or runtime provides an explicit socket. Otherwise let the CLI and SDKs discover the active socket registry.
  • Do not rely on the CLI fallback session `obu-cli` for agent tasks. Always pass a task-unique `--session-id` to CLI and MCP commands, or set `sessionId` / `session_id` / `SessionID` in SDK clients. The fallback exists for quick manual use and can reuse stale task groups across unrelated agent sessions.
  • Direct CLI subcommands and `open-browser-use run` can share the same browser session only when they use the same explicit `--session-id`. Finalize that same session before ending browser work.
  • Use `call --method <method> --params '<json>'` only when no safer convenience command or SDK wrapper exists.
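
The rule against the `obu-cli` fallback can be enforced with a small guard before each CLI call. This is a hedged sketch: `obu_require_session` is a hypothetical helper, not something the CLI provides, and it only checks the `OBU_SESSION_ID` environment variable used in this document's examples.

```sh
# Hypothetical guard: succeed only when a task-unique session id is set.
obu_require_session() {
  case "${OBU_SESSION_ID:-}" in
    "")
      echo "OBU_SESSION_ID is not set" >&2
      return 1 ;;
    obu-cli)
      echo "refusing to use the fallback session obu-cli" >&2
      return 1 ;;
    *)
      return 0 ;;
  esac
}

# Usage (shown as a comment so the sketch has no side effects):
# obu_require_session && open-browser-use user-tabs --session-id "$OBU_SESSION_ID"
```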

Common CLI Actions

```sh
export OBU_SESSION_ID="obu-docs-scan-$(date +%Y%m%d%H%M%S)"
open-browser-use ping --session-id "$OBU_SESSION_ID"
open-browser-use info --session-id "$OBU_SESSION_ID"
open-browser-use name-session --session-id "$OBU_SESSION_ID" --name "Task - OBU"
open-browser-use tabs --session-id "$OBU_SESSION_ID"
open-browser-use user-tabs --session-id "$OBU_SESSION_ID"
open-browser-use history --session-id "$OBU_SESSION_ID" --query "example" --limit 20
open-browser-use open-tab --session-id "$OBU_SESSION_ID" --url https://example.com
open-browser-use navigate --session-id "$OBU_SESSION_ID" --tab-id <tab-id> --url https://example.com
open-browser-use cdp --session-id "$OBU_SESSION_ID" --tab-id <tab-id> --method Runtime.evaluate --params '{"expression":"document.title"}'
open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '[]'
```

For CLI-level orchestration without writing SDK code, use a line-oriented action plan:

```sh
open-browser-use run --session-id "$OBU_SESSION_ID" -c '
name-session "Docs scan - OBU"
open-tab https://docs.browser-use.com
wait-load domcontentloaded
page-info
finalize-tabs []
'
```

All action lines in a plan share one session/turn. `open-tab` and `claim-tab` set the default tab for later tab-scoped actions such as `wait-load`, `page-info`, `navigate`, `cdp`, `move-mouse`, and `wait-file-chooser`.

Use `obu` as the short alias when available.
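
An action plan like the one above can also be generated from variables. The following is a rough sketch under stated assumptions: `obu_plan` is a hypothetical helper, it only emits the action lines already shown in this section, and it assumes the task label contains no double quotes.

```sh
# Hypothetical helper: print a line-oriented plan for `open-browser-use run -c`.
# $1 = short task label, $2 = URL to open
obu_plan() {
  printf 'name-session "%s - OBU"\n' "$1"
  printf 'open-tab %s\n' "$2"
  printf 'wait-load domcontentloaded\n'
  printf 'page-info\n'
  printf 'finalize-tabs []\n'
}

plan=$(obu_plan "Docs scan" "https://docs.browser-use.com")
printf '%s\n' "$plan"

# Run the plan only when the CLI is installed and a session id is set.
if command -v open-browser-use >/dev/null 2>&1 && [ -n "${OBU_SESSION_ID:-}" ]; then
  open-browser-use run --session-id "$OBU_SESSION_ID" -c "$plan"
fi
```

Because the plan ends with `finalize-tabs []`, it keeps no tabs; change that last line if the task needs a deliverable or handoff tab.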

MCP Usage

For runtimes that can launch local MCP servers over stdio, use:

```toml
[mcp_servers.open_browser_use]
command = "obu"
args = ["mcp", "--session-id", "obu-<task-or-conversation-id>"]
```

Use a fresh `--session-id` value per agent task or conversation. If the runtime has a stable conversation/session id, derive the MCP `--session-id` from it.

The MCP server exposes tools including `user_tabs`, `open_tab`, `claim_tab`, `navigate`, `wait_load`, `page_info`, `cdp`, `history`, `run_action_plan`, `finalize_tabs`, and unrestricted `call`.
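
Deriving the MCP `--session-id` from a runtime id can be sketched as a small generator for the TOML fragment above. `obu_mcp_toml` is a hypothetical helper and `conv-1234` is a placeholder id; where the resulting fragment belongs depends on the agent runtime and is not specified here.

```sh
# Hypothetical helper: print the MCP server entry for one conversation.
# $1 = conversation or task id supplied by the runtime
obu_mcp_toml() {
  cat <<EOF
[mcp_servers.open_browser_use]
command = "obu"
args = ["mcp", "--session-id", "obu-$1"]
EOF
}

# Example with a placeholder conversation id.
obu_mcp_toml "conv-1234"
```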

Tab Lifecycle

  • Session tabs are tabs Open Browser Use has created or claimed for the current agent workflow.
  • Use one unique session id per agent task or conversation. Do not share the fallback `obu-cli` session across unrelated tasks.
  • Task session groups should be named from the task, using the pattern `<short task> - OBU`. Use `Task - OBU` as the fallback name.
  • Keep no tabs by default: `open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '[]'`.
  • Keep a tab only when the user needs that live page after the turn. Omit research, source, search, intermediate, duplicate, blank, error, and login/navigation tabs after extracting what you need.
  • Keep a tab with `status: "deliverable"` when the tab itself is the user-facing output or requested open page, such as a created or edited document, dashboard, checkout/cart, submitted form result, or a page the user explicitly asked to inspect directly.
  • Keep a tab with `status: "handoff"` only when the task is still in progress and the user or a later turn should continue from the current task group, such as a page waiting for user input, login, approval, payment, CAPTCHA, or an unfinished workflow.
  • Handoff tabs stay in the task session group. Deliverable tabs move to the shared `✅ Open Browser Use` tab group.
  • Run finalization as the last Open Browser Use browser action for the turn. Do not call Open Browser Use browser tools after finalizing; if more browser work is needed, do it first and finalize once with the final tab disposition.
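
The keep/omit rules above amount to a three-way decision per tab. The sketch below encodes that decision only; it does not build the `--keep` JSON, whose entry shape is documented elsewhere. The function name `obu_tab_disposition` and the tab kind labels are hypothetical, chosen to mirror the examples in the list.

```sh
# Hypothetical helper: map a tab kind to deliverable, handoff, or omit.
obu_tab_disposition() {
  case "$1" in
    # The tab itself is the user-facing output or a requested open page.
    document|dashboard|cart|form-result|requested-page)
      echo deliverable ;;
    # The task is unfinished and someone must continue from this page.
    awaiting-input|awaiting-login|approval|payment|captcha|unfinished)
      echo handoff ;;
    # Research, search, intermediate, duplicate, blank, error, and so on.
    *)
      echo omit ;;
  esac
}

obu_tab_disposition dashboard   # prints "deliverable"
obu_tab_disposition captcha     # prints "handoff"
obu_tab_disposition search      # prints "omit"
```

Defaulting to `omit` matches the rule that no tabs are kept unless the user needs the live page.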

File Choosers, Downloads, And Clipboard

  • File uploads use the intercepted file chooser flow: start waiting, trigger the chooser in the page, then set absolute local paths with `set-file-chooser-files` or the SDK equivalent.
  • Downloads can be observed with SDK notification handlers or Browser Use methods such as `waitForDownload` and `downloadPath`.
  • Clipboard helpers operate through the current controlled tab and should be treated as sensitive user actions.
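
Since the file chooser flow expects absolute local paths, a relative path should be resolved before it is passed along. This is a minimal sketch with a hypothetical helper, `obu_abs_path`; it prefixes the current directory and does not normalize `..` segments or check that the file exists.

```sh
# Hypothetical helper: turn a possibly relative path into an absolute one.
obu_abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;               # already absolute
    *)  printf '%s/%s\n' "$(pwd)" "$1" ;;   # resolve against the current directory
  esac
}

obu_abs_path /tmp/report.pdf   # prints "/tmp/report.pdf"
obu_abs_path report.pdf
```

Uploading local files still requires asking the user first, as stated in the operating rules.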

References

  • references/installation.md: one-time CLI and browser extension setup, including cases where user cooperation is required.
  • references/sdk-and-protocol.md: JavaScript, Python, Go, socket, and JSON-RPC usage details.
  • references/troubleshooting.md: connection failures, stale sockets, extension/native host checks, and permission issues.