
VS Code Automation


Automate VS Code (Code OSS) using agent-browser. VS Code is built on Electron/Chromium and, when launched with `--remote-debugging-port`, exposes a Chrome DevTools Protocol (CDP) port that agent-browser can connect to, enabling the same snapshot-interact workflow used for web pages.

Prerequisites

- **`agent-browser` must be installed.** It's listed in devDependencies; run `npm install` in the repo root. Use `npx agent-browser` if it's not on your PATH, or install globally with `npm install -g agent-browser`.
- **For Code OSS (VS Code dev build):** The repo must be built before launching. `./scripts/code.sh` runs the build automatically if needed, or set `VSCODE_SKIP_PRELAUNCH=1` to skip the compile step if you've already built.
- **CSS selectors are internal implementation details.** Selectors like `.interactive-input-part`, `.interactive-input-editor`, and `.part.auxiliarybar` used in `eval` commands are VS Code internals that may change across versions. If they stop working, use `agent-browser snapshot -i` to re-discover the current DOM structure.

Core Workflow

1. Launch Code OSS with remote debugging enabled
2. Connect agent-browser to the CDP port
3. Snapshot to discover interactive elements
4. Interact using element refs
5. Re-snapshot after navigation or state changes

```bash
# Launch Code OSS with remote debugging
./scripts/code.sh --remote-debugging-port=9224

# Wait for Code OSS to start, retry until connected
for i in 1 2 3 4 5; do agent-browser connect 9224 2>/dev/null && break || sleep 3; done

# Discover UI elements
agent-browser snapshot -i

# Focus the chat input (macOS)
agent-browser press Control+Meta+i
```
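The one-line retry loop in the workflow can be wrapped in a small function so that a failed connection is reported instead of passing silently. This is a minimal sketch: the function name `wait_for_cdp` and its attempt and delay parameters are illustrative and not part of agent-browser; only `agent-browser connect <port>` comes from the workflow itself.

```bash
# wait_for_cdp PORT [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries `agent-browser connect PORT` until it
# succeeds or the attempts run out. Defaults mirror the loop above
# (5 attempts, 3 seconds apart).
wait_for_cdp() {
  local port="$1"
  local attempts="${2:-5}"
  local delay="${3:-3}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if agent-browser connect "$port" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Could not connect to CDP port $port after $attempts attempts" >&2
  return 1
}
```

A call such as `wait_for_cdp 9224 && agent-browser snapshot -i` then takes the snapshot only once the connection has actually been established.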

Connecting

```bash
# Connect to a specific port
agent-browser connect 9222

# Or use --cdp on each command
agent-browser --cdp 9222 snapshot -i

# Auto-discover a running Chromium-based app
agent-browser --auto-connect snapshot -i
```

After `connect`, all subsequent commands target the connected app without needing `--cdp`.

Tab Management

Electron apps often have multiple windows or webviews. Use tab commands to list and switch between them:

```bash
# List all available targets (windows, webviews, etc.)
agent-browser tab

# Switch to a specific tab by index
agent-browser tab 2

# Switch by URL pattern
agent-browser tab --url "settings"
```

Launching Code OSS (VS Code Dev Build)

The VS Code repository includes `scripts/code.sh`, which launches Code OSS from source. It passes all arguments through to the Electron binary, so `--remote-debugging-port` works directly:

```bash
cd <repo-root>  # the root of your VS Code checkout
./scripts/code.sh --remote-debugging-port=9224
```

Wait for the window to fully initialize, then connect:

```bash
# Wait for Code OSS to start, retry until connected
for i in 1 2 3 4 5; do agent-browser connect 9224 2>/dev/null && break || sleep 3; done
agent-browser snapshot -i
```

**Tips:**
- Set `VSCODE_SKIP_PRELAUNCH=1` to skip the compile step if you've already built: `VSCODE_SKIP_PRELAUNCH=1 ./scripts/code.sh --remote-debugging-port=9224` (from the repo root)
- Code OSS uses the default user data directory. Unlike VS Code Insiders, you don't typically need `--user-data-dir` since there's usually only one Code OSS instance running.
- If you see "Sent env to running instance. Terminating..." it means Code OSS is already running and forwarded your args to the existing instance. Quit Code OSS and relaunch with the flag, or use `--user-data-dir=/tmp/code-oss-debug` to force a new instance.

Launching VS Code Extensions for Debugging

To debug a VS Code extension via agent-browser, launch VS Code Insiders with `--extensionDevelopmentPath` and `--remote-debugging-port`. Use `--user-data-dir` to avoid conflicting with an already-running instance.

```bash
# Build the extension first
cd <extension-repo-root>  # e.g., the root of your extension checkout
npm run compile

# Launch VS Code Insiders with the extension and CDP
code-insiders \
  --extensionDevelopmentPath="<extension-repo-root>" \
  --remote-debugging-port=9223 \
  --user-data-dir=/tmp/vscode-ext-debug

# Wait for VS Code to start, retry until connected
for i in 1 2 3 4 5; do agent-browser connect 9223 2>/dev/null && break || sleep 3; done
agent-browser snapshot -i
```

**Key flags:**
- `--extensionDevelopmentPath=<path>`: loads your extension from source (must be compiled first)
- `--remote-debugging-port=9223`: enables CDP (use 9223 to avoid conflicts with other apps on 9222)
- `--user-data-dir=<path>`: uses a separate profile so it starts a new process instead of sending to an existing VS Code instance

**Without `--user-data-dir`**, VS Code detects the running instance, forwards the args to it, and exits immediately. You'll see "Sent env to running instance. Terminating..." and CDP never starts.

Interacting with Monaco Editor (Chat Input, Code Editors)

VS Code uses Monaco Editor for all text inputs, including the Copilot Chat input. Monaco editors require specific agent-browser techniques: standard `click`, `fill`, and `keyboard type` commands may not work depending on the VS Code build.

The Universal Pattern: Focus via Keyboard Shortcut + `press`

This works on all VS Code builds (Code OSS, Insiders, stable):

```bash
# 1. Open and focus the chat input with the keyboard shortcut
# macOS:
agent-browser press Control+Meta+i
# Linux / Windows:
agent-browser press Control+Alt+i

# 2. Type using individual press commands
agent-browser press H
agent-browser press e
agent-browser press l
agent-browser press l
agent-browser press o
agent-browser press Space  # Use "Space" for spaces
agent-browser press w
agent-browser press o
agent-browser press r
agent-browser press l
agent-browser press d

# Verify text appeared (optional)
agent-browser eval '
(() => {
  const sidebar = document.querySelector(".part.auxiliarybar");
  const viewLines = sidebar.querySelectorAll(".interactive-input-editor .view-line");
  return Array.from(viewLines).map(vl => vl.textContent).join("|");
})()'

# 3. Send the message (same on all platforms)
agent-browser press Enter
```

**Chat focus shortcut by platform:**
- **macOS:** `Ctrl+Cmd+I` → `agent-browser press Control+Meta+i`
- **Linux:** `Ctrl+Alt+I` → `agent-browser press Control+Alt+i`
- **Windows:** `Ctrl+Alt+I` → `agent-browser press Control+Alt+i`

This shortcut focuses the chat input and sets `document.activeElement` to a `DIV` with class `native-edit-context`, VS Code's native text editing surface, which correctly processes key events from `agent-browser press`.
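Issuing one `press` per character by hand gets tedious for longer messages, so the pattern can be scripted. The sketch below is illustrative only: `chat_focus_chord` and `type_text` are hypothetical helper names, the text is assumed to be plain ASCII, and it assumes `press` accepts every printable character as a key name the way it accepts the letters above. Only the `Space` mapping and the two focus chords are taken from this section.

```bash
# chat_focus_chord OS_NAME
# Hypothetical helper: prints the chat-focus chord for a `uname -s` value.
# Anything that is not macOS (Darwin) gets the Linux/Windows chord.
chat_focus_chord() {
  case "$1" in
    Darwin) echo "Control+Meta+i" ;;
    *)      echo "Control+Alt+i" ;;
  esac
}

# type_text TEXT
# Hypothetical helper: sends TEXT one character at a time via
# `agent-browser press`, mapping a literal space to the key name "Space".
type_text() {
  local remaining="$1" next ch
  while [ -n "$remaining" ]; do
    next="${remaining#?}"          # everything after the first character
    ch="${remaining%"$next"}"      # the first character itself
    if [ "$ch" = " " ]; then
      ch="Space"
    fi
    agent-browser press "$ch" || return 1
    remaining="$next"
  done
}
```

Typical use would be `agent-browser press "$(chat_focus_chord "$(uname -s)")"`, then `type_text "Hello world"`, then `agent-browser press Enter`. Punctuation may need additional key-name mappings, so it is worth verifying the typed text before sending.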

`type @ref`: Works on Some Builds

On VS Code Insiders (extension debug mode), `type @ref` handles focus and input in one step:

```bash
agent-browser snapshot -i
# Look for: textbox "The editor is not accessible..." [ref=e62]
agent-browser type @e62 "Hello from George!"
```

However, **`type @ref` silently fails on Code OSS**: the command completes without error but no text appears. This also applies to `keyboard type` and `keyboard inserttext`. Always verify text appeared after typing, and fall back to the keyboard shortcut + `press` pattern if it didn't. The `press`-per-key approach works universally across all builds.

Compatibility Matrix

| Method | VS Code Insiders | Code OSS |
|---|---|---|
| `press` per key (after focus shortcut) | ✅ Works | ✅ Works |
| `type @ref` | ✅ Works | ❌ Silent fail |
| `keyboard type` (after focus) | ✅ Works | ❌ Silent fail |
| `keyboard inserttext` (after focus) | ✅ Works | ❌ Silent fail |
| `click @ref` | ❌ Blocked by overlay | ❌ Blocked by overlay |
| `fill @ref` | ❌ Element not visible | ❌ Element not visible |

Fallback: Focus via JavaScript Mouse Events

If the keyboard shortcut doesn't work (e.g., chat panel isn't configured), you can focus the editor via JavaScript:

```bash
agent-browser eval '
(() => {
  const inputPart = document.querySelector(".interactive-input-part");
  const editor = inputPart.querySelector(".monaco-editor");
  const rect = editor.getBoundingClientRect();
  const x = rect.x + rect.width / 2;
  const y = rect.y + rect.height / 2;
  editor.dispatchEvent(new MouseEvent("mousedown", { bubbles: true, clientX: x, clientY: y }));
  editor.dispatchEvent(new MouseEvent("mouseup", { bubbles: true, clientX: x, clientY: y }));
  editor.dispatchEvent(new MouseEvent("click", { bubbles: true, clientX: x, clientY: y }));
  return "activeElement: " + document.activeElement?.className;
})()'

# Then use press for each character
agent-browser press H
agent-browser press e
# ...
```
Verifying Text and Clearing

```bash
# Verify text in the chat input
agent-browser eval '
(() => {
  const sidebar = document.querySelector(".part.auxiliarybar");
  const viewLines = sidebar.querySelectorAll(".interactive-input-editor .view-line");
  return Array.from(viewLines).map(vl => vl.textContent).join("|");
})()'

# Clear the input (Select All + Backspace)
# macOS:
agent-browser press Meta+a
# Linux / Windows:
agent-browser press Control+a
# Then delete:
agent-browser press Backspace
```
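Because the select-all chord differs between macOS and the other platforms, a script that clears the input needs to choose the right one. A possible sketch follows; `select_all_chord` and `clear_input` are hypothetical helper names, and the platform check relies on `uname -s` reporting `Darwin` on macOS.

```bash
# select_all_chord OS_NAME
# Hypothetical helper: prints the select-all chord for a `uname -s` value.
select_all_chord() {
  case "$1" in
    Darwin) echo "Meta+a" ;;
    *)      echo "Control+a" ;;
  esac
}

# clear_input [OS_NAME]
# Hypothetical helper: selects all text in the focused input and deletes it.
# OS_NAME defaults to the current platform as reported by `uname -s`.
clear_input() {
  local os="${1:-$(uname -s)}"
  agent-browser press "$(select_all_chord "$os")" && agent-browser press Backspace
}
```

Calling `clear_input` with no argument targets the current platform. The input must already be focused, for example via the chat focus shortcut.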

Screenshot Tips for VS Code

On ultrawide monitors, the chat sidebar may be in the far-right corner of the CDP screenshot. Options:
- Use `agent-browser screenshot --full` to capture the entire window
- Use element screenshots: `agent-browser screenshot ".part.auxiliarybar" sidebar.png`
- Use `agent-browser screenshot --annotate` to see labeled element positions
- Maximize the sidebar first: click the "Maximize Secondary Side Bar" button

**macOS:** If `agent-browser screenshot` returns "Permission denied", your terminal needs Screen Recording permission. Grant it in System Settings → Privacy & Security → Screen Recording. As a fallback, use the `eval` verification snippet to confirm text was entered; this doesn't require screen permissions.

Troubleshooting

"Connection refused" or "Cannot connect"

- Make sure Code OSS was launched with `--remote-debugging-port=NNNN`
- If Code OSS was already running, quit and relaunch with the flag
- Check that the port isn't in use by another process:
  - macOS / Linux: `lsof -i :9224`
  - Windows: `netstat -ano | findstr 9224`
Elements not appearing in snapshot

- VS Code uses multiple webviews. Use `agent-browser tab` to list targets and switch to the right one
- Use `agent-browser snapshot -i -C` to include cursor-interactive elements (divs with onclick handlers)

Cannot type in Monaco Editor inputs

- Use `agent-browser press` for individual keystrokes after focusing the input. Focus the chat input with the keyboard shortcut (macOS: `Ctrl+Cmd+I`, Linux/Windows: `Ctrl+Alt+I`).
- `type @ref`, `keyboard type`, and `keyboard inserttext` work on VS Code Insiders but silently fail on Code OSS: they complete without error but no text appears. The `press`-per-key approach works universally.
- See the "Interacting with Monaco Editor" section above for the full compatibility matrix.

Cleanup / Disconnect

**⚠️ IMPORTANT: Always quit Code OSS when you're done.** Code OSS is a full Electron app that consumes significant memory (often 1–4 GB+). Leaving it running in the background will slow your machine considerably. Don't just disconnect agent-browser; kill the Code OSS process too.

```bash
# 1. Disconnect agent-browser
agent-browser close

# 2. QUIT Code OSS: do not leave it running!
# macOS: Cmd+Q in the app window, or:
# Find the process
lsof -i :9224 | grep LISTEN
# Kill it (replace <PID> with the actual PID)
kill <PID>

# Linux:
# kill $(lsof -t -i :9224)

# Windows:
# taskkill /F /PID <PID>
# Or use Task Manager to end "Code - OSS"
```
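On macOS and Linux, the disconnect and quit steps can be combined into one function. This is a sketch under stated assumptions: `quit_code_oss` is a hypothetical helper name, and the lookup uses `lsof -t -iTCP:<port> -sTCP:LISTEN` in place of the plain `lsof -i` form so that only the listening process is selected, not clients connected to the port. That filter is a substitution made here, not something this guide prescribes.

```bash
# quit_code_oss PORT
# Hypothetical helper: disconnects agent-browser, then sends SIGTERM to
# whatever process is listening on the debug port (macOS / Linux only).
quit_code_oss() {
  local port="$1" pids
  # Step 1: disconnect; ignore a failure if no session is open.
  agent-browser close || true
  # Step 2: find the listener on the debug port, if there is one.
  pids="$(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || true)"
  if [ -z "$pids" ]; then
    echo "Nothing is listening on port $port"
    return 0
  fi
  # Unquoted on purpose: lsof may print several PIDs, one per line.
  kill $pids
}
```

Running `quit_code_oss 9224` a second time should report that nothing is listening, which doubles as confirmation that the process is gone.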


If you launched with `./scripts/code.sh`, the process name is `Electron` or `Code - OSS`. Verify it's gone:

```bash
# Confirm no process is listening on the debug port
lsof -i :9224  # should return nothing
```