chrome-automation

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Skill: Chrome Automation (agent-browser)

技能：Chrome自动化（agent-browser）

Automate browser tasks in the user's real Chrome session via the agent-browser CLI.

Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See
references/agent-browser-setup.md
if unsure.

通过agent-browser CLI在用户的真实Chrome会话中自动化浏览器任务。

前提条件：必须安装agent-browser，且Chrome需启用远程调试。若不确定操作方式，请查看
references/agent-browser-setup.md
。

Core Principle: Reuse the User's Existing Chrome

核心原则：复用用户已有的Chrome

This skill operates on a single Chrome process — the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.

本技能基于单个Chrome进程运行——即用户正在使用的真实浏览器。无需会话管理、无需独立配置文件、无需启动新的Playwright浏览器。

Always Start by Listing Tabs

始终先列出标签页

Before opening any new page, always list existing tabs first:

bash

agent-browser --auto-connect tab list

This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:

If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
bash
```
agent-browser --auto-connect tab <index>
```
If the target page is NOT open → open it in the current tab or a new tab.
bash
```
agent-browser --auto-connect open <url>
```

在打开新页面之前，务必先列出所有已打开的标签页：

bash

agent-browser --auto-connect tab list

该命令会返回所有打开的标签页，包含其索引编号、标题和URL。检查目标页面是否已打开：

若目标页面已打开 → 直接切换到该标签页，无需打开新页面。用户打开该页面很可能是因为已登录且页面处于合适状态。
bash
```
agent-browser --auto-connect tab <index>
```
若目标页面未打开 → 在当前标签页或新标签页中打开。
bash
```
agent-browser --auto-connect open <url>
```

Why This Matters

为何要这样做

The user's Chrome has their cookies, login sessions, and browser state
Opening a new page when one is already available wastes time and may lose login state
Many marketing platforms (social media dashboards, ad managers, CMS tools) require login — reusing an existing logged-in tab avoids re-authentication

用户的Chrome中保存了Cookie、登录会话和浏览器状态
页面已存在时仍打开新页面会浪费时间，还可能丢失登录状态
许多营销平台（社交媒体仪表盘、广告管理器、CMS工具）需要登录——复用已登录的标签页可避免重新认证

Connection

连接方式

Always use

--auto-connect

to connect to the user's running Chrome instance:

bash

agent-browser --auto-connect <command>

This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see

references/agent-browser-setup.md

请始终使用

--auto-connect

连接到用户正在运行的Chrome实例：

bash

agent-browser --auto-connect <command>

该参数会自动发现已启用远程调试的Chrome。若连接失败，请引导用户按照

references/agent-browser-setup.md

中的步骤启用远程调试。

Common Workflows

常见工作流

1. Navigate and Interact

1. 页面导航与交互

bash

undefined

bash

undefined

List tabs to find existing pages

列出标签页，查找已打开的页面

agent-browser --auto-connect tab list

Switch to an existing tab (if found)

切换到已存在的标签页（若找到）

agent-browser --auto-connect tab <index>

Or open a new page

或打开新页面

agent-browser --auto-connect open https://example.com agent-browser --auto-connect wait --load networkidle

Take a snapshot to see interactive elements

生成快照查看可交互元素

agent-browser --auto-connect snapshot -i

Click, fill, etc.

点击、填写等操作

agent-browser --auto-connect click @e3 agent-browser --auto-connect fill @e5 "some text"

undefined

agent-browser --auto-connect click @e3 agent-browser --auto-connect fill @e5 "some text"

undefined

2. Extract Data from a Page

2. 从页面提取数据

bash

undefined

bash

undefined

Get all text content

获取所有文本内容

agent-browser --auto-connect get text body

Take a screenshot for visual inspection

截图用于视觉检查

agent-browser --auto-connect screenshot

Execute JavaScript for structured data

执行JavaScript获取结构化数据

agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"

undefined

agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"

undefined

3. Replay a Chrome DevTools Recording

3. 重放Chrome DevTools录制内容

The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.

用户可能会提供从Chrome DevTools Recorder导出的录制文件（JSON、Puppeteer JS或@puppeteer/replay JS格式）。请查看下方的【重放录制内容】章节。

Step-by-Step Interaction Guide

分步交互指南

Taking Snapshots

生成快照

Use

snapshot -i

to see all interactive elements with refs (

@e1

@e2

, ...):

bash

agent-browser --auto-connect snapshot -i

The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.

使用

snapshot -i

查看所有带引用标识（

@e1

、

@e2

……）的可交互元素：

bash

agent-browser --auto-connect snapshot -i

输出内容会列出每个可交互元素的角色、文本和引用标识。后续操作可使用这些引用标识。

Step Type Mapping

操作类型映射

Action	Command
Navigate	`agent-browser --auto-connect open <url>` (optionally `wait --load networkidle` , but some sites like Reddit never reach networkidle — skip if `open` already shows the page title)
Click	`snapshot -i` → find ref → `click @eN`
Fill standard input	`click @eN` → `fill @eN "text"`
Fill rich text editor	`click @eN` → `keyboard inserttext "text"`
Press key	`press <key>` (Enter, Tab, Escape, etc.)
Scroll	`scroll down <amount>` or `scroll up <amount>`
Wait for element	`wait @eN` or `wait "<css-selector>"`
Screenshot	`screenshot` or `screenshot --annotate`
Get page text	`get text body`
Get current URL	`get url`
Run JavaScript	`eval <js>`

操作	命令
导航	`agent-browser --auto-connect open <url>` （可搭配 `wait --load networkidle` ，但部分网站如Reddit永远无法达到networkidle状态——若 `open` 命令已显示页面标题，可跳过该参数）
点击	`snapshot -i` → 找到引用标识 → `click @eN`
填写标准输入框	`click @eN` → `fill @eN "text"`
填写富文本编辑器	`click @eN` → `keyboard inserttext "text"`
按键	`press <key>` （Enter、Tab、Escape等）
滚动	`scroll down <amount>` 或 `scroll up <amount>`
等待元素加载	`wait @eN` 或 `wait "<css-selector>"`
截图	`screenshot` 或 `screenshot --annotate`
获取页面文本	`get text body`
获取当前URL	`get url`
运行JavaScript	`eval <js>`

How to Distinguish Input Types

如何区分输入框类型

Standard input/textarea → use
```
fill
```
Contenteditable div / rich text editor (LinkedIn message box, Gmail compose, Slack, CMS editors) → click/focus first, then use
```
keyboard inserttext
```

标准input/textarea → 使用
```
fill
```
命令
可编辑div/富文本编辑器（LinkedIn消息框、Gmail撰写框、Slack、CMS编辑器）→ 先点击获取焦点，再使用
```
keyboard inserttext
```

Ref Lifecycle

引用标识的生命周期

Refs (

@e1

@e2

, ...) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that trigger navigation
Submitting forms
Triggering dynamic content loads (AJAX, SPA navigation)

引用标识（

@e1

、

@e2

……）在页面发生变化时会失效。在以下操作后务必重新生成快照：

点击链接或按钮触发页面导航
提交表单
触发动态内容加载（AJAX、SPA导航）

Verification

验证操作

After each significant action, verify the result:

bash

agent-browser --auto-connect snapshot -i   # check interactive state
agent-browser --auto-connect screenshot     # visual verification

每次执行重要操作后，需验证结果：

bash

agent-browser --auto-connect snapshot -i   # 检查可交互状态
agent-browser --auto-connect screenshot     # 视觉验证

Replaying Recordings

重放录制内容

Accepted Formats

支持的格式

JSON (recommended) — structured, can be read progressively:

bash

# Count steps
jq '.steps | length' recording.json

# Read first 5 steps
jq '.steps[0:5]' recording.json

@puppeteer/replay JS (
```
import { createRunner }
```
)

Puppeteer JS (

require('puppeteer')

page.goto

Locator.race

)

JSON（推荐）——结构化格式，可逐步读取：

bash

# 统计步骤数量
jq '.steps | length' recording.json

# 读取前5步
jq '.steps[0:5]' recording.json

@puppeteer/replay JS（包含
```
import { createRunner }
```
）

Puppeteer JS（包含

require('puppeteer')

、

page.goto

、

Locator.race

）

How to Replay

重放步骤

Parse the recording — understand the full intent before acting. Summarize what the recording does.
List tabs first — check if the target page is already open.
Navigate — execute
```
navigate
```
steps, reusing existing tabs when possible.
For each interaction step:
- Take a snapshot (
```
snapshot -i
```
  ) to see current interactive elements
- Match the recording's
```
aria/...
```
  selectors against the snapshot
- Fall back to
```
text/...
```
  , then CSS class hints, then screenshot
- Do not rely on ember IDs, numeric IDs, or exact XPaths — these change every page load
Verify after each step — snapshot or screenshot to confirm

解析录制文件——先理解完整操作意图，再执行。总结录制内容的作用。
先列出标签页——检查目标页面是否已打开。
导航页面——执行导航步骤，尽可能复用已有的标签页。
针对每个交互步骤：
- 生成快照（
```
snapshot -i
```
  ）查看当前可交互元素
- 将录制文件中的
```
aria/...
```
  选择器与快照内容匹配
- 依次尝试
```
text/...
```
  、CSS类提示、截图匹配
- 不要依赖ember ID、数字ID或精确XPath——这些标识在每次页面加载时都会变化
每步操作后验证——通过快照或截图确认操作结果

Iframe-Heavy Sites

多Iframe网站

snapshot -i

operates on the main frame only and cannot penetrate iframes. Sites like LinkedIn, Gmail, and embedded editors render content inside iframes.

snapshot -i

仅在主框架中运行，无法穿透Iframe。LinkedIn、Gmail和嵌入式编辑器等网站会在Iframe中渲染内容。

Detecting Iframe Issues

检测Iframe问题

```
snapshot -i
```
returns unexpectedly short or empty results
Recording references elements not appearing in snapshot output
```
get text body
```
content doesn't match what a screenshot shows

```
snapshot -i
```
返回的结果异常简短或为空
录制文件中引用的元素未出现在快照输出中
```
get text body
```
获取的内容与截图显示的不一致

Workarounds

解决方法

Use
eval
to access iframe content:

bash

agent-browser --auto-connect eval --stdin <<'EVALEOF'
const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
const doc = frame.contentDocument;
const btn = doc.querySelector('button[aria-label="Send"]');
btn.click();
EVALEOF

Note: Only works for same-origin iframes.

Use
keyboard
for blind input: If the iframe element has focus,
```
keyboard inserttext "..."
```
sends text regardless of frame boundaries.
Use
get text body
to read full page content including iframes.
Use
screenshot
for visual verification when snapshot is unreliable.

使用
eval
访问Iframe内容：

bash

agent-browser --auto-connect eval --stdin <<'EVALEOF'
const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
const doc = frame.contentDocument;
const btn = doc.querySelector('button[aria-label="Send"]');
btn.click();
EVALEOF

注意：仅适用于同源Iframe。

使用
keyboard
进行盲输入：如果Iframe元素已获取焦点，
```
keyboard inserttext "..."
```
可无视框架边界输入文本。
**使用
```
get text body
```
**读取包含Iframe在内的完整页面内容。
**使用
```
screenshot
```
**在快照不可靠时进行视觉验证。

When to Ask the User

何时向用户求助

If workarounds fail after 2 attempts on the same step, pause and explain:

The page uses iframes that cannot be accessed via snapshot
Which element you need and what you expected
Ask the user to perform that step manually, then continue

若同一步骤尝试2次解决方法后仍失败，请暂停操作并说明：

该页面使用了无法通过快照访问的Iframe
需要操作的元素以及预期结果
请求用户手动执行该步骤，然后继续后续操作

Handling Unexpected Situations

处理意外情况

Handle Automatically (do not stop):

自动处理（无需停止）：

Popups or banners → dismiss them (

find text "Dismiss" click

find text "Close" click

)

Cookie consent dialogs → accept or dismiss
Tooltip overlays → close them first
Element not in snapshot → try
```
find text "..." click
```
, or scroll to reveal with
```
scroll down 300
```

弹窗或横幅→关闭它们（

find text "Dismiss" click

或

find text "Close" click

）

Cookie授权对话框→接受或关闭
提示层→先关闭
元素未出现在快照中→尝试
```
find text "..." click
```
，或使用
```
scroll down 300
```
滚动显示

Pause and Ask the User:

暂停并向用户求助：

Login / authentication is required
A CAPTCHA appears
Page structure is completely different from expected
A destructive action is about to happen (deleting data, sending real content) — confirm first
Stuck for more than 2 attempts on the same step
All iframe workarounds have failed

When pausing, explain clearly: what step you are on, what you expected, and what you see.

需要登录/认证
出现验证码（CAPTCHA）
页面结构与预期完全不同
即将执行破坏性操作（删除数据、发送真实内容）——需先确认
同一步骤尝试2次仍卡住
所有Iframe解决方法均无效

暂停时需清晰说明：当前执行的步骤、预期结果以及实际遇到的问题。

Key Commands Reference

关键命令参考

Command	Description
`tab list`	List all open tabs with index, title, and URL
`tab <index>`	Switch to an existing tab by index
`tab new`	Open a new empty tab
`tab close`	Close the current tab
`open <url>`	Navigate to URL
`snapshot -i`	List interactive elements with refs
`click @eN`	Click element by ref
`fill @eN "text"`	Clear and fill standard input/textarea
`type @eN "text"`	Type without clearing
`keyboard inserttext "text"`	Insert text (best for contenteditable)
`press <key>`	Press keyboard key
`scroll down/up <amount>`	Scroll page in pixels
`wait @eN`	Wait for element to appear
`wait --load networkidle`	Wait for network to settle
`wait <ms>`	Wait for a duration
`screenshot [path]`	Take screenshot
`screenshot --annotate`	Screenshot with numbered labels
`eval <js>`	Execute JavaScript in page
`get text body`	Get all text content
`get url`	Get current URL
`set viewport <w> <h>`	Set viewport size
`find text "..." click`	Semantic find and click
`close`	Close browser session

命令	描述
`tab list`	列出所有打开的标签页，包含索引、标题和URL
`tab <index>`	通过索引切换到已存在的标签页
`tab new`	打开新的空白标签页
`tab close`	关闭当前标签页
`open <url>`	导航到指定URL
`snapshot -i`	列出带引用标识的可交互元素
`click @eN`	通过引用标识点击元素
`fill @eN "text"`	清空并填写标准输入框/文本域
`type @eN "text"`	输入文本（不清空原有内容）
`keyboard inserttext "text"`	插入文本（适用于富文本编辑器）
`press <key>`	按下指定按键
`scroll down/up <amount>`	按指定像素值上下滚动页面
`wait @eN`	等待元素出现
`wait --load networkidle`	等待网络状态稳定
`wait <ms>`	等待指定时长（毫秒）
`screenshot [path]`	截图
`screenshot --annotate`	带编号标注的截图
`eval <js>`	在页面中执行JavaScript
`get text body`	获取页面所有文本内容
`get url`	获取当前页面URL
`set viewport <w> <h>`	设置视口大小
`find text "..." click`	语义化查找并点击
`close`	关闭浏览器会话

Known Limitations

已知限制

Iframe blindness:
```
snapshot -i
```
cannot see inside iframes. See Iframe-Heavy Sites.
find text
strict mode: Fails when multiple elements match. Use
```
snapshot -i
```
to locate the specific ref instead.
fill
vs contenteditable:
```
fill
```
only works on
```
<input>
```
and
```
<textarea>
```
. For rich text editors, use
```
keyboard inserttext
```
.
eval
is main-frame only: To interact with iframe content, traverse via
```
document.querySelector('iframe').contentDocument...
```

无法识别Iframe：
```
snapshot -i
```
无法查看Iframe内部内容。请查看【多Iframe网站】章节。
find text
严格模式：当多个元素匹配时会执行失败。请使用
```
snapshot -i
```
定位具体的引用标识。
fill
与可编辑元素：
```
fill
```
仅适用于
```
<input>
```
和
```
<textarea>
```
。对于富文本编辑器，请使用
```
keyboard inserttext
```
。
eval
仅在主框架运行：要与Iframe内容交互，需通过
```
document.querySelector('iframe').contentDocument...
```
进行遍历。

Multi-Platform Operations

多平台操作

When the user requests an action across multiple platforms (e.g., "publish this article to Dev.to, LinkedIn, and X"), do NOT attempt all platforms in a single conversation. Instead, launch sequential Agent subagents, one per platform.

当用户要求在多个平台执行操作时（例如："将这篇文章发布到Dev.to、LinkedIn和X"），请勿在单次对话中尝试所有平台。请启动顺序Agent子代理，每个平台对应一个子代理。

Why Subagents

为何使用子代理

Each platform operation consumes ~25-40K tokens (reference file + snapshots + interactions). Running 3-5 platforms in one context risks hitting the 200K token limit and degrading late-platform accuracy. Each subagent gets its own fresh 200K context window.

每个平台操作大约会消耗25-40K tokens（参考文件+快照+交互内容）。在单个上下文环境中运行3-5个平台操作可能会触及200K token限制，导致后期平台操作的准确性下降。每个子代理都有独立的200K上下文窗口。

How to Execute

执行方式

Prepare the content — confirm the post text, title, tags, and any platform-specific adaptations with the user.
For each platform, launch a
```
general-purpose
```
Agent subagent with a prompt that includes:
- The full content to publish
- Instructions to read the relevant reference file (e.g.,
```
Read /path/to/skills/chrome-automation/references/x.md
```
  )
- Instructions to read the agent-browser skill file for command reference
- The specific task (post, comment, reply, etc.)
- Any platform-specific instructions (e.g., "use these hashtags on LinkedIn")
Run subagents sequentially (one at a time), because they all share the same Chrome browser via
```
--auto-connect
```
. Parallel subagents would cause tab conflicts.
After each subagent completes, report the result to the user before launching the next one.

准备内容——与用户确认要发布的文本、标题、标签以及各平台的适配内容。
针对每个平台，启动一个
```
general-purpose
```
Agent子代理，提示内容包含：
- 完整的待发布内容
- 读取相关参考文件的指令（例如：
```
Read /path/to/skills/chrome-automation/references/x.md
```
  ）
- 读取agent-browser技能文件作为命令参考的指令
- 具体任务（发布、评论、回复等）
- 平台特定指令（例如："在LinkedIn使用这些话题标签"）
顺序运行子代理（一次运行一个），因为它们都通过
```
--auto-connect
```
共享同一个Chrome浏览器。并行运行子代理会导致标签页冲突。
每个子代理完成后，向用户报告结果，再启动下一个子代理。

Prompt Template for Subagents

子代理提示模板

You are automating a browser task on [PLATFORM].

First, read these files for context:
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md (agent-browser command reference)

Then connect to the user's Chrome browser using `agent-browser --auto-connect` and perform the following task:

[TASK DESCRIPTION]

Content to publish:
[CONTENT]

Important:
- Always list tabs first (`tab list`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain

你正在[PLATFORM]平台上自动化浏览器任务。

首先，阅读以下文件获取上下文：
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md（agent-browser命令参考）

然后使用`agent-browser --auto-connect`连接到用户的Chrome浏览器，执行以下任务：

[TASK DESCRIPTION]

待发布内容：
[CONTENT]

重要提示：
- 始终先列出标签页（`tab list`）并复用已登录的标签页
- 每次导航或操作后重新生成快照
- 在提交/发布前（破坏性操作）需与用户确认
- 若需要登录或出现验证码，请停止操作并说明情况

When NOT to Use Subagents

何时不使用子代理

Single platform — just do it directly in the current conversation.
Read-only tasks (browsing, searching, extracting data) — context usage is lighter; a single conversation can handle 2-3 platforms.

单个平台——直接在当前对话中执行操作即可。
只读任务（浏览、搜索、提取数据）——上下文消耗较少；单次对话可处理2-3个平台。

Platform References

平台参考

When automating tasks on specific platforms, consult the relevant reference document for page structure details, common operations, and known quirks:

Platform	Reference	Key Notes
Reddit	`references/reddit.md`	Custom `faceplate-*` components; `networkidle` never reached; unlabeled comment textbox; `find text` fails due to duplicate elements
X (Twitter)	`references/x.md`	`open` often times out (use `tab list` to reuse existing tabs); click timestamp for post detail (not username); DraftJS contenteditable input ( `data-testid="tweetTextarea_0"` ); avoid `networkidle`
LinkedIn	`references/linkedin.md`	Ember.js SPA; Enter submits comments (use Shift+Enter for newlines); comment box and compose box share the same label; avoid `networkidle` ; messaging overlay may block content
Dev.to	`references/devto.md`	Fast server-rendered HTML (Forem/Rails); standard `<textarea>` for comments/posts (Markdown); 5 reaction types; Algolia-powered search; `networkidle` works normally
Hacker News	`references/hackernews.md`	Minimal plain HTML; all form fields are unlabeled; `link "reply"` navigates to separate page; `networkidle` works instantly; rate limiting on posts/comments

For installation and Chrome setup instructions, see
references/agent-browser-setup.md
.

在特定平台自动化任务时，请查阅相关参考文档了解页面结构细节、常见操作和已知问题：

平台	参考文档	关键说明
Reddit	`references/reddit.md`	自定义 `faceplate-*` 组件；永远无法达到networkidle状态；评论输入框无标签； `find text` 因重复元素执行失败
X (Twitter)	`references/x.md`	`open` 命令经常超时（使用 `tab list` 复用已有的标签页）；点击时间戳进入帖子详情页（而非用户名）；DraftJS可编辑输入框（ `data-testid="tweetTextarea_0"` ）；避免使用 `networkidle`
LinkedIn	`references/linkedin.md`	Ember.js SPA；Enter键提交评论（换行使用Shift+Enter）；评论框和撰写框标签相同；避免使用 `networkidle` ；消息覆盖层可能遮挡内容
Dev.to	`references/devto.md`	快速服务端渲染HTML（Forem/Rails）；评论/发布使用标准 `<textarea>` （支持Markdown）；5种反应类型；Algolia驱动的搜索； `networkidle` 可正常使用
Hacker News	`references/hackernews.md`	极简纯HTML；所有表单字段无标签； `link "reply"` 跳转到独立页面； `networkidle` 立即生效；帖子/评论有频率限制

安装和Chrome设置说明，请查看
references/agent-browser-setup.md
。