chrome-automation


Skill: Chrome Automation (agent-browser)


Automate browser tasks in the user's real Chrome session via the agent-browser CLI.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.


Core Principle: Reuse the User's Existing Chrome


This skill operates on a single Chrome process — the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.

Always Start by Listing Tabs


Before opening any new page, always list existing tabs first:
bash
agent-browser --auto-connect tab list
This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:
  • If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
    bash
    agent-browser --auto-connect tab <index>
  • If the target page is NOT open → open it in the current tab or a new tab.
    bash
    agent-browser --auto-connect open <url>
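The list-then-decide check above can be sketched as a small shell helper. This is a minimal sketch rather than anything shipped with agent-browser: `find_open_tab`, `reuse_or_open`, and the `AB_BIN` variable are hypothetical names, and the sketch assumes that `tab list` prints one tab per line with the URL on that line.

```shell
# Hypothetical override so a stub can stand in for the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Print the `tab list` lines that contain the given URL fragment.
# Assumption: one tab per line, with the tab's URL on that line.
# Returns non-zero when no open tab matches.
find_open_tab() {
  "$AB_BIN" --auto-connect tab list | grep -F -- "$1"
}

# Reuse an already-open tab when there is one; otherwise open the URL.
reuse_or_open() {
  if matches=$(find_open_tab "$1"); then
    echo "Already open. Switch with: tab <index>"
    echo "$matches"
  else
    "$AB_BIN" --auto-connect open "$1"
  fi
}
```

Calling `reuse_or_open https://example.com` prints the matching tab lines so the index can be read off and passed to `tab <index>`; only when nothing matches does it open the URL.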

Why This Matters


  • The user's Chrome has their cookies, login sessions, and browser state
  • Opening a new page when one is already available wastes time and may lose login state
  • Many marketing platforms (social media dashboards, ad managers, CMS tools) require login — reusing an existing logged-in tab avoids re-authentication


Connection


Always use `--auto-connect` to connect to the user's running Chrome instance:
bash
agent-browser --auto-connect <command>
This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see `references/agent-browser-setup.md`).
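One way to detect a failed connection early is to probe with a harmless `tab list` call before doing any real work. The sketch below is illustrative only: `check_connection` and `AB_BIN` are hypothetical names, and it assumes the CLI exits with a non-zero status when it cannot reach Chrome, which this document does not confirm.

```shell
# Hypothetical override so a stub can stand in for the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Probe the connection with a harmless `tab list` call.
# Assumption: the CLI exits non-zero when it cannot reach Chrome.
check_connection() {
  if "$AB_BIN" --auto-connect tab list >/dev/null 2>&1; then
    echo "connected"
  else
    echo "not connected: see references/agent-browser-setup.md"
    return 1
  fi
}
```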


Common Workflows


1. Navigate and Interact


bash
# List tabs to find existing pages
agent-browser --auto-connect tab list

# Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>

# Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle

# Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i

# Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"

2. Extract Data from a Page


bash
# Get all text content
agent-browser --auto-connect get text body

# Take a screenshot for visual inspection
agent-browser --auto-connect screenshot

# Execute JavaScript for structured data
agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"

3. Replay a Chrome DevTools Recording


The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.


Step-by-Step Interaction Guide


Taking Snapshots


Use `snapshot -i` to see all interactive elements with refs (`@e1`, `@e2`, ...):
bash
agent-browser --auto-connect snapshot -i
The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.

Step Type Mapping


| Action | Command |
|---|---|
| Navigate | `agent-browser --auto-connect open <url>` (optionally `wait --load networkidle`, but some sites like Reddit never reach networkidle; skip it if `open` already shows the page title) |
| Click | `snapshot -i` → find ref → `click @eN` |
| Fill standard input | `click @eN`, then `fill @eN "text"` |
| Fill rich text editor | `click @eN`, then `keyboard inserttext "text"` |
| Press key | `press <key>` (Enter, Tab, Escape, etc.) |
| Scroll | `scroll down <amount>` or `scroll up <amount>` |
| Wait for element | `wait @eN` or `wait "<css-selector>"` |
| Screenshot | `screenshot` or `screenshot --annotate` |
| Get page text | `get text body` |
| Get current URL | `get url` |
| Run JavaScript | `eval <js>` |

How to Distinguish Input Types


  • Standard input/textarea → use `fill`
  • Contenteditable div / rich text editor (LinkedIn message box, Gmail compose, Slack, CMS editors) → click/focus first, then use `keyboard inserttext`

Ref Lifecycle


Refs (`@e1`, `@e2`, ...) are invalidated when the page changes. Always re-snapshot after:
  • Clicking links or buttons that trigger navigation
  • Submitting forms
  • Triggering dynamic content loads (AJAX, SPA navigation)
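Because refs go stale after any of these page changes, it can help to pair each click with a fresh snapshot. The following is a minimal sketch using only the `click` and `snapshot -i` commands documented here; the function name `click_and_resnapshot` and the `AB_BIN` variable are hypothetical.

```shell
# Hypothetical override so a stub can stand in for the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Click a ref, then immediately take a fresh snapshot so that
# follow-up commands use refs from the updated page.
click_and_resnapshot() {
  "$AB_BIN" --auto-connect click "$1" || return 1
  "$AB_BIN" --auto-connect snapshot -i
}
```

For example, `click_and_resnapshot @e3` clicks the element and prints the new ref list; refs from the earlier snapshot should not be reused after that point.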

Verification


After each significant action, verify the result:
bash
agent-browser --auto-connect snapshot -i   # check interactive state
agent-browser --auto-connect screenshot     # visual verification


Replaying Recordings


Accepted Formats


  1. JSON (recommended) — structured, can be read progressively:
    bash
    # Count steps
    jq '.steps | length' recording.json
    
    # Read first 5 steps
    jq '.steps[0:5]' recording.json
  2. @puppeteer/replay JS (`import { createRunner }`)
  3. Puppeteer JS (`require('puppeteer')`, `page.goto`, `Locator.race`)
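The markers listed above are enough to guess which format a recording file is in. The helper below is a rough sketch under that assumption: `detect_recording_format` and its output labels are hypothetical, and it only looks for the marker strings rather than parsing the file.

```shell
# Guess the recording format from the marker strings listed above.
# Prints one of: puppeteer-replay-js, puppeteer-js, json, unknown.
detect_recording_format() {
  if grep -q 'createRunner' "$1"; then
    echo "puppeteer-replay-js"
  elif grep -q "require('puppeteer')" "$1"; then
    echo "puppeteer-js"
  elif grep -q '"steps"' "$1"; then
    echo "json"
  else
    echo "unknown"
    return 1
  fi
}
```

A file reported as `json` can then be read progressively with the `jq` commands shown for format 1.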

How to Replay


  1. Parse the recording: understand the full intent before acting. Summarize what the recording does.
  2. List tabs first: check if the target page is already open.
  3. Navigate: execute `navigate` steps, reusing existing tabs when possible.
  4. For each interaction step:
    • Take a snapshot (`snapshot -i`) to see current interactive elements
    • Match the recording's `aria/...` selectors against the snapshot
    • Fall back to `text/...`, then CSS class hints, then screenshot
    • Do not rely on ember IDs, numeric IDs, or exact XPaths, because these change on every page load
  5. Verify after each step: snapshot or screenshot to confirm
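The selector priority in step 4 can be sketched as a filter over a step's candidate selectors. This is an illustrative helper, not part of agent-browser: `pick_selector` is a hypothetical name, it expects one selector per line on stdin, and it treats anything starting with `xpath/` or containing an `#ember<digits>` ID as unreliable.

```shell
# Read candidate selectors (one per line) on stdin and print the preferred one:
# aria/ first, then text/, then the first remaining selector that is
# neither an XPath nor an ember-generated ID.
pick_selector() {
  candidates=$(cat)
  for prefix in aria/ text/; do
    match=$(printf '%s\n' "$candidates" |
      awk -v p="$prefix" 'index($0, p) == 1 { print; exit }')
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  match=$(printf '%s\n' "$candidates" |
    awk '!/^xpath\// && !/#ember[0-9]/ && NF { print; exit }')
  [ -n "$match" ] && printf '%s\n' "$match"
}
```

Assuming the JSON export stores each step's candidates as an array of arrays under `selectors`, they could be fed in with something like `jq -r '.steps[3].selectors[][0]' recording.json | pick_selector`. The chosen selector is then matched by eye against the `snapshot -i` output to find the ref.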


Iframe-Heavy Sites


`snapshot -i` operates on the main frame only and cannot penetrate iframes. Sites like LinkedIn, Gmail, and embedded editors render content inside iframes.

Detecting Iframe Issues


  • `snapshot -i` returns unexpectedly short or empty results
  • Recording references elements not appearing in snapshot output
  • `get text body` content doesn't match what a screenshot shows

Workarounds


  1. Use `eval` to access iframe content:
    bash
    agent-browser --auto-connect eval --stdin <<'EVALEOF'
    const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
    const doc = frame.contentDocument;
    const btn = doc.querySelector('button[aria-label="Send"]');
    btn.click();
    EVALEOF
    Note: Only works for same-origin iframes.
  2. Use `keyboard` for blind input: If the iframe element has focus, `keyboard inserttext "..."` sends text regardless of frame boundaries.
  3. Use `get text body` to read full page content including iframes.
  4. Use `screenshot` for visual verification when snapshot is unreliable.
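The `eval` workaround in item 1 fails with a script error when the iframe is missing or cross-origin, since `contentDocument` is `null` for cross-origin frames. A guarded variant might look like the sketch below. The wrapper name `click_in_iframe` and the `AB_BIN` variable are hypothetical, the selectors are the ones from the example above, and it is an assumption that `eval` prints the value of the final expression.

```shell
# Hypothetical override so a stub can stand in for the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Click a button inside an iframe and report why it failed, if it did.
# Assumption: `eval` prints the value the script evaluates to.
click_in_iframe() {
  "$AB_BIN" --auto-connect eval --stdin <<'EVALEOF'
(() => {
  const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
  if (!frame) return 'no-iframe';
  const doc = frame.contentDocument; // null for cross-origin iframes
  if (!doc) return 'cross-origin';
  const btn = doc.querySelector('button[aria-label="Send"]');
  if (!btn) return 'no-button';
  btn.click();
  return 'clicked';
})()
EVALEOF
}
```

A result of `cross-origin` is the cue to move on to the keyboard or screenshot workarounds instead of retrying `eval`.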

When to Ask the User


If workarounds fail after 2 attempts on the same step, pause and explain:
  • The page uses iframes that cannot be accessed via snapshot
  • Which element you need and what you expected
  • Ask the user to perform that step manually, then continue


Handling Unexpected Situations


Handle Automatically (do not stop):


  • Popups or banners → dismiss them (`find text "Dismiss" click` or `find text "Close" click`)
  • Cookie consent dialogs → accept or dismiss
  • Tooltip overlays → close them first
  • Element not in snapshot → try `find text "..." click`, or scroll to reveal with `scroll down 300`

Pause and Ask the User:


  • Login / authentication is required
  • A CAPTCHA appears
  • Page structure is completely different from expected
  • A destructive action is about to happen (deleting data, sending real content) — confirm first
  • Stuck for more than 2 attempts on the same step
  • All iframe workarounds have failed
When pausing, explain clearly: what step you are on, what you expected, and what you see.
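The "two attempts, then pause" rule can be sketched as a small wrapper around any step. This is only an illustration of the retry limit: `attempt_step` is a hypothetical helper, and the PAUSE message is a placeholder for the fuller explanation described above.

```shell
# Run a step up to two times; if both attempts fail, print a pause
# notice on stderr and return non-zero so the caller can ask the user.
# Usage: attempt_step "<description>" <command> [args...]
attempt_step() {
  desc="$1"
  shift
  n=1
  while [ "$n" -le 2 ]; do
    "$@" && return 0
    n=$((n + 1))
  done
  echo "PAUSE: '$desc' failed after 2 attempts; explain and ask the user." >&2
  return 1
}
```

For example, `attempt_step "click Send" agent-browser --auto-connect click @e7` retries the click once before giving up.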


Key Commands Reference


| Command | Description |
|---|---|
| `tab list` | List all open tabs with index, title, and URL |
| `tab <index>` | Switch to an existing tab by index |
| `tab new` | Open a new empty tab |
| `tab close` | Close the current tab |
| `open <url>` | Navigate to URL |
| `snapshot -i` | List interactive elements with refs |
| `click @eN` | Click element by ref |
| `fill @eN "text"` | Clear and fill standard input/textarea |
| `type @eN "text"` | Type without clearing |
| `keyboard inserttext "text"` | Insert text (best for contenteditable) |
| `press <key>` | Press keyboard key |
| `scroll down/up <amount>` | Scroll page in pixels |
| `wait @eN` | Wait for element to appear |
| `wait --load networkidle` | Wait for network to settle |
| `wait <ms>` | Wait for a duration |
| `screenshot [path]` | Take screenshot |
| `screenshot --annotate` | Screenshot with numbered labels |
| `eval <js>` | Execute JavaScript in page |
| `get text body` | Get all text content |
| `get url` | Get current URL |
| `set viewport <w> <h>` | Set viewport size |
| `find text "..." click` | Semantic find and click |
| `close` | Close browser session |


Known Limitations


  1. Iframe blindness: `snapshot -i` cannot see inside iframes. See Iframe-Heavy Sites.
  2. `find text` strict mode: Fails when multiple elements match. Use `snapshot -i` to locate the specific ref instead.
  3. `fill` vs contenteditable: `fill` only works on `<input>` and `<textarea>`. For rich text editors, use `keyboard inserttext`.
  4. `eval` is main-frame only: To interact with iframe content, traverse via `document.querySelector('iframe').contentDocument...`


Multi-Platform Operations


When the user requests an action across multiple platforms (e.g., "publish this article to Dev.to, LinkedIn, and X"), do NOT attempt all platforms in a single conversation. Instead, launch sequential Agent subagents, one per platform.

Why Subagents


Each platform operation consumes ~25-40K tokens (reference file + snapshots + interactions). Running 3-5 platforms in one context risks hitting the 200K token limit and degrading late-platform accuracy. Each subagent gets its own fresh 200K context window.

How to Execute


  1. Prepare the content — confirm the post text, title, tags, and any platform-specific adaptations with the user.
  2. For each platform, launch a `general-purpose` Agent subagent with a prompt that includes:
    • The full content to publish
    • Instructions to read the relevant reference file (e.g., `Read /path/to/skills/chrome-automation/references/x.md`)
    • Instructions to read the agent-browser skill file for command reference
    • The specific task (post, comment, reply, etc.)
    • Any platform-specific instructions (e.g., "use these hashtags on LinkedIn")
  3. Run subagents sequentially (one at a time), because they all share the same Chrome browser via `--auto-connect`. Parallel subagents would cause tab conflicts.
  4. After each subagent completes, report the result to the user before launching the next one.

Prompt Template for Subagents


You are automating a browser task on [PLATFORM].

First, read these files for context:
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md (agent-browser command reference)

Then connect to the user's Chrome browser using `agent-browser --auto-connect` and perform the following task:

[TASK DESCRIPTION]

Content to publish:
[CONTENT]

Important:
- Always list tabs first (`tab list`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain

When NOT to Use Subagents


  • Single platform — just do it directly in the current conversation.
  • Read-only tasks (browsing, searching, extracting data) — context usage is lighter; a single conversation can handle 2-3 platforms.


Platform References


When automating tasks on specific platforms, consult the relevant reference document for page structure details, common operations, and known quirks:
| Platform | Reference | Key Notes |
|---|---|---|
| Reddit | `references/reddit.md` | Custom `faceplate-*` components; `networkidle` never reached; unlabeled comment textbox; `find text` fails due to duplicate elements |
| X (Twitter) | `references/x.md` | `open` often times out (use `tab list` to reuse existing tabs); click timestamp for post detail (not username); DraftJS contenteditable input (`data-testid="tweetTextarea_0"`); avoid `networkidle` |
| LinkedIn | `references/linkedin.md` | Ember.js SPA; Enter submits comments (use Shift+Enter for newlines); comment box and compose box share the same label; avoid `networkidle`; messaging overlay may block content |
| Dev.to | `references/devto.md` | Fast server-rendered HTML (Forem/Rails); standard `<textarea>` for comments/posts (Markdown); 5 reaction types; Algolia-powered search; `networkidle` works normally |
| Hacker News | `references/hackernews.md` | Minimal plain HTML; all form fields are unlabeled; `link "reply"` navigates to separate page; `networkidle` works instantly; rate limiting on posts/comments |

For installation and Chrome setup instructions, see `references/agent-browser-setup.md`.