chrome-automation
Skill: Chrome Automation (agent-browser)
Automate browser tasks in the user's real Chrome session via the agent-browser CLI.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See references/agent-browser-setup.md if unsure.
Core Principle: Reuse the User's Existing Chrome
This skill operates on a single Chrome process — the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.
Always Start by Listing Tabs
Before opening any new page, always list existing tabs first:
```bash
agent-browser --auto-connect tab list
```
This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:
- If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
  ```bash
  agent-browser --auto-connect tab <index>
  ```
- If the target page is NOT open → open it in the current tab or a new tab.
  ```bash
  agent-browser --auto-connect open <url>
  ```
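The list-then-decide flow above can be sketched as a small shell helper. This is only an illustration: the function name `reuse_or_open` is hypothetical, and the one assumption about the CLI is that the `tab list` output contains each tab's URL as plain text, so a fixed-string match can find it.

```bash
# Reuse-or-open sketch (hypothetical helper, not part of agent-browser).
# Assumption: `tab list` prints each tab's URL as plain text.
reuse_or_open() (
  url=$1
  tabs=$(agent-browser --auto-connect tab list) || exit 1
  if matches=$(printf '%s\n' "$tabs" | grep -F -- "$url"); then
    # Already open: print the matching entries so the index can be read
    # off and passed to `agent-browser --auto-connect tab <index>`.
    printf 'already open:\n%s\n' "$matches"
  else
    # Not open yet: fall back to opening the URL.
    agent-browser --auto-connect open "$url"
  fi
)
```

Calling `reuse_or_open https://example.com` either prints the matching tab entries or opens the page. The match is a plain substring test, so the final choice of tab index is still left to whoever reads the output.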
Why This Matters
- The user's Chrome has their cookies, login sessions, and browser state
- Opening a new page when one is already available wastes time and may lose login state
- Many marketing platforms (social media dashboards, ad managers, CMS tools) require login — reusing an existing logged-in tab avoids re-authentication
Connection
Always use `--auto-connect` to connect to the user's running Chrome instance:
```bash
agent-browser --auto-connect <command>
```
This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see references/agent-browser-setup.md).
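One way to make the "always use `--auto-connect`" rule hard to forget is a thin wrapper. The sketch below is an assumption-laden convenience, not something agent-browser ships: the name `ab` is made up, and the error text simply points back at the setup notes because a failed call may have causes other than the connection.

```bash
# Wrapper sketch (the name `ab` is illustrative only).
# Every call goes through --auto-connect; on failure, print a pointer
# to the setup notes and propagate a non-zero status.
ab() {
  if ! agent-browser --auto-connect "$@"; then
    echo "agent-browser failed; if Chrome is unreachable, see references/agent-browser-setup.md" >&2
    return 1
  fi
}
```

With this in place, `ab tab list` runs the same command as `agent-browser --auto-connect tab list`.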
Common Workflows
1. Navigate and Interact
```bash
# List tabs to find existing pages
agent-browser --auto-connect tab list

# Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>

# Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle

# Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i

# Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"
```
2. Extract Data from a Page
```bash
# Get all text content
agent-browser --auto-connect get text body

# Take a screenshot for visual inspection
agent-browser --auto-connect screenshot

# Execute JavaScript for structured data
agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"
```
3. Replay a Chrome DevTools Recording
The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.
Step-by-Step Interaction Guide
Taking Snapshots
Use `snapshot -i` to see all interactive elements with refs (`@e1`, `@e2`, ...):
```bash
agent-browser --auto-connect snapshot -i
```
The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.
Step Type Mapping
| Action | Command |
|---|---|
| Navigate | |
| Click | |
| Fill standard input | |
| Fill rich text editor | |
| Press key | |
| Scroll | |
| Wait for element | |
| Screenshot | |
| Get page text | |
| Get current URL | |
| Run JavaScript | |
How to Distinguish Input Types
- Standard input/textarea → use `fill`
- Contenteditable div / rich text editor (LinkedIn message box, Gmail compose, Slack, CMS editors) → click/focus first, then use `keyboard inserttext`
Ref Lifecycle
Refs (`@e1`, `@e2`, ...) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that trigger navigation
- Submitting forms
- Triggering dynamic content loads (AJAX, SPA navigation)
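Because refs go stale after page-changing actions, it can help to pair the action with the re-snapshot so the two are never separated. The helper below is a minimal sketch; `click_then_snapshot` is a hypothetical name, and it only chains the `click` and `snapshot -i` commands already used in this document.

```bash
# Click a ref, then immediately take a fresh snapshot, since the click
# may have changed the page and invalidated every earlier ref.
# (Hypothetical helper; only the two chained commands come from the CLI.)
click_then_snapshot() {
  agent-browser --auto-connect click "$1" &&
    agent-browser --auto-connect snapshot -i
}
```

For example, `click_then_snapshot @e3` clicks the element and prints the new ref list, which should be used for the next action instead of the old refs.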
Verification
After each significant action, verify the result:
```bash
agent-browser --auto-connect snapshot -i  # check interactive state
agent-browser --auto-connect screenshot   # visual verification
```
Replaying Recordings
Accepted Formats
- JSON (recommended), which is structured and can be read progressively:
  ```bash
  # Count steps
  jq '.steps | length' recording.json
  # Read first 5 steps
  jq '.steps[0:5]' recording.json
  ```
- @puppeteer/replay JS (`import { createRunner }`)
- Puppeteer JS (`require('puppeteer')`, `page.goto`, `Locator.race`)
How to Replay
- Parse the recording: understand the full intent before acting. Summarize what the recording does.
- List tabs first: check if the target page is already open.
- Navigate: execute `navigate` steps, reusing existing tabs when possible.
- For each interaction step:
  - Take a snapshot (`snapshot -i`) to see current interactive elements
  - Match the recording's `aria/...` selectors against the snapshot
  - Fall back to `text/...`, then CSS class hints, then screenshot
  - Do not rely on ember IDs, numeric IDs, or exact XPaths, since these change on every page load
- Verify after each step: snapshot or screenshot to confirm
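The selector priority in the steps above (prefer `aria/...`, then `text/...`, and treat everything else as unreliable) can be sketched as a small shell function. This is an illustration under stated assumptions: `pick_selector` is a hypothetical name, and the selectors of one recorded step are assumed to have been extracted already and passed as separate arguments.

```bash
# Pick the most stable selector from one recorded step.
# Arguments: the step's selectors, one per argument (assumed to be
# extracted from the recording beforehand).
pick_selector() {
  # First choice: an aria/ selector.
  for s in "$@"; do
    case $s in aria/*) printf '%s\n' "$s"; return 0 ;; esac
  done
  # Second choice: a text/ selector.
  for s in "$@"; do
    case $s in text/*) printf '%s\n' "$s"; return 0 ;; esac
  done
  # Nothing stable found: the caller should fall back to CSS class
  # hints or a screenshot rather than trusting IDs or XPaths.
  return 1
}
```

For instance, `pick_selector '#ember42' 'text/Send' 'aria/Send message'` prints `aria/Send message`, while a step that offers only `#ember42` returns a non-zero status so the fallback path is taken.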
Iframe-Heavy Sites
`snapshot -i` cannot see inside iframes.
Detecting Iframe Issues
- `snapshot -i` returns unexpectedly short or empty results
- Recording references elements not appearing in snapshot output
- `get text body` content doesn't match what a screenshot shows
Workarounds
- Use `eval` to access iframe content:
  ```bash
  agent-browser --auto-connect eval --stdin <<'EVALEOF'
  const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
  const doc = frame.contentDocument;
  const btn = doc.querySelector('button[aria-label="Send"]');
  btn.click();
  EVALEOF
  ```
  Note: Only works for same-origin iframes.
- Use `keyboard` for blind input: If the iframe element has focus, `keyboard inserttext "..."` sends text regardless of frame boundaries.
- Use `get text body` to read full page content including iframes.
- Use `screenshot` for visual verification when snapshot is unreliable.
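The `eval` workaround throws if the iframe is missing or cross-origin, because `contentDocument` is then `null`. A more defensive variant might look like the sketch below. The function name `click_in_iframe` is hypothetical, and it assumes that `eval --stdin` reports the value of the final expression the same way the one-line `eval` examples do. The selectors are spliced into single-quoted JavaScript strings, so they must not contain single quotes.

```bash
# Guarded iframe click (hypothetical helper).
# $1: CSS selector for the iframe, $2: CSS selector for the target inside it.
# Caveat: neither selector may contain a single quote.
click_in_iframe() {
  agent-browser --auto-connect eval --stdin <<EVALEOF
(() => {
  const frame = document.querySelector('$1');
  if (!frame) return 'iframe not found';
  const doc = frame.contentDocument;
  if (!doc) return 'iframe is cross-origin or not loaded';
  const el = doc.querySelector('$2');
  if (!el) return 'target not found in iframe';
  el.click();
  return 'clicked';
})()
EVALEOF
}
```

A call such as `click_in_iframe 'iframe[data-testid="interop-iframe"]' 'button[aria-label="Send"]'` then yields a short status string instead of an uncaught error, which makes it easier to decide whether to try the keyboard or screenshot workarounds next.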
When to Ask the User
If workarounds fail after 2 attempts on the same step, pause and explain:
- The page uses iframes that cannot be accessed via snapshot
- Which element you need and what you expected
- Ask the user to perform that step manually, then continue
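The two-attempt rule can be made explicit with a tiny retry wrapper. This is a sketch only: `try_twice` is a hypothetical helper that runs whatever command it is given, and the message it prints is a placeholder for the fuller explanation described above.

```bash
# Run a command up to two times; if both attempts fail, stop and say so.
# (Hypothetical helper; the command to run is passed as arguments.)
try_twice() {
  attempt=1
  while [ "$attempt" -le 2 ]; do
    "$@" && return 0
    attempt=$((attempt + 1))
  done
  echo "Still failing after 2 attempts: $*. Pausing to ask the user." >&2
  return 1
}
```

A workaround would be wrapped as `try_twice agent-browser --auto-connect click @e3`; a non-zero return is the cue to pause and hand the step to the user.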
Handling Unexpected Situations
Handle Automatically (do not stop):
- Popups or banners → dismiss them (`find text "Dismiss" click` or `find text "Close" click`)
- Cookie consent dialogs → accept or dismiss
- Tooltip overlays → close them first
- Element not in snapshot → try `find text "..." click`, or scroll to reveal with `scroll down 300`
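The popup handling above amounts to trying one label and then the other. A minimal sketch, using only the two `find text ... click` commands already quoted (the function name `dismiss_overlay` is made up):

```bash
# Try the "Dismiss" label first, then "Close" (hypothetical helper).
# Returns non-zero if neither command succeeds.
dismiss_overlay() {
  agent-browser --auto-connect find text "Dismiss" click ||
    agent-browser --auto-connect find text "Close" click
}
```

If both attempts fail, the overlay probably uses a different label, and a fresh `snapshot -i` is the safer way to locate its close button.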
Pause and Ask the User:
- Login / authentication is required
- A CAPTCHA appears
- Page structure is completely different from expected
- A destructive action is about to happen (deleting data, sending real content) — confirm first
- Stuck for more than 2 attempts on the same step
- All iframe workarounds have failed
When pausing, explain clearly: what step you are on, what you expected, and what you see.
Key Commands Reference
| Command | Description |
|---|---|
| | List all open tabs with index, title, and URL |
| | Switch to an existing tab by index |
| | Open a new empty tab |
| | Close the current tab |
| | Navigate to URL |
| | List interactive elements with refs |
| | Click element by ref |
| | Clear and fill standard input/textarea |
| | Type without clearing |
| | Insert text (best for contenteditable) |
| | Press keyboard key |
| | Scroll page in pixels |
| | Wait for element to appear |
| | Wait for network to settle |
| | Wait for a duration |
| | Take screenshot |
| | Screenshot with numbered labels |
| | Execute JavaScript in page |
| | Get all text content |
| | Get current URL |
| | Set viewport size |
| | Semantic find and click |
| | Close browser session |
Known Limitations
- Iframe blindness: `snapshot -i` cannot see inside iframes. See Iframe-Heavy Sites.
- `find text` strict mode: Fails when multiple elements match. Use `snapshot -i` to locate the specific ref instead.
- `fill` vs contenteditable: `fill` only works on `<input>` and `<textarea>`. For rich text editors, use `keyboard inserttext`.
- `eval` is main-frame only: To interact with iframe content, traverse via `document.querySelector('iframe').contentDocument...`
Multi-Platform Operations
When the user requests an action across multiple platforms (e.g., "publish this article to Dev.to, LinkedIn, and X"), do NOT attempt all platforms in a single conversation. Instead, launch sequential Agent subagents, one per platform.
Why Subagents
Each platform operation consumes ~25-40K tokens (reference file + snapshots + interactions). Running 3-5 platforms in one context risks hitting the 200K token limit and degrading late-platform accuracy. Each subagent gets its own fresh 200K context window.
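The budget argument is simple multiplication, worked through here with the figures quoted above (25-40K tokens per platform, 200K limit). The function name `context_budget` is illustrative only.

```bash
# Token budget for N platforms sharing one context, in thousands of tokens.
# The per-platform range (25-40K) and the 200K limit come from the text above.
context_budget() {
  echo "$1 platforms: $(( $1 * 25 ))-$(( $1 * 40 ))K of 200K"
}

for n in 3 4 5; do
  context_budget "$n"
done
```

This prints 75-120K for three platforms, 100-160K for four, and 125-200K for five, so the upper estimate for five platforms reaches the limit before any other conversation content is counted.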
How to Execute
- Prepare the content: confirm the post text, title, tags, and any platform-specific adaptations with the user.
- For each platform, launch a `general-purpose` Agent subagent with a prompt that includes:
  - The full content to publish
  - Instructions to read the relevant reference file (e.g., `Read /path/to/skills/chrome-automation/references/x.md`)
  - Instructions to read the agent-browser skill file for command reference
  - The specific task (post, comment, reply, etc.)
  - Any platform-specific instructions (e.g., "use these hashtags on LinkedIn")
- Run subagents sequentially (one at a time), because they all share the same Chrome browser via `--auto-connect`. Parallel subagents would cause tab conflicts.
- After each subagent completes, report the result to the user before launching the next one.
Prompt Template for Subagents
You are automating a browser task on [PLATFORM].
First, read these files for context:
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md (agent-browser command reference)
Then connect to the user's Chrome browser using `agent-browser --auto-connect` and perform the following task:
[TASK DESCRIPTION]
Content to publish:
[CONTENT]
Important:
- Always list tabs first (`tab list`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain
When NOT to Use Subagents
- Single platform — just do it directly in the current conversation.
- Read-only tasks (browsing, searching, extracting data) — context usage is lighter; a single conversation can handle 2-3 platforms.
Platform References
When automating tasks on specific platforms, consult the relevant reference document for page structure details, common operations, and known quirks:
| Platform | Reference | Key Notes |
|---|---|---|
| Custom | |
| X (Twitter) | | |
| Ember.js SPA; Enter submits comments (use Shift+Enter for newlines); comment box and compose box share the same label; avoid | |
| Dev.to | | Fast server-rendered HTML (Forem/Rails); standard |
| Hacker News | | Minimal plain HTML; all form fields are unlabeled; |
For installation and Chrome setup instructions, see references/agent-browser-setup.md.