chrome-automation
Automate Chrome browser tasks using agent-browser CLI. Navigate pages, fill forms, click buttons, take screenshots, extract data, and replay recorded workflows — all inside the user's real Chrome session.
NPX Install
npx skill4agent add zc277584121/marketing-skills chrome-automation
Skill: Chrome Automation (agent-browser)
Automate browser tasks in the user's real Chrome session via the agent-browser CLI.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.
Core Principle: Reuse the User's Existing Chrome
This skill operates on a single Chrome process — the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.
Always Start by Listing Tabs
Before opening any new page, always list existing tabs first:
```bash
agent-browser --auto-connect tab list
```
This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:
- If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
  ```bash
  agent-browser --auto-connect tab <index>
  ```
- If the target page is NOT open → open it in the current tab or a new tab.
  ```bash
  agent-browser --auto-connect open <url>
  ```
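The check-then-reuse flow above can be sketched as a small shell helper. This is a minimal sketch, not part of agent-browser: `find_tab_index` and `reuse_or_open` are hypothetical names, `AB` is a hypothetical override variable, and the parsing assumes that `tab list` prints one tab per line with the tab's index as the first integer on that line. Adjust the parsing if the real output differs.

```bash
# Hypothetical override so the binary can be swapped out (e.g. for testing).
AB="${AB:-agent-browser}"

# Print the index of the first tab whose line contains $1, or nothing.
# ASSUMPTION: one tab per line, index is the first integer on the line.
find_tab_index() {
  "$AB" --auto-connect tab list \
    | grep -F -- "$1" \
    | head -n 1 \
    | grep -o '[0-9][0-9]*' \
    | head -n 1
}

# Switch to an already-open tab when one matches; otherwise open the URL.
reuse_or_open() {
  url="$1"
  index="$(find_tab_index "$url")"
  if [ -n "$index" ]; then
    "$AB" --auto-connect tab "$index"
  else
    "$AB" --auto-connect open "$url"
  fi
}
```

A call such as `reuse_or_open https://example.com` would then switch tabs only when a listed tab's line contains that URL.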
Why This Matters
- The user's Chrome has their cookies, login sessions, and browser state
- Opening a new page when one is already available wastes time and may lose login state
- Many marketing platforms (social media dashboards, ad managers, CMS tools) require login — reusing an existing logged-in tab avoids re-authentication
Connection
Always use `--auto-connect` to connect to the user's running Chrome instance:
```bash
agent-browser --auto-connect <command>
```
This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see `references/agent-browser-setup.md`).
Common Workflows
1. Navigate and Interact
```bash
# List tabs to find existing pages
agent-browser --auto-connect tab list
# Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>
# Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle
# Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i
# Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"
```
2. Extract Data from a Page
```bash
# Get all text content
agent-browser --auto-connect get text body
# Take a screenshot for visual inspection
agent-browser --auto-connect screenshot
# Execute JavaScript for structured data
agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"
```
3. Replay a Chrome DevTools Recording
The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.
Step-by-Step Interaction Guide
Taking Snapshots
Use `snapshot -i` to see all interactive elements with refs (`@e1`, `@e2`, ...):
```bash
agent-browser --auto-connect snapshot -i
```
The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.
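When a script needs the refs rather than the full listing, they can be pulled out of the snapshot text. A minimal sketch, assuming only that refs appear literally as `@e` followed by digits, as described above; `refs_from_snapshot` is a hypothetical helper name.

```bash
# Print each distinct ref (@e1, @e2, ...) found on stdin,
# in order of first appearance.
refs_from_snapshot() {
  grep -o '@e[0-9][0-9]*' | awk '!seen[$0]++'
}
```

Typical use would be `agent-browser --auto-connect snapshot -i | refs_from_snapshot`.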
Step Type Mapping
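| Action | Command |
|---|---|
| Navigate | `open <url>` |
| Click | `click <ref>` |
| Fill standard input | `fill <ref> "text"` |
| Fill rich text editor | `click <ref>`, then `keyboard inserttext "text"` |
| Press key | |
| Scroll | `scroll down 300` |
| Wait for element | |
| Screenshot | `screenshot` |
| Get page text | `get text body` |
| Get current URL | |
| Run JavaScript | `eval "<js>"` |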
How to Distinguish Input Types
- Standard input/textarea → use `fill`
- Contenteditable div / rich text editor (LinkedIn message box, Gmail compose, Slack, CMS editors) → click/focus first, then use `keyboard inserttext`
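The two cases above can be wrapped in one helper so the caller only has to state which kind of field it is. This is a sketch under assumptions: `enter_text`, the `standard`/`rich` labels, and the `AB` override variable are all hypothetical; only the `fill`, `click`, and `keyboard inserttext` commands come from this document.

```bash
# Hypothetical override so the binary can be swapped out (e.g. for testing).
AB="${AB:-agent-browser}"

# enter_text <ref> <kind> <text>
#   kind = standard -> fill (clears the field, then fills it)
#   kind = rich     -> click to focus, then insert text at the cursor
enter_text() {
  ref="$1"; kind="$2"; text="$3"
  case "$kind" in
    standard)
      "$AB" --auto-connect fill "$ref" "$text" ;;
    rich)
      "$AB" --auto-connect click "$ref" &&
        "$AB" --auto-connect keyboard inserttext "$text" ;;
    *)
      echo "unknown input kind: $kind" >&2
      return 2 ;;
  esac
}
```

For example, `enter_text @e5 standard "some text"` for a plain input, or `enter_text @e7 rich "some text"` for a message box.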
Ref Lifecycle
Refs (`@e1`, `@e2`, ...) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that trigger navigation
- Submitting forms
- Triggering dynamic content loads (AJAX, SPA navigation)
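One way to make re-snapshotting hard to forget is to pair every page-changing action with a fresh snapshot. A minimal sketch; `act_and_resnapshot` and the `AB` override variable are hypothetical names, not agent-browser features.

```bash
# Hypothetical override so the binary can be swapped out (e.g. for testing).
AB="${AB:-agent-browser}"

# Run one agent-browser action, then immediately take a fresh snapshot so
# later steps never reuse refs from before the page changed.
# If the action itself fails, return its status without snapshotting.
act_and_resnapshot() {
  "$AB" --auto-connect "$@" || return $?
  "$AB" --auto-connect snapshot -i
}
```

For example, `act_and_resnapshot click @e3` clicks and then prints the new set of refs to work from.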
Verification
After each significant action, verify the result:
```bash
agent-browser --auto-connect snapshot -i   # check interactive state
agent-browser --auto-connect screenshot    # visual verification
```
Replaying Recordings
Accepted Formats
- JSON (recommended). Structured, so it can be read progressively:
  ```bash
  # Count steps
  jq '.steps | length' recording.json
  # Read first 5 steps
  jq '.steps[0:5]' recording.json
  ```
- @puppeteer/replay JS (`import { createRunner }`)
- Puppeteer JS (`require('puppeteer')`, `page.goto`, `Locator.race`)
How to Replay
- Parse the recording: understand the full intent before acting. Summarize what the recording does.
- List tabs first: check if the target page is already open.
- Navigate: execute `navigate` steps, reusing existing tabs when possible.
- For each interaction step:
  - Take a snapshot (`snapshot -i`) to see current interactive elements
  - Match the recording's `aria/...` selectors against the snapshot
  - Fall back to `text/...`, then CSS class hints, then screenshot
  - Do not rely on ember IDs, numeric IDs, or exact XPaths, because these change every page load
- Verify after each step: snapshot or screenshot to confirm
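The selector priority in the steps above can be sketched as a filter over one step's candidate selectors. This is an illustration, not part of agent-browser: `pick_selector` is a hypothetical name, and it assumes XPath candidates carry an `xpath/` prefix in the same way the `aria/` and `text/` ones carry theirs.

```bash
# Read candidate selectors (one per line) and print the most stable one:
# aria/... first, then text/..., then a CSS selector that is not an
# auto-generated ID. Prints nothing if no candidate qualifies.
pick_selector() {
  awk '
    /^xpath\//      { next }                            # XPaths change between loads
    /^aria\//       { if (aria == "") aria = $0; next }
    /^text\//       { if (text == "") text = $0; next }
    /#ember[0-9]+/  { next }                            # ember IDs
    /^#[0-9]/       { next }                            # numeric IDs
    NF              { if (css == "") css = $0 }
    END {
      if (aria != "") print aria
      else if (text != "") print text
      else if (css != "") print css
    }'
}
```

Assuming a step stores its candidates under `.selectors`, something like `jq -r '.steps[2].selectors | flatten | .[]' recording.json | pick_selector` would feed it. An empty result is the cue to fall back to a screenshot.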
Iframe-Heavy Sites
`snapshot -i` cannot see inside iframes.
Detecting Iframe Issues
- `snapshot -i` returns unexpectedly short or empty results
- Recording references elements not appearing in snapshot output
- `get text body` content doesn't match what a screenshot shows
Workarounds
- Use `eval` to access iframe content:
  ```bash
  agent-browser --auto-connect eval --stdin <<'EVALEOF'
  // Locate the iframe in the main document
  const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
  // contentDocument is null for cross-origin iframes
  const doc = frame.contentDocument;
  const btn = doc.querySelector('button[aria-label="Send"]');
  btn.click();
  EVALEOF
  ```
  Note: Only works for same-origin iframes.
- Use `keyboard` for blind input: If the iframe element has focus, `keyboard inserttext "..."` sends text regardless of frame boundaries.
- Use `get text body` to read full page content including iframes.
- Use `screenshot` for visual verification when snapshot is unreliable.
When to Ask the User
If workarounds fail after 2 attempts on the same step, pause and explain:
- The page uses iframes that cannot be accessed via snapshot
- Which element you need and what you expected
- Ask the user to perform that step manually, then continue
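The two-attempt rule can be sketched as a bounded retry wrapper that stops and reports instead of looping. `try_step` is a hypothetical helper, and the pause message wording is illustrative only.

```bash
# try_step <max_attempts> <command...>
# Run the command up to max_attempts times. Return 0 on the first success;
# otherwise print a pause notice to stderr and return 1.
try_step() {
  max="$1"; shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "PAUSE: '$*' failed after $max attempts; ask the user to do this step manually." >&2
  return 1
}
```

For example, `try_step 2 agent-browser --auto-connect click @e3` gives the click two chances before handing control back to the user.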
Handling Unexpected Situations
Handle Automatically (do not stop):
- Popups or banners → dismiss them (`find text "Dismiss" click` or `find text "Close" click`)
- Cookie consent dialogs → accept or dismiss
- Tooltip overlays → close them first
- Element not in snapshot → try `find text "..." click`, or scroll to reveal with `scroll down 300`
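The popup handling above can be sketched as a best-effort pass that tries each dismiss label once and then re-snapshots. `dismiss_overlays` and the `AB` override variable are hypothetical names. A failed `find` is treated as "no such overlay", which is an assumption about how the command reports a miss.

```bash
# Hypothetical override so the binary can be swapped out (e.g. for testing).
AB="${AB:-agent-browser}"

# Try the common dismiss labels once each, ignoring failures, then take a
# fresh snapshot because any successful click invalidates earlier refs.
dismiss_overlays() {
  for label in "Dismiss" "Close"; do
    "$AB" --auto-connect find text "$label" click >/dev/null 2>&1 || true
  done
  "$AB" --auto-connect snapshot -i
}
```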
Pause and Ask the User:
- Login / authentication is required
- A CAPTCHA appears
- Page structure is completely different from expected
- A destructive action is about to happen (deleting data, sending real content) — confirm first
- Stuck for more than 2 attempts on the same step
- All iframe workarounds have failed
When pausing, explain clearly: what step you are on, what you expected, and what you see.
Key Commands Reference
| Command | Description |
|---|---|
| `tab list` | List all open tabs with index, title, and URL |
| `tab <index>` | Switch to an existing tab by index |
| | Open a new empty tab |
| | Close the current tab |
| `open <url>` | Navigate to URL |
| `snapshot -i` | List interactive elements with refs |
| `click <ref>` | Click element by ref |
| `fill <ref> "text"` | Clear and fill standard input/textarea |
| | Type without clearing |
| `keyboard inserttext "text"` | Insert text (best for contenteditable) |
| | Press keyboard key |
| `scroll down 300` | Scroll page in pixels |
| | Wait for element to appear |
| `wait --load networkidle` | Wait for network to settle |
| | Wait for a duration |
| `screenshot` | Take screenshot |
| | Screenshot with numbered labels |
| `eval "<js>"` | Execute JavaScript in page |
| `get text body` | Get all text content |
| | Get current URL |
| | Set viewport size |
| `find text "..." click` | Semantic find and click |
| | Close browser session |
Known Limitations
- Iframe blindness: `snapshot -i` cannot see inside iframes. See Iframe-Heavy Sites.
- `find text` strict mode: Fails when multiple elements match. Use `snapshot -i` to locate the specific ref instead.
- `fill` vs contenteditable: `fill` only works on `<input>` and `<textarea>`. For rich text editors, use `keyboard inserttext`.
- `eval` is main-frame only: To interact with iframe content, traverse via `document.querySelector('iframe').contentDocument...`
Multi-Platform Operations
When the user requests an action across multiple platforms (e.g., "publish this article to Dev.to, LinkedIn, and X"), do NOT attempt all platforms in a single conversation. Instead, launch sequential Agent subagents, one per platform.
Why Subagents
Each platform operation consumes ~25-40K tokens (reference file + snapshots + interactions). Running 3-5 platforms in one context risks hitting the 200K token limit and degrading late-platform accuracy. Each subagent gets its own fresh 200K context window.
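The budget arithmetic behind that estimate, using only the figures quoted above (in thousands of tokens); `estimate_tokens` is a hypothetical helper name.

```bash
# estimate_tokens <platform_count>
# Multiply the platform count by the ~25K and ~40K per-platform figures
# and compare the range with the 200K context limit.
estimate_tokens() {
  platforms="$1"
  low=$((platforms * 25))
  high=$((platforms * 40))
  echo "$platforms platforms: ${low}K-${high}K of 200K tokens"
}

estimate_tokens 3   # prints "3 platforms: 75K-120K of 200K tokens"
estimate_tokens 5   # prints "5 platforms: 125K-200K of 200K tokens"
```

At five platforms the upper estimate alone reaches the full window, before counting anything else already in the conversation.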
How to Execute
- Prepare the content: confirm the post text, title, tags, and any platform-specific adaptations with the user.
- For each platform, launch a `general-purpose` Agent subagent with a prompt that includes:
  - The full content to publish
  - Instructions to read the relevant reference file (e.g., `Read /path/to/skills/chrome-automation/references/x.md`)
  - Instructions to read the agent-browser skill file for command reference
  - The specific task (post, comment, reply, etc.)
  - Any platform-specific instructions (e.g., "use these hashtags on LinkedIn")
- Run subagents sequentially (one at a time), because they all share the same Chrome browser via `--auto-connect`. Parallel subagents would cause tab conflicts.
- After each subagent completes, report the result to the user before launching the next one.
Prompt Template for Subagents
```
You are automating a browser task on [PLATFORM].
First, read these files for context:
- /absolute/path/to/skills/chrome-automation/references/[platform].md
- /absolute/path/to/.claude/skills/agent-browser/SKILL.md (agent-browser command reference)
Then connect to the user's Chrome browser using `agent-browser --auto-connect` and perform the following task:
[TASK DESCRIPTION]
Content to publish:
[CONTENT]
Important:
- Always list tabs first (`tab list`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain
```
When NOT to Use Subagents
- Single platform — just do it directly in the current conversation.
- Read-only tasks (browsing, searching, extracting data) — context usage is lighter; a single conversation can handle 2-3 platforms.
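When subagents are used, the prompt template above can be filled in mechanically, one platform at a time. A minimal sketch: `render_prompt` and its argument order are hypothetical, and the caller supplies both file paths rather than the function guessing them.

```bash
# render_prompt <platform> <reference_file> <skill_file> <task> <content>
# Print the subagent prompt with the placeholders filled in.
# Backticks are escaped so the heredoc emits them literally.
render_prompt() {
  platform="$1"; ref_file="$2"; skill_file="$3"; task="$4"; content="$5"
  cat <<EOF
You are automating a browser task on $platform.
First, read these files for context:
- $ref_file
- $skill_file (agent-browser command reference)
Then connect to the user's Chrome browser using \`agent-browser --auto-connect\` and perform the following task:
$task
Content to publish:
$content
Important:
- Always list tabs first (\`tab list\`) and reuse existing logged-in tabs
- Re-snapshot after every navigation or action
- Confirm with the user before submitting/publishing (destructive action)
- If login is required or a CAPTCHA appears, stop and explain
EOF
}
```

Calling it once per platform, and waiting for each subagent to finish before rendering the next prompt, keeps the run sequential.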
Platform References
When automating tasks on specific platforms, consult the relevant reference document for page structure details, common operations, and known quirks:
| Platform | Reference | Key Notes |
|---|---|---|
| Custom | |
| X (Twitter) | `references/x.md` | |
| LinkedIn | | Ember.js SPA; Enter submits comments (use Shift+Enter for newlines); comment box and compose box share the same label; avoid |
| Dev.to | | Fast server-rendered HTML (Forem/Rails); standard |
| Hacker News | | Minimal plain HTML; all form fields are unlabeled; |
For installation and Chrome setup instructions, see `references/agent-browser-setup.md`.