Computer Use Playbook
Overview
Use this skill for end-to-end computer automation across browser and desktop surfaces. Browser use is a major track, but not the only one. Prefer deterministic methods first, then escalate to visual/native automation only when required. For browser MCP workflows, treat
as a required handle for all stateful actions.
Playbook Structure
- Browser use (primary for web tasks): browser MCP tools, DOM snapshots, scripts, screenshots.
- Filesystem use: shell-native operations for deterministic file/process work.
- Native desktop use: coordinate and window automation only when DOM/shell are insufficient.
- Human-in-the-loop checkpoints: login, CAPTCHA, security prompts, or policy-gated steps.
Decision Order
- Identify the active surface: browser page, filesystem/process, or native desktop UI.
- For browser pages, use browser MCP tools first and keep a strict contract.
- For filesystem/process work, use shell/system tools first (, , , etc.).
- Escalate to vision or native UI automation only when deterministic methods are insufficient.
- If blocked by login, CAPTCHA, or security gates, switch to human-in-the-loop flow.
- Verify each critical step with state checks plus screenshot evidence.
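The decision order above can be read as a small routing function. The sketch below is illustrative only: the `Surface` enum, the `choose_method` name, and the returned labels are hypothetical and are not part of any tool named in this playbook.

```python
from enum import Enum


class Surface(Enum):
    """Hypothetical labels for the three surfaces in the decision order."""
    BROWSER = "browser"
    FILESYSTEM = "filesystem"
    NATIVE = "native"


def choose_method(surface: Surface, deterministic_ok: bool, gated: bool) -> str:
    """Return the method the decision order selects for one step."""
    # A login, CAPTCHA, or security gate always hands control to the user.
    if gated:
        return "human-in-the-loop"
    # Deterministic tools come first for browser and filesystem surfaces.
    if deterministic_ok and surface is Surface.BROWSER:
        return "browser-tools"
    if deterministic_ok and surface is Surface.FILESYSTEM:
        return "shell"
    # Everything else escalates to vision or native UI automation.
    return "native-ui"
```

Checking the gate first reflects the rule that a blocked step is never retried through a different automation path.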
Browser Automation (Major Track)
Use browser tools with a DOM-first approach for browser flows. Avoid jumping to native desktop clicks while the target is still reachable by browser tools.
Preferred sequence:
- and capture returned .
- for explicit page transitions.
- dom_snapshot(tab_id, ...) or to identify target.
- action (click/type/submit).
- / to verify URL/title/content.
- as evidence.
Session behavior guidance:
- always pass for , , , , , and .
- never rely on implicit active-tab behavior.
- if a click opens a new tab/window, call , detect the new , and continue explicitly on that .
- keep a local map of when handling multiple tabs.
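One way to keep the local tab map and to spot a tab opened by a click is to diff the tab listing taken before and after the action. This is a minimal sketch under stated assumptions: the listing is taken to be a list of dicts with `id` and `url` keys, and both helper names are hypothetical. The real listing tool and its return shape may differ.

```python
def build_tab_map(listed: list[dict[str, str]]) -> dict[str, str]:
    """Build an id -> URL map from a tab listing (assumed shape: id, url)."""
    return {tab["id"]: tab["url"] for tab in listed}


def find_new_tabs(known: dict[str, str], listed: list[dict[str, str]]) -> list[str]:
    """Return ids that appear in the latest listing but not in the local map."""
    return [tab["id"] for tab in listed if tab["id"] not in known]


# Hypothetical listings captured before and after a click that opens a tab.
before = [{"id": "t1", "url": "https://example.com/"}]
after = [
    {"id": "t1", "url": "https://example.com/"},
    {"id": "t2", "url": "https://example.com/popup"},
]

known = build_tab_map(before)
opened = find_new_tabs(known, after)   # ids to continue on explicitly
known = build_tab_map(after)           # refresh the map, dropping closed tabs
```

Rebuilding the map from each listing, rather than mutating it in place, keeps closed tabs from lingering as stale handles.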
Escalation triggers:
- dynamic overlays not stable via selectors,
- canvas/rendered controls,
- consent dialogs where selector path is inconsistent,
- native picker launched from browser (file upload dialog).
Do not overuse fallback:
- if a browser tool can do it, stay in browser tools.
- use native automation only for cross-app boundaries (OS dialogs, non-DOM UI).
File Explorer and Filesystem Automation
Prefer shell-native methods before GUI clicking.
Use shell when possible:
- search files: ,
- move/copy/rename: , ,
- inspect metadata: ,
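When a script is more convenient than individual shell commands, the same three operations can be done deterministically with the Python standard library. The sketch below is a stand-in for the shell commands, not a replacement for them. The function names are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def find_files(root: Path, pattern: str) -> list[Path]:
    """Recursively search root for files whose name matches a glob pattern."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def move_file(src: Path, dest_dir: Path) -> Path:
    """Move a file into dest_dir, creating the directory if needed."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dest_dir / src.name)))


def describe(path: Path) -> dict[str, object]:
    """Return size and modification time (UTC, ISO 8601) for a path."""
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {"size_bytes": info.st_size, "modified_utc": modified.isoformat()}
```

Each helper returns a value that can be checked afterwards, which fits the rule of verifying state after every critical step.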
Use native UI only when the workflow is GUI-only:
- OS file picker from browser/app,
- drag-drop interactions not scriptable via API,
- app-specific explorer panes.
Native UI Automation
Use native UI automation for interactions outside application DOM/API.
Typical tools:
- for key/click/type,
- / for window targeting.
Guidelines:
- ensure window focus before typing,
- prefer keyboard-driven deterministic paths,
- keep retries bounded and observable,
- re-check application state after each action.
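The last two guidelines, bounded retries and a state re-check after each action, can be combined in one helper. This is a sketch, assuming the caller supplies the action and the state check as plain callables. `run_with_retries` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def run_with_retries(
    action: Callable[[], None],
    state_ok: Callable[[], bool],
    max_attempts: int = 3,
    delay_s: float = 0.5,
) -> bool:
    """Run action up to max_attempts times, re-checking state after each try."""
    for attempt in range(1, max_attempts + 1):
        action()
        # Re-check application state instead of assuming the action worked.
        if state_ok():
            print(f"attempt {attempt}/{max_attempts}: state verified")
            return True
        print(f"attempt {attempt}/{max_attempts}: state not reached")
        if attempt < max_attempts:
            time.sleep(delay_s)
    return False
```

The per-attempt log line is what makes the retries observable. A `False` result is the signal to stop and report rather than keep clicking.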
Human-in-the-Loop Rules
Pause and ask for user intervention when blocked by:
- login/2FA challenges,
- CAPTCHA or anti-bot checkpoints,
- legal/security confirmation screens that require explicit human intent.
When waiting for user action:
- explain exactly what the user must do and where.
- issue an audible notification using so the user notices immediately.
- wait, then re-check state (, , element visibility, screenshot) before continuing.
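The waiting steps above can be sketched as a polling loop with a timeout. Two assumptions are made here: the terminal bell character stands in for the notification tool, which may not be audible on every terminal, and `is_unblocked` is a hypothetical callable that wraps whatever state check applies (URL, title, element visibility).

```python
import sys
import time
from typing import Callable


def wait_for_user(
    instructions: str,
    is_unblocked: Callable[[], bool],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> bool:
    """Tell the user what to do, ring the bell, then poll until unblocked."""
    print(instructions)
    # "\a" is the ASCII bell; this is an assumed stand-in for a real notifier.
    sys.stdout.write("\a")
    sys.stdout.flush()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_unblocked():
            return True
        time.sleep(poll_s)
    # One final check so a last-moment completion is not missed.
    return is_unblocked()
```

A `False` return means the gate is still in place after the timeout, so the run should report the blocked state instead of continuing.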
Special Cases
Consent dialogs
- DOM-first click (//localized variants).
- if selector fails but button is visible, use coordinate/native fallback.
- confirm modal is not visible and main interaction path works.
CAPTCHA / anti-bot challenges
- do not attempt bypass logic.
- capture evidence and report blocked state clearly.
- require human-in-the-loop completion.
- notify user with when intervention is required.
Login and account security gates
- try normal DOM steps first for username/password field fill and submit.
- if SSO, passkey, device approval, or 2FA requires human action, pause and request user action.
- after user confirms completion, re-snapshot and continue from verified page state.
File uploads
- use DOM file input assignment if available.
- if native picker opens, switch to native UI automation.
- verify upload appears in page/app state.
Verification Standard
Every important step should end with both:
- state evidence (URL/title/content/element state), and
- visual evidence (screenshot path).
If blocked, report:
- attempted method,
- blocker reason,
- evidence collected,
- next safe fallback.
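Both the evidence pair and the blocked report can be captured as small records so that nothing is omitted. This is a sketch only: the class and field names are hypothetical and simply mirror the bullet points above.

```python
from dataclasses import dataclass, field


@dataclass
class StepEvidence:
    """State evidence plus visual evidence for one important step."""
    step: str
    state: dict[str, str]      # e.g. URL, title, content, element state
    screenshot_path: str

    def is_complete(self) -> bool:
        # Both kinds of evidence are required, not just one of them.
        return bool(self.state) and bool(self.screenshot_path)


@dataclass
class BlockedReport:
    """The four items to report when a step is blocked."""
    attempted_method: str
    blocker_reason: str
    evidence: list[StepEvidence] = field(default_factory=list)
    next_safe_fallback: str = ""

    def summary(self) -> str:
        shots = ", ".join(e.screenshot_path for e in self.evidence) or "none"
        return (
            f"attempted: {self.attempted_method}; "
            f"blocker: {self.blocker_reason}; "
            f"evidence: {shots}; "
            f"next: {self.next_safe_fallback}"
        )
```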
Learning Library Structure
Use
as the canonical knowledge base.
- references/learnings/index.md: topic registry and folder convention.
- references/learnings/general/: cross-task lessons.
- references/learnings/<topic-slug>/: topic-specific lessons and experience log.
Topic folder convention:
- for stable workflow rules.
- for incremental run learnings.
Continuous Learning Loop (Required)
Treat each real run as training data for future runs.
Before starting similar work:
- Load references/learnings/index.md.
- Map the task to a topic slug (for example ).
- Load references/learnings/general/experience-log.md.
- Load topic files when present: references/learnings/<topic-slug>/lessons.md and references/learnings/<topic-slug>/experience-log.md.
- If the topic folder does not exist, create it with and .
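The pre-run loading steps can be sketched as a single helper. Assumptions: `base` points at the references/learnings/ directory, and a new topic folder is seeded with empty lessons.md and experience-log.md files, matching the topic file names listed above. `load_learnings` is a hypothetical name.

```python
from pathlib import Path

# Assumed seed files for a topic folder, taken from the topic file names above.
TOPIC_FILES = ("lessons.md", "experience-log.md")


def load_learnings(base: Path, topic_slug: str) -> dict[str, str]:
    """Create the topic folder if missing, then read every file that exists."""
    topic_dir = base / topic_slug
    topic_dir.mkdir(parents=True, exist_ok=True)
    for name in TOPIC_FILES:
        (topic_dir / name).touch(exist_ok=True)

    wanted = [base / "index.md", base / "general" / "experience-log.md"]
    wanted += [topic_dir / name for name in TOPIC_FILES]
    # Missing files are skipped rather than treated as errors.
    return {
        p.relative_to(base).as_posix(): p.read_text(encoding="utf-8")
        for p in wanted
        if p.is_file()
    }
```

Keys are paths relative to `base`, so a caller can tell which of the expected files were actually present.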
During execution:
- Capture failure signal and the exact step where it appears.
- Record the minimal fix that resolved it.
- Keep one-action-at-a-time execution where UI state is fragile.
After completion (or meaningful failure):
- Append a short run note to references/learnings/<topic-slug>/experience-log.md.
- Include: date, context, failure signal, root cause, fix pattern, reusable rule.
- Keep entries concise and deduplicated by updating prior rules instead of adding noisy repeats.
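A run note with the required fields can be formatted and appended as sketched below. The on-disk layout (a bullet with indented `key: value` lines) is an assumption, as are the function names. Deduplication here is deliberately crude: it skips a note whose reusable rule already appears verbatim in the log, whereas merging or updating an earlier rule still needs a manual edit.

```python
from datetime import date
from pathlib import Path

# Fields required in every run note, in addition to the date.
FIELDS = ("context", "failure_signal", "root_cause", "fix_pattern", "reusable_rule")


def format_run_note(entry: dict[str, str], day: date) -> str:
    """Render one run note, refusing entries with missing fields."""
    missing = [name for name in FIELDS if not entry.get(name)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    lines = [f"- date: {day.isoformat()}"]
    lines += [f"  {name.replace('_', ' ')}: {entry[name]}" for name in FIELDS]
    return "\n".join(lines) + "\n"


def append_run_note(log_path: Path, entry: dict[str, str], day: date) -> bool:
    """Append the note unless its reusable rule is already in the log."""
    note = format_run_note(entry, day)  # validates the entry first
    existing = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
    if entry["reusable_rule"] in existing:
        return False
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(note)
    return True
```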
References
- Load references/computer-use-techniques.md for command snippets and fallback templates.
- Load references/learnings/index.md to select the right topic folder.
- Load references/learnings/general/experience-log.md for cross-task patterns.
- Load references/learnings/google-flow/lessons.md when automating Google Flow video creation.
- Load references/learnings/google-flow/experience-log.md for incremental Google Flow learnings.