Computer Use Playbook

Overview

Use this skill for end-to-end computer automation across browser and desktop surfaces. Browser use is a major track, but not the only one. Prefer deterministic methods first, then escalate to visual/native automation only when required. For browser MCP workflows, treat `tab_id` as a required handle for all stateful actions.

Playbook Structure

Browser use (primary for web tasks): browser MCP tools, DOM snapshots, scripts, screenshots.
Filesystem use: shell-native operations for deterministic file/process work.
Native desktop use: coordinate and window automation only when DOM/shell are insufficient.
Human-in-the-loop checkpoints: login, CAPTCHA, security prompts, or policy-gated steps.

Decision Order

Identify the active surface: browser page, filesystem/process, or native desktop UI.
For browser pages, use browser MCP tools first and keep a strict `tab_id` contract.
For filesystem/process work, use shell/system tools first (`rg`, `ls`, `find`, etc.).
Escalate to vision or native UI automation only when deterministic methods are insufficient.
If blocked by login, CAPTCHA, or security gates, switch to human-in-the-loop flow.
Verify each critical step with state checks plus screenshot evidence.
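
The decision order above can be sketched as a small routing function. This is only an illustration: the `Surface` enum, the `choose_method` name, and the returned labels are hypothetical and not part of any tool API.

```python
from enum import Enum


class Surface(Enum):
    """Hypothetical labels for the three surfaces named in the decision order."""
    BROWSER = "browser"
    FILESYSTEM = "filesystem"
    NATIVE = "native"


def choose_method(surface: Surface, deterministic_ok: bool = True, blocked: bool = False) -> str:
    """Return the method family to try next (labels are illustrative only)."""
    if blocked:
        # Login, CAPTCHA, or security gate: hand off to the user.
        return "human-in-the-loop"
    if surface is Surface.BROWSER and deterministic_ok:
        return "browser-mcp"
    if surface is Surface.FILESYSTEM and deterministic_ok:
        return "shell"
    # Native desktop UI, or deterministic methods were insufficient: escalate.
    return "vision-or-native-ui"
```

For example, `choose_method(Surface.BROWSER, deterministic_ok=False)` escalates to visual/native automation, while any blocked state routes to the human-in-the-loop flow regardless of surface.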

Browser Automation (Major Track)

Use browser tools + DOM-first for browser flows. Avoid jumping to native desktop clicks while the target is still reachable by browser tools.

Preferred sequence:

`open_tab` and capture returned `tab_id`.
`navigate_to(tab_id, url)` for explicit page transitions.
`dom_snapshot(tab_id, ...)` or `run_script(tab_id, ...)` to identify target.
`run_script(tab_id, ...)` action (click/type/submit).
`read_page(tab_id, ...)` or `run_script(tab_id, ...)` to verify URL/title/content.
`screenshot(tab_id, ...)` as evidence.
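
The sequence above might look roughly like the following in code. The tool names come from this playbook, but their exact arguments and return values are assumptions, so the sketch runs against `FakeTools`, a hypothetical in-memory stand-in rather than a real MCP client. `click_and_verify` is likewise an illustrative name.

```python
class FakeTools:
    """Hypothetical in-memory stand-in for the browser MCP tools (assumed signatures)."""

    def __init__(self):
        self.calls = []   # (tool_name, tab_id) in call order
        self.urls = {}    # tab_id -> current URL

    def open_tab(self):
        tab_id = f"tab-{len(self.urls) + 1}"
        self.urls[tab_id] = "about:blank"
        self.calls.append(("open_tab", tab_id))
        return tab_id

    def navigate_to(self, tab_id, url):
        self.urls[tab_id] = url
        self.calls.append(("navigate_to", tab_id))

    def dom_snapshot(self, tab_id):
        self.calls.append(("dom_snapshot", tab_id))
        return "<button id='submit'>Send</button>"

    def run_script(self, tab_id, script):
        self.calls.append(("run_script", tab_id))
        return True

    def read_page(self, tab_id):
        self.calls.append(("read_page", tab_id))
        return {"url": self.urls[tab_id], "title": "Example"}

    def screenshot(self, tab_id, path):
        self.calls.append(("screenshot", tab_id))
        return path


def click_and_verify(tools, url, element_id, evidence_path):
    """Run the preferred sequence once, passing tab_id explicitly at every step."""
    tab_id = tools.open_tab()                       # 1. capture the handle
    tools.navigate_to(tab_id, url)                  # 2. explicit page transition
    snapshot = tools.dom_snapshot(tab_id)           # 3. identify the target
    if f"id='{element_id}'" not in snapshot:
        raise LookupError(f"element {element_id!r} not found in DOM snapshot")
    script = f"document.getElementById({element_id!r}).click()"
    tools.run_script(tab_id, script)                # 4. action
    state = tools.read_page(tab_id)                 # 5. verify URL/title/content
    shot = tools.screenshot(tab_id, evidence_path)  # 6. visual evidence
    return {"tab_id": tab_id, "state": state, "screenshot": shot}
```

The point of the shape is that `tab_id` is captured once and threaded through every later call, so no step depends on whichever tab happens to be active.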

Session behavior guidance:

always pass `tab_id` for `navigate_to`, `read_page`, `screenshot`, `dom_snapshot`, `run_script`, and `close_tab`.
never rely on implicit active-tab behavior.
if a click opens a new tab/window, call `list_tabs`, detect the new `tab_id`, and continue explicitly on that `tab_id`.
keep a local map of `purpose -> tab_id` when handling multiple tabs.
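
One way to handle a click that opens a new tab is to diff the `list_tabs` result before and after the click, then record the new handle in the purpose map. The sketch below assumes `list_tabs` returns a list of `tab_id` strings, which this playbook does not confirm. `detect_new_tab` and the purpose names are hypothetical.

```python
def detect_new_tab(before, after):
    """Return the single tab_id present in `after` but not in `before`."""
    known = set(before)
    new_ids = [tab_id for tab_id in after if tab_id not in known]
    if len(new_ids) != 1:
        # Zero or several new tabs is ambiguous; stop instead of guessing.
        raise RuntimeError(f"expected exactly one new tab, found {new_ids}")
    return new_ids[0]


tabs = {"checkout": "tab-1"}     # purpose -> tab_id
before = ["tab-1"]               # assumed list_tabs result before the click
after = ["tab-1", "tab-2"]       # assumed list_tabs result after the click
tabs["receipt"] = detect_new_tab(before, after)
```

Failing loudly when the diff is ambiguous keeps the workflow from continuing on the wrong tab.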

Escalation triggers:

dynamic overlays not stable via selectors,
canvas/rendered controls,
consent dialogs where selector path is inconsistent,
native picker launched from browser (file upload dialog).

Do not overuse fallback:

if a browser tool can do it, stay in browser tools.
use native automation only for cross-app boundaries (OS dialogs, non-DOM UI).

File Explorer and Filesystem Automation

Prefer shell-native methods before GUI clicking.

Use shell when possible:

search files: `rg --files`, `find`
move/copy/rename: `mv`, `cp`, `mkdir`
inspect metadata: `ls -la`, `stat`

Use native UI only when the workflow is GUI-only:

OS file picker from browser/app,
drag-drop interactions not scriptable via API,
app-specific explorer panes.

Native UI Automation

Use native UI automation for interactions outside application DOM/API.

Typical tools:

`xdotool` for key/click/type,
`xprop` / `xwininfo` for window targeting.

Guidelines:

ensure window focus before typing,
prefer keyboard-driven deterministic paths,
keep retries bounded and observable,
re-check application state after each action.
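
The guidelines above can be sketched as a bounded retry loop plus a focus-then-type command builder. `retry_until` and `focus_and_type_commands` are hypothetical helpers; the builder only assembles `xdotool` argument lists (using the `windowactivate --sync` and `type --delay` forms) and does not execute them. The 50 ms typing delay is an arbitrary choice.

```python
import time


def retry_until(action, check, attempts=3, delay_s=0.5, log=print):
    """Run `action`, re-check state, and stop after a bounded number of attempts."""
    for attempt in range(1, attempts + 1):
        action()
        if check():
            log(f"attempt {attempt}: state verified")
            return attempt
        log(f"attempt {attempt}: state check failed")
        if attempt < attempts:
            time.sleep(delay_s)
    raise TimeoutError(f"state not reached after {attempts} attempts")


def focus_and_type_commands(window_id, text):
    """Build xdotool argv lists: focus the window first, then type."""
    return [
        ["xdotool", "windowactivate", "--sync", str(window_id)],
        ["xdotool", "type", "--delay", "50", text],
    ]
```

Each argv list could be passed to `subprocess.run` as the `action`, with `check` reading back application state. Logging every attempt keeps the retries observable.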

Human-in-the-Loop Rules

Pause and ask for user intervention when blocked by:

login/2FA challenges,
CAPTCHA or anti-bot checkpoints,
legal/security confirmation screens that require explicit human intent.

When waiting for user action:

explain exactly what the user must do and where.
issue an audible notification using `speak` so the user notices immediately.
wait, then re-check state (`url`, `title`, element visibility, screenshot) before continuing.
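
A minimal sketch of the notify, wait, re-check loop follows. It assumes `speak` accepts a single message string, which this playbook does not specify, and takes the state check as a caller-supplied function. `wait_for_user` and its timeout defaults are hypothetical.

```python
import time


def wait_for_user(instruction, speak, is_unblocked, timeout_s=300.0, poll_s=5.0, sleep=time.sleep):
    """Notify the user once, then poll the state check until it passes or time runs out."""
    speak(instruction)           # audible notification (assumed single-string signature)
    waited = 0.0
    while waited < timeout_s:
        if is_unblocked():       # e.g. compare url/title, element visibility
            return True
        sleep(poll_s)
        waited += poll_s
    return is_unblocked()        # one final re-check after the timeout
```

The function returns `False` rather than raising when the user has not acted, so the caller can report the blocked state instead of continuing on an unverified page.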

Special Cases

Consent dialogs

DOM-first click (`Accept all` / `Reject all` / localized variants).
if selector fails but button is visible, use coordinate/native fallback.
confirm modal is not visible and main interaction path works.

CAPTCHA / anti-bot challenges

do not attempt bypass logic.
capture evidence and report blocked state clearly.
require human-in-the-loop completion.
notify user with `speak` when intervention is required.

Login and account security gates

try normal DOM steps first for username/password field fill and submit.
if SSO, passkey, device approval, or 2FA requires human action, pause and request user action.
after user confirms completion, re-snapshot and continue from verified page state.

File uploads

use DOM file input assignment if available.
if native picker opens, switch to native UI automation.
verify upload appears in page/app state.

Verification Standard

Every important step should end with both:

state evidence (URL/title/content/element state), and
visual evidence (screenshot path).

If blocked, report:

attempted method,
blocker reason,
evidence collected,
next safe fallback.
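
The two evidence requirements and the four report fields could be captured in small record types like the ones below. The class and field names are hypothetical, chosen only to mirror the lists above.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class StepEvidence:
    """Evidence for one important step: state plus a screenshot path."""
    step: str
    state: dict              # URL/title/content/element state
    screenshot_path: str     # visual evidence

    def is_complete(self) -> bool:
        # Both kinds of evidence must be present.
        return bool(self.state) and bool(self.screenshot_path)


@dataclass
class BlockedReport:
    """Report emitted when a step cannot proceed."""
    attempted_method: str
    blocker_reason: str
    evidence: list = field(default_factory=list)
    next_safe_fallback: str = "human-in-the-loop"


evidence = StepEvidence("submit form", {"url": "https://example.com/done"}, "evidence/submit.png")
report = BlockedReport("DOM click on consent button", "selector not found", [asdict(evidence)])
```

Calling `is_complete()` before moving on makes a missing screenshot or empty state check visible at the step where it happened.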

Learning Library Structure

Use `references/learnings/` as the canonical knowledge base.

`references/learnings/index.md`: topic registry and folder convention.
`references/learnings/general/`: cross-task lessons.
`references/learnings/<topic-slug>/`: topic-specific lessons and experience log.

Topic folder convention:

`lessons.md` for stable workflow rules.
`experience-log.md` for incremental run learnings.

Continuous Learning Loop (Required)

Treat each real run as training data for future runs.

Before starting similar work:

Load `references/learnings/index.md`.
Map the task to a topic slug (for example `google-flow`).
Load `references/learnings/general/experience-log.md`.
Load topic files when present: `references/learnings/<topic-slug>/lessons.md` and `references/learnings/<topic-slug>/experience-log.md`.
If the topic folder does not exist, create it with `lessons.md` and `experience-log.md`.
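
The pre-run steps above might be sketched as follows. `prepare_topic` is a hypothetical helper: it creates the topic folder with both files if missing and returns the learning files that exist, in load order. The demo runs in a temporary directory so it does not touch a real `references/learnings/` tree.

```python
import tempfile
from pathlib import Path


def prepare_topic(root: Path, topic_slug: str) -> list[Path]:
    """Ensure the topic folder exists, then list the files to load, in order."""
    topic_dir = root / topic_slug
    topic_dir.mkdir(parents=True, exist_ok=True)
    for name in ("lessons.md", "experience-log.md"):
        (topic_dir / name).touch(exist_ok=True)   # create empty files when absent
    candidates = [
        root / "index.md",
        root / "general" / "experience-log.md",
        topic_dir / "lessons.md",
        topic_dir / "experience-log.md",
    ]
    return [path for path in candidates if path.exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "references" / "learnings"
    loaded = prepare_topic(root, "google-flow")
    loaded_names = [path.relative_to(root).as_posix() for path in loaded]
```

In this empty demo tree only the two freshly created topic files are returned. With a populated library, `index.md` and the general log would come first.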

During execution:

Capture failure signal and the exact step where it appears.
Record the minimal fix that resolved it.
Keep one-action-at-a-time execution where UI state is fragile.

After completion (or meaningful failure):

Append a short run note to `references/learnings/<topic-slug>/experience-log.md`.
Include: date, context, failure signal, root cause, fix pattern, reusable rule.
Keep entries concise and deduplicated by updating prior rules instead of adding noisy repeats.
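
A run note with the fields listed above could be formatted and appended like this. The entry layout (a dated heading followed by one line per field) is an assumption, not a format defined by this playbook, and `format_entry` and `append_entry` are hypothetical helpers. Deduplication is simplified here to skipping the append when the same reusable rule is already in the log. Updating the earlier entry in place is left out of the sketch.

```python
from datetime import date
from pathlib import Path

FIELDS = ("context", "failure_signal", "root_cause", "fix_pattern", "reusable_rule")


def format_entry(day: date, **fields: str) -> str:
    """Render one run note; every field from the list above is required."""
    missing = [name for name in FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    lines = [f"## {day.isoformat()}"]
    lines += [f"- {name.replace('_', ' ')}: {fields[name]}" for name in FIELDS]
    return "\n".join(lines) + "\n"


def append_entry(log_path: Path, entry: str, reusable_rule: str) -> bool:
    """Append the entry unless the same reusable rule is already logged."""
    existing = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
    if reusable_rule in existing:
        return False   # avoid a noisy repeat
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(("\n" if existing else "") + entry)
    return True
```

`append_entry` returns `False` on a duplicate so the caller can decide whether to revise the earlier rule instead.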

References

Load `references/computer-use-techniques.md` for command snippets and fallback templates.
Load `references/learnings/index.md` to select the right topic folder.
Load `references/learnings/general/experience-log.md` for cross-task patterns.
Load `references/learnings/google-flow/lessons.md` when automating Google Flow video creation.
Load `references/learnings/google-flow/experience-log.md` for incremental Google Flow learnings.
