computer-use-playbook
Computer Use Playbook
Overview
Use this skill for end-to-end computer automation across browser and desktop surfaces. Browser use is a major track, but not the only one. Prefer deterministic methods first, then escalate to visual/native automation only when required. For browser MCP workflows, treat `tab_id` as a required handle for all stateful actions.
Playbook Structure
- Browser use (primary for web tasks): browser MCP tools, DOM snapshots, scripts, screenshots.
- Filesystem use: shell-native operations for deterministic file/process work.
- Native desktop use: coordinate and window automation only when DOM/shell are insufficient.
- Human-in-the-loop checkpoints: login, CAPTCHA, security prompts, or policy-gated steps.
Decision Order
- Identify the active surface: browser page, filesystem/process, or native desktop UI.
- For browser pages, use browser MCP tools first and keep a strict `tab_id` contract.
- For filesystem/process work, use shell/system tools first (`rg`, `ls`, `find`, etc.).
- Escalate to vision or native UI automation only when deterministic methods are insufficient.
- If blocked by login, CAPTCHA, or security gates, switch to human-in-the-loop flow.
- Verify each critical step with state checks plus screenshot evidence.
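The decision order above can be sketched as a small routing function. The `Surface` and `Method` enums and the `choose_method` name are hypothetical, introduced only for illustration; they are not part of any browser MCP API.

```python
from enum import Enum


class Surface(Enum):
    BROWSER = "browser"
    FILESYSTEM = "filesystem"
    NATIVE = "native"


class Method(Enum):
    BROWSER_MCP = "browser_mcp"
    SHELL = "shell"
    NATIVE_UI = "native_ui"
    HUMAN = "human_in_the_loop"


def choose_method(surface: Surface, deterministic_ok: bool = True,
                  blocked_by_gate: bool = False) -> Method:
    """Pick the least invasive method that can still complete the step."""
    # Login, CAPTCHA, and security gates always go to a human.
    if blocked_by_gate:
        return Method.HUMAN
    # Escalate only when the deterministic path has proven insufficient.
    if not deterministic_ok or surface is Surface.NATIVE:
        return Method.NATIVE_UI
    if surface is Surface.BROWSER:
        return Method.BROWSER_MCP
    return Method.SHELL
```

For example, `choose_method(Surface.BROWSER)` stays on browser MCP tools, while passing `deterministic_ok=False` escalates the same step to native UI automation.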
Browser Automation (Major Track)
Use browser tools with a DOM-first approach for browser flows. Avoid jumping to native desktop clicks while the target is still reachable by browser tools.
Preferred sequence:
- `open_tab` and capture returned `tab_id`.
- `navigate_to(tab_id, url)` for explicit page transitions.
- `dom_snapshot(tab_id, ...)` or `run_script(tab_id, ...)` to identify target.
- `run_script(tab_id, ...)` action (click/type/submit).
- `read_page(tab_id, ...)` / `run_script(tab_id, ...)` to verify URL/title/content.
- `screenshot(tab_id, ...)` as evidence.
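A minimal sketch of this sequence, assuming the browser MCP tools are exposed through a Python client object. The `BrowserTools` protocol, every parameter other than `tab_id` and `url`, and the return shapes (a dict with a `content` key from `read_page`, a path from `screenshot`) are assumptions made for illustration; check them against the actual tool schema.

```python
import json
from typing import Any, Protocol


class BrowserTools(Protocol):
    # Hypothetical client; signatures beyond tab_id/url are assumptions.
    def open_tab(self) -> str: ...
    def navigate_to(self, tab_id: str, url: str) -> None: ...
    def run_script(self, tab_id: str, script: str) -> Any: ...
    def read_page(self, tab_id: str) -> dict: ...
    def screenshot(self, tab_id: str, path: str) -> str: ...


def click_and_verify(tools: BrowserTools, url: str, selector: str,
                     expected_text: str, shot_path: str) -> dict:
    tab_id = tools.open_tab()                       # 1. capture the handle
    tools.navigate_to(tab_id, url)                  # 2. explicit transition
    sel = json.dumps(selector)                      # quoted for use inside JS
    found = tools.run_script(                       # 3. identify the target
        tab_id, f"document.querySelector({sel}) !== null")
    if not found:
        return {"tab_id": tab_id, "ok": False, "reason": "target not found"}
    tools.run_script(                               # 4. perform the action
        tab_id, f"document.querySelector({sel}).click()")
    page = tools.read_page(tab_id)                  # 5. verify page state
    ok = expected_text in page.get("content", "")
    evidence = tools.screenshot(tab_id, shot_path)  # 6. keep visual evidence
    return {"tab_id": tab_id, "ok": ok, "screenshot": evidence}
```

Every call after `open_tab` passes the same `tab_id`, so the flow never depends on which tab happens to be active.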
Session behavior guidance:
- always pass `tab_id` for `navigate_to`, `read_page`, `screenshot`, `dom_snapshot`, `run_script`, and `close_tab`.
- never rely on implicit active-tab behavior.
- if a click opens a new tab/window, call `list_tabs`, detect the new `tab_id`, and continue explicitly on that `tab_id`.
- keep a local map of `purpose -> tab_id` when handling multiple tabs.
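One way to keep the `purpose -> tab_id` map is a small registry that also detects a newly opened tab by comparing `list_tabs` results from before and after a click. The `TabRegistry` class is hypothetical, and the assumption that `list_tabs` can be reduced to a set of `tab_id` strings should be checked against the real tool output.

```python
class TabRegistry:
    """Local map of purpose -> tab_id so no call relies on the active tab."""

    def __init__(self) -> None:
        self._by_purpose: dict[str, str] = {}

    def register(self, purpose: str, tab_id: str) -> None:
        self._by_purpose[purpose] = tab_id

    def get(self, purpose: str) -> str:
        # Raises KeyError for an unknown purpose instead of guessing a tab.
        return self._by_purpose[purpose]

    def adopt_new_tab(self, purpose: str, before: set[str],
                      after: set[str]) -> str:
        """Register the single tab that appeared between two list_tabs calls."""
        new_tabs = after - before
        if len(new_tabs) != 1:
            raise RuntimeError(
                f"expected exactly one new tab, found {len(new_tabs)}")
        tab_id = next(iter(new_tabs))
        self.register(purpose, tab_id)
        return tab_id
```

Failing loudly when zero or several new tabs appear keeps the flow from continuing on the wrong handle.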
Escalation triggers:
- dynamic overlays not stable via selectors,
- canvas/rendered controls,
- consent dialogs where selector path is inconsistent,
- native picker launched from browser (file upload dialog).
Do not overuse fallback:
- if a browser tool can do it, stay in browser tools.
- use native automation only for cross-app boundaries (OS dialogs, non-DOM UI).
File Explorer and Filesystem Automation
Prefer shell-native methods before GUI clicking.
Use shell when possible:
- search files: `rg --files`, `find`
- move/copy/rename: `mv`, `cp`, `mkdir`
- inspect metadata: `ls -la`, `stat`
Use native UI only when the workflow is GUI-only:
- OS file picker from browser/app,
- drag-drop interactions not scriptable via API,
- app-specific explorer panes.
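When the shell commands above are driven from a script, a thin wrapper is usually enough. The sketch below assumes a Unix-like system with `find` and `mv` on `PATH`; the helper names `list_files` and `move_file` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def list_files(root: str) -> list[str]:
    """List files under root with `rg --files` when available, else `find`."""
    if shutil.which("rg"):
        # Note: rg skips hidden and ignored files by default; find does not.
        cmd = ["rg", "--files", root]
    else:
        cmd = ["find", root, "-type", "f"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return sorted(line for line in result.stdout.splitlines() if line)


def move_file(src: str, dst: str) -> None:
    """Move src to dst, creating the destination directory first."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    subprocess.run(["mv", "--", src, dst], check=True)
```

Passing arguments as a list instead of a shell string avoids quoting problems with paths that contain spaces.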
Native UI Automation
Use native UI automation for interactions outside application DOM/API.
Typical tools:
- `xdotool` for key/click/type,
- `xprop` / `xwininfo` for window targeting.
Guidelines:
- ensure window focus before typing,
- prefer keyboard-driven deterministic paths,
- keep retries bounded and observable,
- re-check application state after each action.
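The guidelines above can be combined into one bounded routine: find the window, focus it, confirm focus, then type. This is a sketch assuming an X11 session with `xdotool` installed; the `focus_and_type` name, the injectable `run` parameter, and the retry defaults are illustrative choices.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(cmd), capture_output=True, text=True)


def focus_and_type(window_name: str, text: str, run: Runner = _run,
                   max_attempts: int = 3, retry_delay: float = 0.5) -> bool:
    """Type text into a named window, verifying focus before each attempt."""
    for attempt in range(1, max_attempts + 1):
        found = run(["xdotool", "search", "--name", window_name])
        ids = found.stdout.split()
        if found.returncode == 0 and ids:
            win = ids[0]
            run(["xdotool", "windowactivate", "--sync", win])
            # Re-check state: only type if the target really has focus.
            active = run(["xdotool", "getactivewindow"]).stdout.strip()
            if active == win:
                run(["xdotool", "type", "--delay", "50", text])
                return True
        print(f"attempt {attempt}/{max_attempts}: window not focused")
        time.sleep(retry_delay)
    return False
```

The retry count is fixed and each failed attempt is printed, which keeps retries bounded and observable.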
Human-in-the-loop rules
Pause and ask for user intervention when blocked by:
- login/2FA challenges,
- CAPTCHA or anti-bot checkpoints,
- legal/security confirmation screens that require explicit human intent.
When waiting for user action:
- explain exactly what the user must do and where.
- issue an audible notification using `speak` so the user notices immediately.
- wait, then re-check state (`url`, `title`, element visibility, screenshot) before continuing.
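The waiting steps above amount to a notify-then-poll loop. In this sketch, `speak` and `is_unblocked` are passed in as callables because their real signatures are not given here; treating `speak` as a function that takes one message string is an assumption, as are the polling defaults.

```python
import time
from typing import Callable


def wait_for_user(instruction: str,
                  speak: Callable[[str], None],
                  is_unblocked: Callable[[], bool],
                  poll_seconds: float = 5.0,
                  max_polls: int = 60,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Ask the user to act, then re-check state until unblocked or timed out."""
    print(instruction)   # say exactly what to do and where
    speak(instruction)   # audible notification
    for _ in range(max_polls):
        sleep(poll_seconds)
        # is_unblocked should re-check URL, title, element visibility, etc.
        if is_unblocked():
            return True
    return False
```

A `False` return means the wait timed out; the caller should report the blocked state and not continue on an unverified page.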
Special Cases
Consent dialogs
- DOM-first click (`Accept all` / `Reject all` / localized variants).
- if selector fails but button is visible, use coordinate/native fallback.
- confirm modal is not visible and main interaction path works.
CAPTCHA / anti-bot challenges
- do not attempt bypass logic.
- capture evidence and report blocked state clearly.
- require human-in-the-loop completion.
- notify user with `speak` when intervention is required.
Login and account security gates
- try normal DOM steps first: fill the username/password fields and submit.
- if SSO, passkey, device approval, or 2FA requires human action, pause and request user action.
- after user confirms completion, re-snapshot and continue from verified page state.
File uploads
- use DOM file input assignment if available.
- if native picker opens, switch to native UI automation.
- verify upload appears in page/app state.
Verification Standard
Every important step should end with both:
- state evidence (URL/title/content/element state), and
- visual evidence (screenshot path).
If blocked, report:
- attempted method,
- blocker reason,
- evidence collected,
- next safe fallback.
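The four fields of a blocked report map naturally onto a small record type. The `BlockedReport` class and its `render` output format are hypothetical; only the field names come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class BlockedReport:
    attempted_method: str
    blocker_reason: str
    next_safe_fallback: str
    # State and visual evidence: URLs, titles, screenshot paths.
    evidence: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the report as one line per required field."""
        lines = [
            f"attempted method: {self.attempted_method}",
            f"blocker reason: {self.blocker_reason}",
            "evidence collected: " + (", ".join(self.evidence) or "none"),
            f"next safe fallback: {self.next_safe_fallback}",
        ]
        return "\n".join(lines)
```

Making all four fields explicit means a report cannot be produced with the blocker reason or fallback silently left out.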
Learning Library Structure
Use `references/learnings/` as the canonical knowledge base.
- `references/learnings/index.md`: topic registry and folder convention.
- `references/learnings/general/`: cross-task lessons.
- `references/learnings/<topic-slug>/`: topic-specific lessons and experience log.
Topic folder convention:
- `lessons.md` for stable workflow rules.
- `experience-log.md` for incremental run learnings.
Continuous Learning Loop (Required)
Treat each real run as training data for future runs.
Before starting similar work:
- Load `references/learnings/index.md`.
- Map the task to a topic slug (for example `google-flow`).
- Load `references/learnings/general/experience-log.md`.
- Load topic files when present: `references/learnings/<topic-slug>/lessons.md` and `references/learnings/<topic-slug>/experience-log.md`.
- If the topic folder does not exist, create it with `lessons.md` and `experience-log.md`.
During execution:
- Capture failure signal and the exact step where it appears.
- Record the minimal fix that resolved it.
- Keep one-action-at-a-time execution where UI state is fragile.
After completion (or meaningful failure):
- Append a short run note to `references/learnings/<topic-slug>/experience-log.md`.
- Include: date, context, failure signal, root cause, fix pattern, reusable rule.
- Keep entries concise and deduplicated by updating prior rules instead of adding noisy repeats.
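The folder creation and log append steps can be sketched with `pathlib`. The helper names and the one-line entry format are assumptions, and the deduplication here is deliberately simplified: it only skips a note whose reusable rule is already in the log, whereas the rule above asks for prior rules to be updated.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def ensure_topic(root: Path, slug: str) -> Path:
    """Create <root>/<slug>/ with lessons.md and experience-log.md if missing."""
    topic = root / slug
    topic.mkdir(parents=True, exist_ok=True)
    for name in ("lessons.md", "experience-log.md"):
        (topic / name).touch(exist_ok=True)
    return topic


def append_run_note(root: Path, slug: str, *, context: str,
                    failure_signal: str, root_cause: str, fix_pattern: str,
                    reusable_rule: str, today: Optional[date] = None) -> bool:
    """Append one run note; return False if the rule is already logged."""
    log = ensure_topic(root, slug) / "experience-log.md"
    if reusable_rule in log.read_text(encoding="utf-8"):
        return False  # simplified dedup: skip exact repeats of a rule
    day = (today or date.today()).isoformat()
    entry = (f"- {day} | context: {context} | signal: {failure_signal} | "
             f"cause: {root_cause} | fix: {fix_pattern} | "
             f"rule: {reusable_rule}\n")
    with log.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return True
```

Called with `root=Path("references/learnings")` and `slug="google-flow"`, this writes to the same `experience-log.md` path named in the steps above.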
References
- Load `references/computer-use-techniques.md` for command snippets and fallback templates.
- Load `references/learnings/index.md` to select the right topic folder.
- Load `references/learnings/general/experience-log.md` for cross-task patterns.
- Load `references/learnings/google-flow/lessons.md` when automating Google Flow video creation.
- Load `references/learnings/google-flow/experience-log.md` for incremental Google Flow learnings.