pi-computer-use
Skill by ara.so — Daily 2026 Skills collection.
Installation
Via Pi (recommended)

```bash
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
```

Pin to a specific version:

```bash
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.1
```

Via npm

```bash
npm install @injaneity/pi-computer-use
# or pin a version
npm install @injaneity/pi-computer-use@0.2.1
```

Remove

```bash
pi remove git:github.com/injaneity/pi-computer-use#v0.2.1
npm remove @injaneity/pi-computer-use
```

First-Run Permissions
On the first session, macOS prompts for permissions for:

```
~/.pi/agent/helpers/pi-computer-use/bridge
```

Grant both:
- Accessibility: required for AX ref targeting
- Screen Recording: required for screenshots
How It Works
Three components:
- Pi extension (`extensions/computer-use.ts`): registers the public tools and the `/computer-use` command
- TypeScript bridge (`src/bridge.ts`): manages window state, AX refs, fallback policy, batching, and execution metadata
- Native Swift helper (`native/macos/bridge.swift`): talks to macOS Accessibility, ScreenCaptureKit, AppKit, and CoreGraphics
Available Tools
| Tool | Purpose |
|---|---|
| `list_apps` | List running apps |
| `list_windows` | List windows for an app |
| `screenshot` | Capture window + return AX state |
| `click` | Click element or coordinate |
| | Double-click element or coordinate |
| | Move cursor |
| `drag` | Drag from point to point |
| `scroll` | Scroll element or coordinate |
| `keypress` | Press key combination |
| | Type raw text |
| `set_text` | Replace element value via AX |
| | Pause execution |
| `arrange_window` | Position/resize window |
| `computer_actions` | Batch multiple actions |
Core Workflow
Always start a session with `screenshot` to select the controlled window and obtain AX refs:

```ts
// 1. Discover apps and windows if target is ambiguous
list_apps()
list_windows({ app: "Safari" })
// 2. Select the window and get AX state
screenshot({ window: "@w1" })
// 3. Act on AX refs returned from screenshot
click({ window: "@w1", ref: "@e1" })
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })
```

AX Ref Targeting (Preferred)
AX refs like `@e1`, `@e2` are returned by `screenshot` and carry capability metadata:
- `canSetValue`: supports `set_text`
- `canPress`: supports `click`
- `canFocus`: can receive focus
- `canScroll`: supports `scroll`
- `adjust`: supports value adjustment

```ts
// Click by AX ref, no coordinates needed
click({ ref: "@e1" })
// Scroll a specific element
scroll({ ref: "@e3", scrollY: 600 })
// Replace text field value atomically
set_text({ ref: "@e2", text: "hello world" })
```
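
The capability flags suggest a simple pre-check before choosing a tool. The sketch below is illustrative only: the `AxRef` shape and the `toolsFor` helper are hypothetical names, and the metadata actually returned by `screenshot` may be structured differently.

```typescript
// Hypothetical shape for an AX ref carrying the capability flags listed above.
// Only the flag names come from this document; the layout is an assumption.
interface AxRef {
  ref: string;
  canSetValue?: boolean;
  canPress?: boolean;
  canFocus?: boolean;
  canScroll?: boolean;
}

type Tool = "set_text" | "click" | "scroll";

// Return the tools that a ref's capability flags say it supports.
function toolsFor(el: AxRef): Tool[] {
  const tools: Tool[] = [];
  if (el.canSetValue) tools.push("set_text");
  if (el.canPress) tools.push("click");
  if (el.canScroll) tools.push("scroll");
  return tools;
}

// Example refs (made up for illustration).
const addressBar: AxRef = { ref: "@e2", canSetValue: true, canFocus: true };
const submitButton: AxRef = { ref: "@e5", canPress: true };

const addressBarTools = toolsFor(addressBar); // only "set_text"
const submitButtonTools = toolsFor(submitButton); // only "click"
```

A check like this would let a caller fall back to coordinates only when the returned list is empty.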
Coordinate Fallback
Use coordinates only when no suitable AX target exists. Always include `stateId` from the latest screenshot to guard against stale state:

```ts
click({ x: 320, y: 180, stateId: "abc123" })
```
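
One way to picture the stale-state guard: a coordinate action carries the `stateId` it was planned against and is refused once a newer state exists. The sketch below models that check locally. `StateTracker` and its methods are hypothetical and do not describe how the bridge implements detection.

```typescript
// Hypothetical model of stale-action detection; not the extension's implementation.
interface CoordinateClick {
  x: number;
  y: number;
  stateId: string;
}

class StateTracker {
  private latest: string | null = null;

  // Record the stateId returned by the most recent screenshot.
  observe(stateId: string): void {
    this.latest = stateId;
  }

  // Throw if the action was planned against a state that is no longer the latest.
  assertFresh(action: CoordinateClick): void {
    if (action.stateId !== this.latest) {
      throw new Error(
        `stale stateId ${action.stateId}; latest is ${this.latest ?? "none"}`
      );
    }
  }
}

const tracker = new StateTracker();
tracker.observe("abc123");
tracker.assertFresh({ x: 320, y: 180, stateId: "abc123" }); // passes

tracker.observe("def456"); // a newer screenshot was taken
let rejected = false;
try {
  tracker.assertFresh({ x: 320, y: 180, stateId: "abc123" });
} catch {
  rejected = true; // the old coordinates are refused
}
```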
Batching Actions
Use `computer_actions` to batch obvious sequential steps. One semantic state update is returned after all actions:

```ts
computer_actions({
  stateId: "abc123",
  actions: [
    { type: "click", ref: "@e1" },
    { type: "set_text", ref: "@e2", text: "https://example.com" },
    { type: "keypress", keys: ["Enter"] }
  ]
})
```

Each action in the result includes execution metadata:
- `stealth`: background-safe AX path (no focus takeover)
- `default`: required focus or raw event fallback
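
When reading a batch result, it can help to see which actions needed the `default` path. The result shape below is an assumption made for illustration, since the document only states that each action reports `stealth` or `default`. `ActionResult` and `needsForeground` are hypothetical names.

```typescript
type ExecutionMode = "stealth" | "default";

// Hypothetical per-action result; only the two mode names come from this document.
interface ActionResult {
  type: string;
  execution: ExecutionMode;
}

// Return the actions that fell back to focus or raw events.
function needsForeground(results: ActionResult[]): ActionResult[] {
  return results.filter((r) => r.execution === "default");
}

// Example result data (made up for illustration).
const batchResults: ActionResult[] = [
  { type: "click", execution: "stealth" },
  { type: "set_text", execution: "stealth" },
  { type: "keypress", execution: "default" },
];

const fallbacks = needsForeground(batchResults);
const allStealth = fallbacks.length === 0;
```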
Window Management

```ts
// List windows for a specific app
list_windows({ app: "Finder" })
// Target a specific window in all subsequent calls
screenshot({ window: "@w2" })
// Arrange window by preset
arrange_window({ window: "@w1", preset: "left-half" })
// Arrange window with explicit frame
arrange_window({ window: "@w1", frame: { x: 0, y: 0, width: 1280, height: 800 } })
```
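
To illustrate how a preset relates to an explicit frame, the sketch below computes the frame that `left-half` would plausibly correspond to on a given display. The `Frame` type mirrors the `frame` argument above. The `leftHalf` helper and the display size are assumptions, and the real preset may also account for the menu bar or Dock.

```typescript
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: the left half of a display's frame.
function leftHalf(display: Frame): Frame {
  return {
    x: display.x,
    y: display.y,
    width: Math.floor(display.width / 2),
    height: display.height,
  };
}

// Example display size (an assumption, not queried from the system).
const display: Frame = { x: 0, y: 0, width: 2560, height: 1440 };
const leftFrame = leftHalf(display);
// leftFrame has the same shape as the `frame` argument of arrange_window.
```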
Screenshot Modes
Control when screenshots are attached with the `image` option:

```ts
screenshot({ window: "@w1", image: "auto" })   // default: attach when AX coverage is weak
screenshot({ window: "@w1", image: "always" }) // always attach
screenshot({ window: "@w1", image: "never" })  // never attach, AX state only
```
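
The three modes can be read as a small decision rule. The sketch below is a guess at the shape of that rule: the `axCoverage` score (0 to 1) and the 0.5 threshold are invented for illustration, since the document does not say how weak coverage is measured.

```typescript
type ImageMode = "auto" | "always" | "never";

// Hypothetical decision rule; the coverage score and threshold are assumptions.
function shouldAttachImage(
  mode: ImageMode,
  axCoverage: number,
  weakBelow = 0.5
): boolean {
  switch (mode) {
    case "always":
      return true;
    case "never":
      return false;
    case "auto":
      // Attach the image only when the AX tree covers too little of the window.
      return axCoverage < weakBelow;
  }
}

const attachForSparseTree = shouldAttachImage("auto", 0.2);
const attachForRichTree = shouldAttachImage("auto", 0.9);
```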
Common Patterns
Open URL in Safari

```ts
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })
// @e1 = address bar (from AX state)
set_text({ ref: "@e1", text: "https://example.com" })
keypress({ keys: ["Enter"] })
```

Fill a Form

```ts
screenshot({ window: "@w1" })
// Use refs from AX state
set_text({ ref: "@e3", text: "Jane Doe" })
set_text({ ref: "@e4", text: "jane@example.com" })
click({ ref: "@e5" }) // Submit button
```

Keyboard Shortcut

```ts
keypress({ keys: ["Cmd", "T"] })          // New tab
keypress({ keys: ["Cmd", "Shift", "N"] }) // New incognito window
keypress({ keys: ["Escape"] })
```

Scroll a Page

```ts
scroll({ ref: "@e2", scrollY: 800 })  // Scroll element down
scroll({ ref: "@e2", scrollY: -400 }) // Scroll up
```

Drag and Drop

```ts
drag({ fromX: 100, fromY: 200, toX: 400, toY: 200 })
```

Strict AX Mode (Stealth / Background-Safe)
Enable strict AX mode to prevent focus changes, raw pointer events, raw keyboard events, and cursor takeover. All actions must succeed via background-safe AX paths:

```ts
// Via config (see Configuration section)
// Actions will report `stealth` in execution metadata when successful
```

When strict mode is active, an action that requires foreground focus fails with a strict mode error.
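
Conceptually, strict mode turns a fallback into an error. The sketch below models that policy. `runAction`, `StrictModeError`, and the `stealthCapable` flag are hypothetical and only illustrate the behavior described above.

```typescript
class StrictModeError extends Error {}

interface PlannedAction {
  type: string;
  // Assumption: whether a background-safe AX path exists for this action.
  stealthCapable: boolean;
}

// Return the execution mode an action would use, or throw in strict mode
// when only the focus or raw-event path is available.
function runAction(
  action: PlannedAction,
  strict: boolean
): "stealth" | "default" {
  if (action.stealthCapable) return "stealth";
  if (strict) {
    throw new StrictModeError(`${action.type} requires foreground focus`);
  }
  return "default";
}

const pressMode = runAction({ type: "click", stealthCapable: true }, true);
const dragMode = runAction({ type: "drag", stealthCapable: false }, false);
```

With `strict` set to `true`, the same drag would throw instead of falling back.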
Configuration
Inspect the effective config in Pi:

```
/computer-use
```

Config can be set via config files or environment variable overrides. Key options:

| Option | Description |
|---|---|
| | Enable background-safe strict AX mode |
| | Browser-aware targeting preference |

See `docs/configuration.md` for the full config file format and environment variable overrides.

Development

```bash
# Install dependencies
npm install
# Run checks
npm test
# Run local checkout without loading installed copy
pi --no-extensions -e .
```

Benchmarks

```bash
# Default QA benchmark
npm run benchmark:qa
# Full benchmark (may open apps)
npm run benchmark:qa:full
```

See [`benchmarks/README.md`](https://github.com/injaneity/pi-computer-use/blob/main/benchmarks/README.md) for metrics, regression policy, and comparison workflow.

---
Troubleshooting
Permissions not granted
Re-run and grant both Accessibility and Screen Recording to:

```
~/.pi/agent/helpers/pi-computer-use/bridge
```

On macOS, go to System Settings → Privacy & Security → Accessibility and Screen Recording.
AX refs are stale
Take a fresh `screenshot` to get an updated `stateId` and new refs before acting. Stale-action detection uses `stateId` to reject outdated coordinates or refs.
Browser window not targeted correctly
Use `list_windows({ app: "Safari" })` (or Chrome/Firefox) first, then explicitly pass `window: "@wN"` to `screenshot` and subsequent actions.
Strict AX mode errors
An action failed to complete via a background-safe AX path. Either disable strict mode or identify an AX ref with `canPress` / `canSetValue` that supports the background path.
Helper not found
Ensure Pi installed the native helper:

```bash
ls ~/.pi/agent/helpers/pi-computer-use/bridge
```

If missing, reinstall:

```bash
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
```

Key Concepts
- AX refs (`@e1`, `@e2`, …): semantic element handles from the macOS Accessibility API, stable within a state
- Window refs (`@w1`, `@w2`, …): stable handles from `list_windows`
- stateId: opaque ID from the latest screenshot; attach to coordinate-based actions to detect stale state
- stealth execution: action completed via AX without foregrounding the app or moving the real cursor
- semantic state: structured AX tree returned after every action, used instead of screenshots when coverage is sufficient