pi-computer-use

Skill by ara.so, from the Daily 2026 Skills collection.

`pi-computer-use` gives Pi agents a semantic computer-use surface for visible macOS windows. It prefers Accessibility (AX) targets (like `@e1`) over raw coordinates, returns semantic state after every action, and attaches screenshots only when AX coverage is too weak.

Installation

Via Pi (recommended)

bash
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Pin to a specific version:
bash
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.1

Via npm

bash
npm install @injaneity/pi-computer-use
# or pin a version
npm install @injaneity/pi-computer-use@0.2.1

Remove

bash
pi remove git:github.com/injaneity/pi-computer-use#v0.2.1
npm remove @injaneity/pi-computer-use

First-Run Permissions

On the first session, macOS prompts for permissions for:
~/.pi/agent/helpers/pi-computer-use/bridge
Grant both:
  • Accessibility — required for AX ref targeting
  • Screen Recording — required for screenshots

How It Works

Three components:
  1. Pi extension (`extensions/computer-use.ts`): registers the public tools and the `/computer-use` command
  2. TypeScript bridge (`src/bridge.ts`): manages window state, AX refs, fallback policy, batching, and execution metadata
  3. Native Swift helper (`native/macos/bridge.swift`): talks to macOS Accessibility, ScreenCaptureKit, AppKit, and CoreGraphics

Available Tools

| Tool | Purpose |
| --- | --- |
| `list_apps` | List running apps |
| `list_windows` | List windows for an app |
| `screenshot` | Capture window and return AX state |
| `click` | Click element or coordinate |
| `double_click` | Double-click element or coordinate |
| `move_mouse` | Move cursor |
| `drag` | Drag from point to point |
| `scroll` | Scroll element or coordinate |
| `keypress` | Press key combination |
| `type_text` | Type raw text |
| `set_text` | Replace element value via AX |
| `wait` | Pause execution |
| `arrange_window` | Position/resize window |
| `computer_actions` | Batch multiple actions |

Core Workflow

Always start a session with `screenshot` to select the controlled window and obtain AX refs:
ts
// 1. Discover apps and windows if target is ambiguous
list_apps()
list_windows({ app: "Safari" })

// 2. Select the window and get AX state
screenshot({ window: "@w1" })

// 3. Act on AX refs returned from screenshot
click({ window: "@w1", ref: "@e1" })
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })

AX Ref Targeting (Preferred)

AX refs like `@e1` and `@e2` are returned by `screenshot` and carry capability metadata:
  • `canSetValue`: supports `set_text`
  • `canPress`: supports `click`
  • `canFocus`: can receive focus
  • `canScroll`: supports `scroll`
  • `adjust`: supports value adjustment
ts
// Click by AX ref — no coordinates needed
click({ ref: "@e1" })

// Scroll a specific element
scroll({ ref: "@e3", scrollY: 600 })

// Replace text field value atomically
set_text({ ref: "@e2", text: "hello world" })
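
The capability flags can drive which tool to call. The sketch below is illustrative only: the `AxRef` shape, the `PlannedCall` type, the `planTextEntry` helper, and the `text` parameter of `type_text` are assumptions, not the extension's actual types. It plans a `set_text` call when `canSetValue` is present and otherwise falls back to clicking the element and typing raw text.

```typescript
// Hypothetical shape of an AX ref entry; only the capability flag names
// come from the documentation above.
interface AxRef {
  ref: string; // e.g. "@e2"
  canSetValue?: boolean;
  canPress?: boolean;
  canFocus?: boolean;
  canScroll?: boolean;
  adjust?: boolean;
}

// Hypothetical description of a tool call the agent would make next.
type PlannedCall =
  | { tool: "set_text"; args: { ref: string; text: string } }
  | { tool: "click"; args: { ref: string } }
  | { tool: "type_text"; args: { text: string } };

// Plan how to enter text into an element, preferring the AX value path.
function planTextEntry(target: AxRef, text: string): PlannedCall[] {
  if (target.canSetValue) {
    // Direct AX value replacement in a single call.
    return [{ tool: "set_text", args: { ref: target.ref, text } }];
  }
  if (target.canPress || target.canFocus) {
    // Fall back to focusing the element, then typing raw text.
    return [
      { tool: "click", args: { ref: target.ref } },
      { tool: "type_text", args: { text } },
    ];
  }
  throw new Error(`${target.ref} exposes no capability for text entry`);
}
```

If the helper throws, taking a fresh `screenshot` or using the coordinate fallback are the remaining options.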

Coordinate Fallback

Use coordinates only when no suitable AX target exists. Always include the `stateId` from the latest screenshot to guard against stale state:
ts
click({ x: 320, y: 180, stateId: "abc123" })
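
To make the `stateId` guard concrete, here is a minimal model of stale-action detection. It is a sketch under assumptions, not the bridge's code: `StateTracker`, `CoordinateClick`, and the `s1`, `s2` ID format are hypothetical. The idea is only that each new semantic state gets a new ID and a coordinate action is accepted only if it carries the latest one.

```typescript
// Hypothetical coordinate action payload, mirroring the click example above.
interface CoordinateClick {
  x: number;
  y: number;
  stateId: string;
}

// Hypothetical tracker: real state IDs are opaque, "s1", "s2" is illustrative.
class StateTracker {
  private current = "";
  private counter = 0;

  // Called whenever a new semantic state is produced (for example by screenshot).
  nextState(): string {
    this.counter += 1;
    this.current = `s${this.counter}`;
    return this.current;
  }

  // A coordinate action is accepted only if it was planned against the latest state.
  accepts(action: CoordinateClick): boolean {
    return action.stateId === this.current;
  }
}
```

Coordinates carry no semantic identity, so a layout change between the screenshot and the click would otherwise go unnoticed; the ID comparison is what catches it.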

Batching Actions

Use `computer_actions` to batch obvious sequential steps. One semantic state update is returned after all actions:
ts
computer_actions({
  stateId: "abc123",
  actions: [
    { type: "click", ref: "@e1" },
    { type: "set_text", ref: "@e2", text: "https://example.com" },
    { type: "keypress", keys: ["Enter"] }
  ]
})

Each action in the result includes execution metadata:
  • `stealth`: background-safe AX path (no focus takeover)
  • `default`: required focus or raw event fallback
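
One way to use this metadata is to check after a batch whether any step had to leave the background-safe path. The sketch below assumes a hypothetical result shape (`ActionResult` with `type` and `execution` fields); only the `stealth` and `default` values come from the text above.

```typescript
type ExecutionMode = "stealth" | "default";

// Hypothetical per-action result entry; the field names are assumptions.
interface ActionResult {
  type: string;
  execution: ExecutionMode;
}

// Count how many actions ran on each execution path.
function summarizeExecution(results: ActionResult[]): Record<ExecutionMode, number> {
  const counts: Record<ExecutionMode, number> = { stealth: 0, default: 0 };
  for (const result of results) {
    counts[result.execution] += 1;
  }
  return counts;
}

// True if at least one action needed focus or raw events.
function leftBackgroundPath(results: ActionResult[]): boolean {
  return results.some((result) => result.execution === "default");
}
```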

Window Management

ts
// List windows for a specific app
list_windows({ app: "Finder" })

// Target a specific window in all subsequent calls
screenshot({ window: "@w2" })

// Arrange window by preset
arrange_window({ window: "@w1", preset: "left-half" })

// Arrange window with explicit frame
arrange_window({ window: "@w1", frame: { x: 0, y: 0, width: 1280, height: 800 } })
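
For intuition about how a preset relates to an explicit frame, the sketch below computes a frame from a screen rectangle. How the tool actually resolves presets is not documented here, so treat this as an assumption: `presetToFrame` is a hypothetical helper, and `right-half` is a hypothetical preset name added only for symmetry with `left-half`.

```typescript
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

// "left-half" appears in the example above; "right-half" is hypothetical.
type Preset = "left-half" | "right-half";

// Compute an explicit frame for a preset, given the usable screen rectangle.
function presetToFrame(preset: Preset, screen: Frame): Frame {
  const halfWidth = Math.floor(screen.width / 2);
  switch (preset) {
    case "left-half":
      return { x: screen.x, y: screen.y, width: halfWidth, height: screen.height };
    case "right-half":
      return {
        x: screen.x + halfWidth,
        y: screen.y,
        width: screen.width - halfWidth,
        height: screen.height,
      };
  }
}
```

The returned object has the same fields as the `frame` argument shown above, so it could be passed to `arrange_window` when a preset does not fit.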

Screenshot Modes

Control when screenshots are attached with the `image` option:
ts
screenshot({ window: "@w1", image: "auto" })   // default: attach when AX coverage is weak
screenshot({ window: "@w1", image: "always" }) // always attach
screenshot({ window: "@w1", image: "never" })  // never attach, AX state only
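
The three modes can be read as a small decision function. This is a sketch only: how the bridge measures AX coverage is not described here, so the `axCoverage` score between 0 and 1 and the `threshold` default are assumptions made for illustration.

```typescript
type ImageMode = "auto" | "always" | "never";

// Decide whether to attach a screenshot.
// axCoverage and threshold are hypothetical; the real coverage heuristic may differ.
function shouldAttachImage(mode: ImageMode, axCoverage: number, threshold = 0.5): boolean {
  if (mode === "always") return true;
  if (mode === "never") return false;
  // "auto": attach only when the AX tree describes too little of the window.
  return axCoverage < threshold;
}
```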

Common Patterns

Open URL in Safari

ts
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })
// @e1 = address bar (from AX state)
set_text({ ref: "@e1", text: "https://example.com" })
keypress({ keys: ["Enter"] })

Fill a Form

ts
screenshot({ window: "@w1" })
// Use refs from AX state
set_text({ ref: "@e3", text: "Jane Doe" })
set_text({ ref: "@e4", text: "jane@example.com" })
click({ ref: "@e5" }) // Submit button

Keyboard Shortcut

ts
keypress({ keys: ["Cmd", "T"] })       // New tab
keypress({ keys: ["Cmd", "Shift", "N"] }) // New private window (Safari) or incognito window (Chrome)
keypress({ keys: ["Escape"] })

Scroll a Page

ts
scroll({ ref: "@e2", scrollY: 800 })   // Scroll element down
scroll({ ref: "@e2", scrollY: -400 })  // Scroll up

Drag and Drop

ts
drag({ fromX: 100, fromY: 200, toX: 400, toY: 200 })

Strict AX Mode (Stealth / Background-Safe)

Enable strict AX mode to prevent focus changes, raw pointer events, raw keyboard events, and cursor takeover. All actions must succeed via background-safe AX paths:
ts
// Via config (see Configuration section)
// Actions will report `stealth` in execution metadata when successful

When strict mode is active, an action that requires foreground focus fails with a strict mode error.
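
The policy described here can be sketched as a single decision. The names below (`ActionPaths`, `hasBackgroundAxPath`, `chooseExecution`) are hypothetical and only model the behavior stated above: a background-safe path reports `stealth`, and anything else either falls back to `default` or fails when strict mode is on.

```typescript
// Hypothetical description of what an action can do for a given target.
interface ActionPaths {
  hasBackgroundAxPath: boolean;
}

// Pick the execution path, or fail if strict mode forbids the fallback.
function chooseExecution(paths: ActionPaths, strictAX: boolean): "stealth" | "default" {
  if (paths.hasBackgroundAxPath) {
    return "stealth";
  }
  if (strictAX) {
    throw new Error("strict AX mode: action requires foreground focus or raw events");
  }
  return "default";
}
```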

Configuration

Inspect effective config in Pi:
/computer-use
Config can be set via config files or environment variable overrides. Key options:

| Option | Description |
| --- | --- |
| `image` | `"auto"`, `"always"`, or `"never"`: screenshot attachment mode |
| `strictAX` | Enable background-safe strict AX mode |
| `browser` | Browser-aware targeting preference |

See `docs/configuration.md` for full config file format and environment variable overrides.
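
As a rough picture of "config file plus environment overrides", the sketch below merges defaults, file values, and environment values in that order. Everything specific in it is an assumption: the variable names `EXAMPLE_IMAGE` and `EXAMPLE_STRICT_AX` are placeholders, the `strictAX` default of `false` is a guess, and `browser` is omitted because its value type is not given here. The real names and format are in `docs/configuration.md`.

```typescript
type ImageMode = "auto" | "always" | "never";

interface Config {
  image: ImageMode;
  strictAX: boolean;
}

// "auto" is the documented default for image; the strictAX default is assumed.
const DEFAULTS: Config = { image: "auto", strictAX: false };

// Merge defaults, then file config, then environment overrides (highest priority).
// EXAMPLE_IMAGE and EXAMPLE_STRICT_AX are placeholder names, not real variables.
function resolveConfig(
  file: Partial<Config>,
  env: Record<string, string | undefined>,
): Config {
  const merged: Config = { ...DEFAULTS, ...file };

  const image = env["EXAMPLE_IMAGE"];
  if (image === "auto" || image === "always" || image === "never") {
    merged.image = image; // ignore values that are not valid modes
  }

  const strict = env["EXAMPLE_STRICT_AX"];
  if (strict !== undefined) {
    merged.strictAX = strict === "1" || strict.toLowerCase() === "true";
  }
  return merged;
}
```

In a Node process, `process.env` could be passed as the second argument.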

Development

bash
# Install dependencies
npm install

# Run checks
npm test

# Run local checkout without loading installed copy
pi --no-extensions -e .

Benchmarks

bash
# Default QA benchmark
npm run benchmark:qa

# Full benchmark (may open apps)
npm run benchmark:qa:full

See [`benchmarks/README.md`](https://github.com/injaneity/pi-computer-use/blob/main/benchmarks/README.md) for metrics, regression policy, and comparison workflow.

---

Troubleshooting

Permissions not granted

Re-run and grant both Accessibility and Screen Recording to:
~/.pi/agent/helpers/pi-computer-use/bridge
In System Settings, go to Privacy & Security and enable the helper under both Accessibility and Screen Recording.

AX refs are stale

Take a fresh `screenshot` to get an updated `stateId` and new refs before acting. Stale-action detection uses `stateId` to reject outdated coordinates or refs.
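
The refresh-then-act advice can be wrapped in a small retry helper. This is a sketch with stand-ins: `retryOnStale`, `Snapshot`, and the `isStale` predicate are hypothetical, the callbacks represent the real `screenshot` and action tools, and the code is synchronous only to keep the example short.

```typescript
// Hypothetical minimal view of what a screenshot returns.
interface Snapshot {
  stateId: string;
}

// Take a fresh snapshot before each attempt and retry only on stale-state errors.
function retryOnStale<T>(
  takeScreenshot: () => Snapshot,
  act: (snapshot: Snapshot) => T,
  isStale: (error: unknown) => boolean,
  maxAttempts = 2,
): T {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = takeScreenshot(); // always act on the latest state
    try {
      return act(snapshot);
    } catch (error) {
      if (!isStale(error)) throw error; // unrelated failures are not retried
      lastError = error;
    }
  }
  throw lastError;
}
```

The `act` callback should derive its refs or coordinates from the snapshot it receives rather than reuse values captured earlier.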

Browser window not targeted correctly

Use `list_windows({ app: "Safari" })` (or Chrome/Firefox) first, then explicitly pass `window: "@wN"` to `screenshot` and subsequent actions.

Strict AX mode errors

An action failed to complete via a background-safe AX path. Either disable strict mode or find an AX ref with `canPress` or `canSetValue` that supports the background path.

Helper not found

Ensure Pi installed the native helper:
bash
ls ~/.pi/agent/helpers/pi-computer-use/bridge
If missing, reinstall:
pi install git:github.com/injaneity/pi-computer-use#v0.2.1

Key Concepts

  • AX refs (`@e1`, `@e2`, …): semantic element handles from the macOS Accessibility API, stable within a state
  • Window refs (`@w1`, `@w2`, …): stable handles from `list_windows`
  • stateId: opaque ID from the latest screenshot; attach to coordinate-based actions to detect stale state
  • stealth execution: action completed via AX without foregrounding the app or moving the real cursor
  • semantic state: structured AX tree returned after every action, used instead of screenshots when coverage is sufficient
