pi-computer-use
Control macOS applications with Pi agents using semantic Accessibility API targets and optional screenshots
Source: aradotso/trending-skills
NPX Install
npx skill4agent add aradotso/trending-skills pi-computer-use
Skill by ara.so — Daily 2026 Skills collection.
Installation
Via Pi (recommended)
bash
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Pin to a specific version:
bash
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.1
Via npm
bash
npm install @injaneity/pi-computer-use
# or pin a version
npm install @injaneity/pi-computer-use@0.2.1
Remove
bash
pi remove git:github.com/injaneity/pi-computer-use#v0.2.1
npm remove @injaneity/pi-computer-use
First-Run Permissions
On the first session, macOS prompts for permissions for:
~/.pi/agent/helpers/pi-computer-use/bridge
Grant both:
- Accessibility — required for AX ref targeting
- Screen Recording — required for screenshots
How It Works
Three components:
- Pi extension (`extensions/computer-use.ts`): registers public tools and the `/computer-use` command
- TypeScript bridge (`src/bridge.ts`): manages window state, AX refs, fallback policy, batching, execution metadata
- Native Swift helper (`native/macos/bridge.swift`): talks to macOS Accessibility, ScreenCaptureKit, AppKit, CoreGraphics
Available Tools
| Tool | Purpose |
|---|---|
| `list_apps` | List running apps |
| `list_windows` | List windows for an app |
| `screenshot` | Capture window + return AX state |
| `click` | Click element or coordinate |
| | Double-click element or coordinate |
| | Move cursor |
| `drag` | Drag from point to point |
| `scroll` | Scroll element or coordinate |
| `keypress` | Press key combination |
| | Type raw text |
| `set_text` | Replace element value via AX |
| | Pause execution |
| `arrange_window` | Position/resize window |
| `computer_actions` | Batch multiple actions |
Core Workflow
Always start a session with `screenshot` to select the controlled window and obtain AX refs:
ts
// 1. Discover apps and windows if target is ambiguous
list_apps()
list_windows({ app: "Safari" })
// 2. Select the window and get AX state
screenshot({ window: "@w1" })
// 3. Act on AX refs returned from screenshot
click({ window: "@w1", ref: "@e1" })
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })
AX Ref Targeting (Preferred)
AX refs like `@e1`, `@e2` are returned by `screenshot` and carry capability metadata:
- `canSetValue`: supports `set_text`
- `canPress`: supports `click`
- `canFocus`: can receive focus
- `canScroll`: supports `scroll`
- `adjust`: supports value adjustment
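The capability flags above suggest a simple targeting rule: use the AX ref when the element advertises the needed capability, and fall back to coordinates plus `stateId` otherwise. The sketch below illustrates that rule only. The `AxElement` shape and the `clickArgsFor` helper are hypothetical names introduced here, not part of the documented API, and the real AX state payload may differ.

```typescript
// Hypothetical shape of one element in the AX state. Field names follow the
// capability flags listed above; the exact payload is an assumption.
interface AxElement {
  ref: string; // e.g. "@e1"
  canSetValue?: boolean;
  canPress?: boolean;
  canFocus?: boolean;
  canScroll?: boolean;
}

// Either an AX ref target or a coordinate target guarded by a stateId.
type ClickArgs =
  | { ref: string }
  | { x: number; y: number; stateId: string };

// Prefer the AX ref when the element reports canPress; otherwise fall back
// to coordinates tagged with the stateId of the latest screenshot.
function clickArgsFor(
  el: AxElement | undefined,
  fallback: { x: number; y: number },
  stateId: string,
): ClickArgs {
  if (el?.canPress) {
    return { ref: el.ref };
  }
  return { x: fallback.x, y: fallback.y, stateId };
}

// Example: a pressable element resolves to a ref-based target.
const clickArgs = clickArgsFor(
  { ref: "@e1", canPress: true },
  { x: 320, y: 180 },
  "abc123",
);
```

The resulting object has the same shape as the arguments passed to `click` in the examples in this document.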
ts
// Click by AX ref — no coordinates needed
click({ ref: "@e1" })
// Scroll a specific element
scroll({ ref: "@e3", scrollY: 600 })
// Replace text field value atomically
set_text({ ref: "@e2", text: "hello world" })
Coordinate Fallback
Use coordinates only when no suitable AX target exists. Always include `stateId` from the latest screenshot to guard against stale state:
ts
click({ x: 320, y: 180, stateId: "abc123" })
Batching Actions
Use `computer_actions` to batch obvious sequential steps. One semantic state update is returned after all actions:
ts
computer_actions({
stateId: "abc123",
actions: [
{ type: "click", ref: "@e1" },
{ type: "set_text", ref: "@e2", text: "https://example.com" },
{ type: "keypress", keys: ["Enter"] }
]
})
Each action in the result includes execution metadata:
- `stealth`: background-safe AX path (no focus takeover)
- `default`: required focus or raw event fallback
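After a batch, it can be useful to see which actions left the background-safe path. The sketch below filters a result list by execution metadata. The `ActionResult` shape and its `execution` field name are assumptions made for illustration; the source only states that each action reports `stealth` or `default`.

```typescript
// The two execution paths named in the list above.
type ExecutionPath = "stealth" | "default";

// Assumed result shape; the real field names may differ.
interface ActionResult {
  type: string; // e.g. "click", "set_text", "keypress"
  execution: ExecutionPath; // hypothetical field name
}

// Return only the actions that needed focus or a raw event fallback.
function foregroundActions(results: ActionResult[]): ActionResult[] {
  return results.filter((r) => r.execution === "default");
}

// Example with made-up results for the three-step batch shown earlier.
const sampleResults: ActionResult[] = [
  { type: "click", execution: "stealth" },
  { type: "set_text", execution: "stealth" },
  { type: "keypress", execution: "default" },
];

// Only the keypress entry remains after filtering.
const tookFocus = foregroundActions(sampleResults).map((r) => r.type);
console.log(tookFocus);
```

A non-empty list here indicates that the batch would not pass in strict AX mode.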
Window Management
ts
// List windows for a specific app
list_windows({ app: "Finder" })
// Target a specific window in all subsequent calls
screenshot({ window: "@w2" })
// Arrange window by preset
arrange_window({ window: "@w1", preset: "left-half" })
// Arrange window with explicit frame
arrange_window({ window: "@w1", frame: { x: 0, y: 0, width: 1280, height: 800 } })
Screenshot Modes
Control when screenshots are attached with the `image` option:
ts
screenshot({ window: "@w1", image: "auto" }) // default: attach when AX coverage is weak
screenshot({ window: "@w1", image: "always" }) // always attach
screenshot({ window: "@w1", image: "never" }) // never attach, AX state only
Common Patterns
Open URL in Safari
ts
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })
// @e1 = address bar (from AX state)
set_text({ ref: "@e1", text: "https://example.com" })
keypress({ keys: ["Enter"] })
Fill a Form
ts
screenshot({ window: "@w1" })
// Use refs from AX state
set_text({ ref: "@e3", text: "Jane Doe" })
set_text({ ref: "@e4", text: "jane@example.com" })
click({ ref: "@e5" }) // Submit button
Keyboard Shortcut
ts
keypress({ keys: ["Cmd", "T"] }) // New tab
keypress({ keys: ["Cmd", "Shift", "N"] }) // New incognito window
keypress({ keys: ["Escape"] })
Scroll a Page
ts
scroll({ ref: "@e2", scrollY: 800 }) // Scroll element down
scroll({ ref: "@e2", scrollY: -400 }) // Scroll up
Drag and Drop
ts
drag({ fromX: 100, fromY: 200, toX: 400, toY: 200 })
Strict AX Mode (Stealth / Background-Safe)
Enable strict AX mode to prevent focus changes, raw pointer events, raw keyboard events, and cursor takeover. All actions must succeed via background-safe AX paths:
ts
// Via config (see Configuration section)
// Actions will report `stealth` in execution metadata when successful
When strict mode is active, an action that requires foreground focus fails with a strict mode error.
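The strict-mode behavior described in this section can be pictured as a small decision rule. The sketch below is a toy model, not the extension's actual fallback policy. `resolvePath` and its two boolean inputs are hypothetical and stand in for whatever checks the bridge really performs.

```typescript
type Path = "stealth" | "default";

// hasAxRoute: whether the target offers a background-safe AX route for the
// action (for example canPress for a click). strict: whether strict AX mode
// is enabled. Both inputs are illustrative assumptions.
function resolvePath(hasAxRoute: boolean, strict: boolean): Path {
  if (hasAxRoute) {
    return "stealth"; // no focus change, no raw events
  }
  if (strict) {
    // Strict mode forbids the focus or raw-event fallback, so the action fails.
    throw new Error("strict AX mode: action requires foreground focus or raw events");
  }
  return "default"; // fallback allowed outside strict mode
}

// Example: the same action without an AX route, with strict mode off and on.
const relaxed = resolvePath(false, false); // falls back to the default path
let strictError: string | null = null;
try {
  resolvePath(false, true);
} catch (err) {
  strictError = (err as Error).message;
}
```

In this model, the only way to succeed in strict mode is for the target to expose a background-safe route, which matches the advice to look for refs with suitable capability flags.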
Configuration
Inspect effective config in Pi:
/computer-use
Config can be set via config files or environment variable overrides. Key options:
| Option | Description |
|---|---|
| | |
| | Enable background-safe strict AX mode |
| | Browser-aware targeting preference |
See `docs/configuration.md` for full config file format and environment variable overrides.
Development
bash
# Install dependencies
npm install
# Run checks
npm test
# Run local checkout without loading installed copy
pi --no-extensions -e .
Benchmarks
bash
# Default QA benchmark
npm run benchmark:qa
# Full benchmark (may open apps)
npm run benchmark:qa:full
See `benchmarks/README.md` for metrics, regression policy, and comparison workflow.
Troubleshooting
Permissions not granted
Re-run and grant both Accessibility and Screen Recording to:
~/.pi/agent/helpers/pi-computer-use/bridge
On macOS, go to System Settings → Privacy & Security → Accessibility and Screen Recording.
AX refs are stale
Take a fresh `screenshot` to get an updated `stateId` and new refs before acting. Stale-action detection uses `stateId` to reject outdated coordinates or refs.
Browser window not targeted correctly
Use `list_windows({ app: "Safari" })` (or Chrome/Firefox) first, then explicitly pass `window: "@wN"` to `screenshot` and subsequent actions.
Strict AX mode errors
An action failed to complete via a background-safe AX path. Either disable strict mode or identify an AX ref with `canPress` / `canSetValue` that supports the background path.
Helper not found
Ensure Pi installed the native helper:
bash
ls ~/.pi/agent/helpers/pi-computer-use/bridge
If missing, reinstall:
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Key Concepts
- AX refs (`@e1`, `@e2`, …): semantic element handles from the macOS Accessibility API, stable within a state
- Window refs (`@w1`, `@w2`, …): stable handles from `list_windows`
- stateId: opaque ID from the latest screenshot; attach to coordinate-based actions to detect stale state
- stealth execution: action completed via AX without foregrounding the app or moving the real cursor
- semantic state: structured AX tree returned after every action, used instead of screenshots when coverage is sufficient
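The relationship between `stateId` and stale actions can be modeled in a few lines. The sketch below shows the idea only: an action planned against an older state is rejected once a newer state exists. `StateTracker` is a hypothetical class for illustration and does not describe how the bridge stores state.

```typescript
// Minimal model of stale-state detection keyed on stateId.
class StateTracker {
  private current: string | null = null;

  // Record the stateId that arrives with a fresh screenshot.
  update(stateId: string): void {
    this.current = stateId;
  }

  // True only when the action was planned against the latest known state.
  isFresh(actionStateId: string): boolean {
    return this.current !== null && this.current === actionStateId;
  }
}

// Example: an id is fresh until a newer screenshot replaces it.
const tracker = new StateTracker();
tracker.update("abc123");
const beforeRefresh = tracker.isFresh("abc123"); // true
tracker.update("def456");
const afterRefresh = tracker.isFresh("abc123"); // false: take a fresh screenshot first
```

Treating the id as an opaque token compared only for equality keeps the check independent of how the id is generated.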