Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop applications (click buttons, fill forms, navigate menus, read UI state, toggle checkboxes, scroll, drag, type text, take screenshots, manage windows, use clipboard). Covers 50 commands across observation, interaction, keyboard/mouse, app lifecycle, clipboard, and wait. Triggers on: "click button", "fill form", "open app", "read UI", "automate desktop", "accessibility tree", "snapshot app", "type into field", "navigate menu", "toggle checkbox", "take screenshot", "desktop automation", "agent-desktop", or any desktop GUI interaction task. Supports macOS (Phase 1), with Windows and Linux planned.
npx skill4agent add lahfir/agent-desktop agent-desktop

npm install -g agent-desktop
# or
bun install -g --trust agent-desktop

| Reference | Contents |
|---|---|
| | snapshot, find, get, is, screenshot, list-surfaces: all flags, output examples |
| | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse: choosing the right command |
| | launch, close, windows, clipboard, wait, batch, status, permissions, version |
| | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns |
| | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, troubleshooting |
1. OBSERVE → agent-desktop snapshot --app "App Name" -i
2. REASON → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT → agent-desktop click @e5 (or type, select, toggle...)
4. VERIFY → agent-desktop snapshot again to confirm state change
5. REPEAT → Continue until task is complete

Refs such as @e1, @e2, @e3 are assigned by snapshot.

{ "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } }
{ "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }

0, 1, 2

| Code | Meaning | Recovery |
|---|---|---|
| | Accessibility permission not granted | Grant in System Settings > Privacy & Security > Accessibility |
| | Ref not in current refmap | Re-run snapshot, use fresh ref |
| | App not running | Launch it first |
| | AX action rejected | Try an alternative approach or a coordinate-based click |
| | Element can't do this | Use a different command |
| | Ref from old snapshot | Re-run snapshot |
| | No matching window | Check app name, use list-windows |
| | Wait condition not met | Increase --timeout |
| | Bad arguments | Check command syntax |
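
The response envelope and the "re-run snapshot" recovery above could be handled along these lines. This is a minimal Python sketch, not part of agent-desktop: `AgentDesktopError`, `parse_envelope`, and `act_with_fresh_ref` are hypothetical names, and the code assumes only the fields visible in the sample output (`ok`, `command`, `data`, `error.code`, `error.message`, `error.suggestion`). `STALE_REF` is the one error code it names, taken from the sample error.

```python
import json


class AgentDesktopError(Exception):
    """Raised when an envelope has "ok": false (hypothetical helper, not part of the CLI)."""

    def __init__(self, command, code, message, suggestion):
        super().__init__(f"{command} failed with {code}: {message}")
        self.command = command
        self.code = code
        self.suggestion = suggestion


def parse_envelope(raw):
    """Return the "data" payload of a success envelope, or raise AgentDesktopError."""
    envelope = json.loads(raw)
    if envelope.get("ok"):
        return envelope.get("data")
    error = envelope.get("error", {})
    raise AgentDesktopError(
        envelope.get("command", "?"),
        error.get("code", "UNKNOWN"),
        error.get("message", ""),
        error.get("suggestion", ""),
    )


def act_with_fresh_ref(act, ref, refresh_ref, max_retries=1):
    """Call act(ref); on STALE_REF, obtain a fresh ref and try again.

    act(ref) must return the raw JSON text of one command's output.
    refresh_ref() must return a new ref; the caller decides how to get it.
    Any other error code, or a STALE_REF after the last retry, is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return parse_envelope(act(ref))
        except AgentDesktopError as err:
            if err.code != "STALE_REF" or attempt == max_retries:
                raise
            ref = refresh_ref()
```

In practice `act` would wrap a call such as `agent-desktop click <ref>`, and `refresh_ref` would re-run `snapshot` and locate the target element again.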
agent-desktop snapshot --app "App" -i # Accessibility tree with refs
agent-desktop screenshot --app "App" out.png # PNG screenshot
agent-desktop find --app "App" --role button # Search elements
agent-desktop get @e1 --property text # Read element property
agent-desktop is @e1 --property enabled # Check element state
agent-desktop list-surfaces --app "App" # Available surfaces

agent-desktop click @e5 # Click element
agent-desktop double-click @e3 # Double-click
agent-desktop triple-click @e2 # Triple-click (select line)
agent-desktop right-click @e5 # Right-click (context menu)
agent-desktop type @e2 "hello" # Type text into element
agent-desktop set-value @e2 "new value" # Set value directly
agent-desktop clear @e2 # Clear element value
agent-desktop focus @e2 # Set keyboard focus
agent-desktop select @e4 "Option B" # Select dropdown option
agent-desktop toggle @e6 # Toggle checkbox/switch
agent-desktop check @e6 # Idempotent check
agent-desktop uncheck @e6 # Idempotent uncheck
agent-desktop expand @e7 # Expand disclosure
agent-desktop collapse @e7 # Collapse disclosure
agent-desktop scroll @e1 --direction down # Scroll element
agent-desktop scroll-to @e8 # Scroll into view

agent-desktop press cmd+c # Key combo
agent-desktop press return --app "App" # Targeted key press
agent-desktop key-down shift # Hold key
agent-desktop key-up shift # Release key
agent-desktop hover @e5 # Cursor to element
agent-desktop hover --xy 500,300 # Cursor to coordinates
agent-desktop drag --from @e1 --to @e5 # Drag between elements
agent-desktop mouse-click --xy 500,300 # Click at coordinates
agent-desktop mouse-move --xy 100,200 # Move cursor
agent-desktop mouse-down --xy 100,200 # Press mouse button
agent-desktop mouse-up --xy 300,400 # Release mouse button

agent-desktop launch "System Settings" # Launch and wait
agent-desktop close-app "TextEdit" # Quit gracefully
agent-desktop close-app "TextEdit" --force # Force kill
agent-desktop list-windows --app "Finder" # List windows
agent-desktop list-apps # List running GUI apps
agent-desktop focus-window --app "Finder" # Bring to front
agent-desktop resize-window --app "App" --width 800 --height 600
agent-desktop move-window --app "App" --x 0 --y 0
agent-desktop minimize --app "App"
agent-desktop maximize --app "App"
agent-desktop restore --app "App"

agent-desktop clipboard-get # Read clipboard
agent-desktop clipboard-set "text" # Write to clipboard
agent-desktop clipboard-clear # Clear clipboard

agent-desktop wait 1000 # Pause 1 second
agent-desktop wait --element @e5 --timeout 5000 # Wait for element
agent-desktop wait --window "Title" # Wait for window
agent-desktop wait --text "Done" --app "App" # Wait for text
agent-desktop wait --menu --app "App" # Wait for context menu
agent-desktop wait --menu-closed --app "App" # Wait for menu dismissal

agent-desktop status # Health check
agent-desktop permissions # Check permission
agent-desktop permissions --request # Trigger permission dialog
agent-desktop version --json # Version info
agent-desktop batch '[...]' --stop-on-error # Batch commands

-i, click @e5, mouse-click --xy 500,300, wait, permissions, error.code, error.suggestion, find, snapshot --surface menu
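
The observe, reason, act, verify loop described near the top can be driven from a script. The sketch below is one possible shape, written in Python under stated assumptions: `run_cli` and `automate` are hypothetical helpers, the CLI is assumed to print a single JSON envelope on stdout, and the REASON step is left to a caller-supplied `choose_action` function because the layout of the snapshot `data` payload is not shown here.

```python
import json
import subprocess


def run_cli(*args):
    """Run `agent-desktop <args>` and return the parsed JSON envelope.

    Assumption: the CLI prints one JSON envelope on stdout, as in the
    sample output earlier in this document.
    """
    result = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True, check=False
    )
    return json.loads(result.stdout)


def automate(app, choose_action, run=run_cli, max_steps=20):
    """Observe, reason, act, verify, repeat.

    choose_action(data) is the caller's REASON step. It receives the
    snapshot "data" payload and returns a list of CLI arguments such as
    ["click", "@e5"], or None when the task is complete.
    Returns the number of actions performed.
    """
    for step in range(max_steps):
        # OBSERVE; from the second pass on, this same snapshot also
        # serves as the VERIFY step for the previous action.
        snapshot = run("snapshot", "--app", app, "-i")
        if not snapshot.get("ok"):
            raise RuntimeError(snapshot["error"]["code"])

        # REASON: the caller inspects the tree and picks the next command.
        action = choose_action(snapshot["data"])
        if action is None:
            return step

        # ACT: run the chosen command and stop on a reported error.
        outcome = run(*action)
        if not outcome.get("ok"):
            raise RuntimeError(outcome["error"]["code"])
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

Taking a fresh snapshot on every pass means each action uses refs from the latest tree, which is the recovery the error table recommends for refs from an old snapshot. Passing a stand-in for `run` lets the loop be exercised without the CLI installed.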