agent-native

macOS native app automation via the Accessibility tree. Like agent-browser, but for desktop apps.

Prerequisites

  • macOS 13+ with Accessibility permissions granted to your terminal
  • `agent-native` binary available on your PATH (install via `swift build -c release && cp .build/release/agent-native /usr/local/bin/`)

Core Workflow

apps -> pick/open target -> snapshot -> interact by ref -> re-snapshot
  1. Always start with `agent-native apps` to see what's already running. Prefer reusing an already-open app (e.g. if a browser is already open, use it instead of opening a different one).
    • Known browsers: Safari, Arc, Chrome, Firefox, Helium. Any of these can be used for web tasks.
  2. Only call `agent-native open <app>` if the target app isn't already running.
  3. Snapshot, interact, and re-snapshot as needed.
bash
agent-native apps                            # Check what's already running
agent-native snapshot Safari -i              # Use the already-open browser
agent-native click @n5                       # Interact using refs
agent-native snapshot Safari -i              # Re-snapshot after UI changes
Always re-snapshot after actions that change the UI. Refs are invalidated when UI structure changes.
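
The "reuse before open" rule above can be sketched as a small shell helper. This is only an illustration: the name `ensure_app` and the overridable `AGENT_NATIVE` variable are hypothetical, and it assumes the plain-text output of `agent-native apps` contains each running app's name.

```shell
# Command to run; override AGENT_NATIVE to point at another binary.
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Open an app only if `apps` does not already list it.
# Assumption: the `apps` output includes app names as plain text.
# The match is a case-insensitive substring, so "Safari" would also
# match a longer name that contains "Safari".
ensure_app() {
    app="$1"
    if $AGENT_NATIVE apps | grep -qiF -- "$app"; then
        echo "reusing $app"
    else
        $AGENT_NATIVE open "$app" && echo "opened $app"
    fi
}
```

Typical use would be `ensure_app Safari` followed by `agent-native snapshot Safari -i`.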

Commands

Navigation

bash
agent-native open <app>                          # Open/activate by name or bundle ID
agent-native open "System Settings"
agent-native open com.apple.Safari               # Already-running apps just activate

Screenshot

bash
agent-native screenshot <app> [path]             # Capture app's frontmost window
agent-native screenshot Slack                    # Saves to /tmp/agent-native-screenshot.png
agent-native screenshot Slack /tmp/slack.png     # Custom path
agent-native screenshot Slack --json             # {"path": "...", "width": ..., "height": ...}
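
When a screenshot feeds a later step, it can help to confirm that a file was actually written. A minimal sketch, assuming only the `screenshot <app> [path]` form shown above; the helper name `shot` and the timestamped default path are hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Capture <app> to a file and print the path only when a non-empty
# file exists afterwards. The default file name is an illustrative choice.
shot() {
    app="$1"
    out="${2:-/tmp/$(date +%Y%m%d-%H%M%S)-shot.png}"
    $AGENT_NATIVE screenshot "$app" "$out" >/dev/null || return 1
    [ -s "$out" ] && echo "$out"
}
```

For example, `path=$(shot Slack) || echo "capture failed" >&2`.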

Keystroke Sending

bash
agent-native key <app> <keys...>                 # Send keystrokes to an app
agent-native key Slack "Hello world"             # Type text
agent-native key Slack cmd+k                     # Send Cmd+K
agent-native key Slack escape                    # Special keys: escape, return, tab, delete, up, down, left, right, space
agent-native key Slack cmd+a delete              # Chain multiple keys
agent-native key Calculator 5 + 3 return         # Multiple keystrokes
Modifiers: `cmd` / `command`, `ctrl` / `control`, `alt` / `option` / `opt`, `shift`
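
The chained form (`cmd+a delete`) and the text form can be combined to replace the contents of whichever field currently has focus. A small sketch using only the `key` syntax shown above; the helper name `replace_focused_text` is hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Select all, delete, then type the replacement text.
# Acts on whatever field has keyboard focus in <app>.
replace_focused_text() {
    app="$1"
    text="$2"
    $AGENT_NATIVE key "$app" cmd+a delete &&
        $AGENT_NATIVE key "$app" "$text"
}
```

When a ref for the field is available from a snapshot, `fill` is likely the more direct choice, since it clears and types in one command.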

Paste File

bash
agent-native paste <app> <path>                  # Copy file to clipboard and Cmd+V into app
agent-native paste Slack /tmp/screenshot.png     # Paste image into Slack
agent-native paste Slack ./report.pdf            # Paste any file

Snapshot (recommended for agents)

bash
agent-native snapshot <app>                      # Full AX tree with refs
agent-native snapshot <app> -i                   # Interactive elements only (recommended)
agent-native snapshot <app> -i -c                # Interactive + compact
agent-native snapshot <app> -d 3                 # Limit depth
agent-native snapshot <app> -i --json            # JSON for parsing
Flag      Description
-i        Interactive elements only
-c        Compact -- remove empty structural elements
-d <n>    Limit tree depth
--json    JSON output
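
Since later commands address elements by ref, a quick way to see which refs a snapshot produced is to pull out every `@n<number>` token. This sketch relies only on that ref pattern; the helper name `list_refs` is hypothetical, and nothing is assumed about the rest of the snapshot line format.

```shell
# Read snapshot text on stdin and print each distinct @n<number> ref
# once, in order of first appearance.
list_refs() {
    grep -o '@n[0-9][0-9]*' | awk '!seen[$0]++'
}
```

Usage would look like `agent-native snapshot Safari -i | list_refs`.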

Interaction (use @refs from snapshot)

bash
agent-native click @n2                           # Click / press
agent-native fill @n3 "text"                     # Clear field and type
agent-native type @n3 "text"                     # Type without clearing
agent-native select @n5 "Option A"               # Select from dropdown
agent-native check @n4                           # Check checkbox (idempotent)
agent-native uncheck @n4                         # Uncheck checkbox (idempotent)
agent-native focus @n3                           # Focus element
agent-native hover @n2                           # Move cursor to element
agent-native action @n7 AXIncrement              # Any AX action
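
A write followed by a read-back is a cheap way to confirm that `fill` took effect. The sketch below pairs `fill` with `get value` (documented in the Read State section of this document); it assumes `get value` prints the raw field value and nothing else, and the name `fill_verified` is hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Fill a field, then read it back and succeed only if the value matches.
# Assumption: `get value` prints just the field's value on stdout.
fill_verified() {
    ref="$1"
    text="$2"
    $AGENT_NATIVE fill "$ref" "$text" >/dev/null || return 1
    actual=$($AGENT_NATIVE get value "$ref") || return 1
    [ "$actual" = "$text" ]
}
```

For example, `fill_verified @n3 "text" || echo "field did not take the value" >&2`.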

Read State

bash
agent-native get text @n1                        # Get text / title / label
agent-native get value @n3                       # Get input value
agent-native get attr @n2 AXEnabled              # Get any AX attribute
agent-native get title Safari                    # Frontmost window title
agent-native is enabled @n5                      # true / false
agent-native is focused @n3                      # true / false
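
Because `is enabled` prints `true` or `false`, it can drive a simple polling loop, for instance to wait for a button to become clickable. A sketch under the assumption that the ref stays valid while polling (refs are invalidated when the UI structure changes); the name `wait_enabled` and its defaults are hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Poll `is enabled <ref>` until it prints "true" or attempts run out.
# Arguments: ref, number of attempts (default 10), delay in seconds (default 1).
wait_enabled() {
    ref="$1"
    tries="${2:-10}"
    delay="${3:-1}"
    while [ "$tries" -gt 0 ]; do
        if [ "$($AGENT_NATIVE is enabled "$ref")" = "true" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_enabled @n5 5 1 && agent-native click @n5`.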

Discovery

bash
agent-native apps                                # List running GUI apps
agent-native apps --format json                  # JSON output
agent-native find <app> --role AXButton          # Find elements by filter
agent-native find <app> --title "Submit"         # Find by title
agent-native inspect @n3                         # All attributes and actions
agent-native tree <app> --depth 3                # Raw AX tree (no refs)

Wait

bash
agent-native wait <app> --title "Apply" --timeout 5
agent-native wait <app> --role AXSheet --timeout 10
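
A common pattern is to wait for an element and snapshot only once it appears, so the refs reflect the settled UI. This sketch assumes `wait` exits with a non-zero status when the timeout elapses, which the document does not state; the name `snapshot_when_ready` is hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Wait for an element with the given title, then print an interactive snapshot.
# Assumption: `wait` returns non-zero on timeout.
snapshot_when_ready() {
    app="$1"
    title="$2"
    timeout="${3:-5}"
    if $AGENT_NATIVE wait "$app" --title "$title" --timeout "$timeout" >/dev/null; then
        $AGENT_NATIVE snapshot "$app" -i
    else
        echo "timed out waiting for \"$title\" in $app" >&2
        return 1
    fi
}
```

For example, `snapshot_when_ready "System Settings" "Apply" 5`.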

Browser & Electron Enhanced Access

Chromium browsers (Arc, Chrome, Edge, Brave, Vivaldi) and Electron apps don't expose web DOM content in the macOS AX tree by default. `snapshot` now detects these apps and enhances access automatically.
Priority chain: CDP read → AX-enhanced interact → keyboard fallback → screenshot

Automatic Detection in snapshot

bash
agent-native snapshot Arc -i                      # Auto-detects Chromium, enables AX enhancement
agent-native snapshot "VS Code" -i                # Auto-detects Electron, enables AX enhancement
When `snapshot` detects a Chromium/Electron app:
  1. CDP mode (richest): If the browser was launched with `--remote-debugging-port`, snapshot reads the full web accessibility tree via the Chrome DevTools Protocol.
  2. AX-enhanced mode (fallback): Sets `AXEnhancedUserInterface` to force the app to build its accessibility tree, then walks it normally.

CDP Mode (richest web content)

Launch your browser with CDP enabled for the best results:
bash
# Launch Chrome with CDP
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Snapshot auto-detects CDP on ports 9222/9229
agent-native snapshot Chrome -i

# Or specify a port explicitly
agent-native snapshot Chrome -i --port 9222
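
Before relying on CDP mode, it can be useful to check whether a debugging port is actually answering. The sketch below probes the `/json/version` HTTP endpoint that Chromium serves when remote debugging is enabled. It is an independent check, not a description of how `snapshot` performs its own detection; the name `detect_cdp_port` and the overridable `CURL` variable are hypothetical.

```shell
CURL="${CURL:-curl}"

# Print the first port whose CDP /json/version endpoint responds.
# With no arguments, probes the two ports the document mentions.
detect_cdp_port() {
    if [ "$#" -eq 0 ]; then
        set -- 9222 9229
    fi
    for port in "$@"; do
        if $CURL -sf --max-time 1 "http://127.0.0.1:$port/json/version" >/dev/null 2>&1; then
            echo "$port"
            return 0
        fi
    done
    return 1
}
```

For example, `port=$(detect_cdp_port) && agent-native snapshot Chrome -i --port "$port"`.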

Manual Persistent Control

bash
agent-native ax-enable Arc                        # Persistently enable AXEnhancedUserInterface
agent-native snapshot Arc -i                      # Now shows web page elements
agent-native ax-disable Arc                       # Restore when done
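
Because `ax-enable` is persistent, it is easy to forget the matching `ax-disable` when a step in between fails. One way to pair them is a wrapper that always restores the setting; the name `with_ax` is hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Enable AX enhancement for <app>, run the given command, then disable
# it again regardless of whether the command succeeded.
# Returns the exit status of the wrapped command.
with_ax() {
    app="$1"
    shift
    $AGENT_NATIVE ax-enable "$app" || return 1
    "$@"
    status=$?
    $AGENT_NATIVE ax-disable "$app"
    return "$status"
}
```

For example, `with_ax Arc agent-native snapshot Arc -i`.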

Fallback: Keyboard Shortcuts + Screenshots

When the AX tree is still sparse (as with some Electron apps), fall back to keyboard-driven interaction:
bash
agent-native key Slack cmd+k                     # Open quick switcher
agent-native key Slack "channel name" return     # Type and confirm
agent-native screenshot Slack /tmp/slack.png     # Visual confirmation
agent-native get title Slack                     # Check navigation state
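
The four commands above can be wrapped into one step that navigates, then verifies. This is a sketch only: it assumes Slack's window title contains the channel name after navigation, which may not hold for every workspace, and the names `slack_goto` and `SETTLE_SECONDS` are hypothetical.

```shell
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# Jump to a channel through the quick switcher, then confirm via the
# window title. On a mismatch, save a screenshot for visual inspection.
# Assumption: the frontmost window title includes the channel name.
slack_goto() {
    name="$1"
    $AGENT_NATIVE key Slack cmd+k || return 1
    $AGENT_NATIVE key Slack "$name" return || return 1
    sleep "${SETTLE_SECONDS:-1}"   # give the UI a moment to update
    case "$($AGENT_NATIVE get title Slack)" in
        *"$name"*) return 0 ;;
        *)
            $AGENT_NATIVE screenshot Slack /tmp/slack.png >/dev/null
            return 1
            ;;
    esac
}
```

For example, `slack_goto general || echo "check /tmp/slack.png" >&2`.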

Pasting files into apps

bash
agent-native paste Slack /path/to/image.png      # Copies to clipboard and pastes in one step

Tips

  • Always use `-i` with snapshot -- full trees are noisy.
  • Re-snapshot after navigation -- old refs may not resolve after UI changes.
  • `check` / `uncheck` are idempotent -- they read current state first.
  • `fill` clears first, `type` appends.
  • The AX tree can see into browsers -- Safari exposes web content as AX nodes; Chromium browsers such as Chrome do so once AX enhancement or CDP is active.
  • Some Electron apps need keyboard shortcuts -- use System Events keystrokes when the AX tree is sparse.