agent-native
macOS native app automation via the Accessibility tree. Like agent-browser, but for desktop apps.
Prerequisites
- macOS 13+ with Accessibility permissions granted to your terminal
- The `agent-native` binary (install via `swift build -c release && cp .build/release/agent-native /usr/local/bin/`)
Core Workflow
apps -> pick/open target -> snapshot -> interact by ref -> re-snapshot
- Always start with `agent-native apps` to see what's already running. Prefer reusing an already-open app (e.g. if a browser is already open, use it instead of opening a different one).
  - Known browsers: Safari, Arc, Chrome, Firefox, Helium. Any of these can be used for web tasks.
- Only call `agent-native open <app>` if the target app isn't already running.
- Snapshot, interact, and re-snapshot as needed.
```bash
agent-native apps # Check what's already running
agent-native snapshot Safari -i # Use the already-open browser
agent-native click @n5 # Interact using refs
agent-native snapshot Safari -i # Re-snapshot after UI changes
```
Always re-snapshot after actions that change the UI. Refs are invalidated when UI structure changes.
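The "reuse before open" rule can be wrapped in a small helper. This is only a sketch: `ensure_snapshot` and the `AGENT_NATIVE` override variable are hypothetical names, and the check assumes the plain-text `apps` listing contains the app's name, which the source does not spell out.

```bash
# Override AGENT_NATIVE to point at a different binary (hypothetical variable).
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# ensure_snapshot <app>: reuse the app if it is already running, open it
# otherwise, then print an interactive-only snapshot.
ensure_snapshot() (
  app="$1"
  # ASSUMPTION: the plain-text `apps` output contains the app's name.
  if ! "$AGENT_NATIVE" apps | grep -qiF -- "$app"; then
    "$AGENT_NATIVE" open "$app" || exit 1
  fi
  "$AGENT_NATIVE" snapshot "$app" -i
)
```

After sourcing the function, `ensure_snapshot Safari` would run the first three steps of the workflow; interacting by ref and re-snapshotting remain separate calls.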
Commands
Navigation
```bash
agent-native open <app> # Open/activate by name or bundle ID
agent-native open "System Settings"
agent-native open com.apple.Safari # Already-running apps just activate
```
Screenshot
```bash
agent-native screenshot <app> [path] # Capture app's frontmost window
agent-native screenshot Slack # Saves to /tmp/agent-native-screenshot.png
agent-native screenshot Slack /tmp/slack.png # Custom path
agent-native screenshot Slack --json # {"path": "...", "width": ..., "height": ...}
```
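When a script needs the saved file's location, the `path` field can be pulled out of the `--json` output. The helper below is a minimal sketch: `screenshot_path_from_json` is a hypothetical name, and it assumes the output is a single flat JSON object whose path contains no escaped quotes. A real JSON parser would be more robust.

```bash
# screenshot_path_from_json: read `screenshot --json` output on stdin and
# print the value of its "path" field.
# ASSUMPTION: one flat JSON object; the path contains no escaped quotes.
screenshot_path_from_json() {
  sed -n 's/.*"path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Typical use would be `agent-native screenshot Slack --json | screenshot_path_from_json`.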
Keystroke Sending
```bash
agent-native key <app> <keys...> # Send keystrokes to an app
agent-native key Slack "Hello world" # Type text
agent-native key Slack cmd+k # Send Cmd+K
agent-native key Slack escape # Special keys: escape, return, tab, delete, up, down, left, right, space
agent-native key Slack cmd+a delete # Chain multiple keys
agent-native key Calculator 5 + 3 return # Multiple keystrokes
```
Modifiers: `cmd`/`command`, `ctrl`/`control`, `alt`/`option`/`opt`, `shift`
Paste File
```bash
agent-native paste <app> <path> # Copy file to clipboard and Cmd+V into app
agent-native paste Slack /tmp/screenshot.png # Paste image into Slack
agent-native paste Slack ./report.pdf # Paste any file
```
Snapshot (recommended for agents)
```bash
agent-native snapshot <app> # Full AX tree with refs
agent-native snapshot <app> -i # Interactive elements only (recommended)
agent-native snapshot <app> -i -c # Interactive + compact
agent-native snapshot <app> -d 3 # Limit depth
agent-native snapshot <app> -i --json # JSON for parsing
```

| Flag | Description |
|---|---|
| `-i` | Interactive elements only |
| `-c` | Compact -- remove empty structural elements |
| `-d` | Limit tree depth |
| `--json` | JSON output |
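To see which refs a snapshot offers, the text output can be filtered for ref tokens. This sketch assumes refs appear in the plain-text snapshot in the same `@n<number>` form used by the interaction commands; the source does not show the snapshot layout, and `list_refs` is a hypothetical helper name.

```bash
# list_refs: read snapshot text on stdin and print each distinct ref once,
# in first-seen order.
# ASSUMPTION: refs appear in the output as @n followed by digits (e.g. @n5).
list_refs() {
  grep -oE '@n[0-9]+' | awk '!seen[$0]++'
}
```

For example, `agent-native snapshot Safari -i | list_refs` would give a quick inventory of the refs available for interaction.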
Interaction (use @refs from snapshot)
```bash
agent-native click @n2 # Click / press
agent-native fill @n3 "text" # Clear field and type
agent-native type @n3 "text" # Type without clearing
agent-native select @n5 "Option A" # Select from dropdown
agent-native check @n4 # Check checkbox (idempotent)
agent-native uncheck @n4 # Uncheck checkbox (idempotent)
agent-native focus @n3 # Focus element
agent-native hover @n2 # Move cursor to element
agent-native action @n7 AXIncrement # Any AX action
```
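Interaction and state reading combine naturally into a "write, then verify" step. The following is a sketch rather than part of the tool: `fill_and_verify` and `AGENT_NATIVE` are hypothetical names, and it assumes `get value` prints the field's raw value to stdout.

```bash
# Override AGENT_NATIVE to point at a different binary (hypothetical variable).
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# fill_and_verify <ref> <text>: fill the field, then confirm the value stuck.
# ASSUMPTION: `get value` prints the field's raw value to stdout.
fill_and_verify() (
  ref="$1"; text="$2"
  "$AGENT_NATIVE" fill "$ref" "$text" || exit 1
  actual="$("$AGENT_NATIVE" get value "$ref")"
  if [ "$actual" != "$text" ]; then
    echo "expected '$text' in $ref, got '$actual'" >&2
    exit 1
  fi
)
```

A call such as `fill_and_verify @n3 "text"` returns non-zero when the field does not hold the expected value, which is a cue to re-snapshot and retry.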
Read State
```bash
agent-native get text @n1 # Get text / title / label
agent-native get value @n3 # Get input value
agent-native get attr @n2 AXEnabled # Get any AX attribute
agent-native get title Safari # Frontmost window title
agent-native is enabled @n5 # true / false
agent-native is focused @n3 # true / false
```
Discovery
```bash
agent-native apps # List running GUI apps
agent-native apps --format json # JSON output
agent-native find <app> --role AXButton # Find elements by filter
agent-native find <app> --title "Submit" # Find by title
agent-native inspect @n3 # All attributes and actions
agent-native tree <app> --depth 3 # Raw AX tree (no refs)
```
Wait
```bash
agent-native wait <app> --title "Apply" --timeout 5
agent-native wait <app> --role AXSheet --timeout 10
```
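Because refs go stale when the UI changes, a wait is usually followed by a fresh snapshot. The sketch below pairs the two. It assumes `wait` exits non-zero when the timeout elapses, which the source does not confirm; `wait_then_snapshot` and `AGENT_NATIVE` are hypothetical names.

```bash
# Override AGENT_NATIVE to point at a different binary (hypothetical variable).
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# wait_then_snapshot <app> <title> [timeout]: block until an element titled
# <title> appears, then re-snapshot so the refs are fresh.
# ASSUMPTION: `wait` exits non-zero when the timeout elapses.
wait_then_snapshot() (
  app="$1"; title="$2"; timeout="${3:-5}"
  if ! "$AGENT_NATIVE" wait "$app" --title "$title" --timeout "$timeout"; then
    echo "timed out waiting for '$title' in $app" >&2
    exit 1
  fi
  "$AGENT_NATIVE" snapshot "$app" -i
)
```

For instance, `wait_then_snapshot "System Settings" "Apply" 10` would only print a snapshot once the element has appeared.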
Browser & Electron Enhanced Access
Chromium browsers (Arc, Chrome, Edge, Brave, Vivaldi) and Electron apps don't expose web DOM content in the macOS AX tree by default. `snapshot` now auto-detects these apps and enhances access automatically.
Priority chain: CDP read → AX-enhanced interact → keyboard fallback → screenshot
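An agent script can mirror this priority chain when deciding how to drive an app. The function below is an illustrative sketch only: `pick_access_mode` and the mode labels it prints are hypothetical, and the two yes/no inputs are assumed to come from the caller's own checks. It does not reproduce the detection logic inside `snapshot`.

```bash
# pick_access_mode <cdp_available> <ax_tree_usable>
# Each argument is "yes" or "no"; prints the first applicable mode in the
# chain. The labels are hypothetical, chosen for this sketch.
pick_access_mode() {
  if [ "$1" = yes ]; then
    echo "cdp"
  elif [ "$2" = yes ]; then
    echo "ax-enhanced"
  else
    echo "keyboard+screenshot"
  fi
}
```

The last branch folds the final two links of the chain together, since keyboard-driven interaction is normally confirmed with a screenshot.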
Automatic Detection in `snapshot`
```bash
agent-native snapshot Arc -i # Auto-detects Chromium, enables AX enhancement
agent-native snapshot "VS Code" -i # Auto-detects Electron, enables AX enhancement
```
When `snapshot` detects a Chromium/Electron app:
- CDP mode (richest): If browser was launched with `--remote-debugging-port`, snapshot reads the full web accessibility tree via Chrome DevTools Protocol
- AX-enhanced mode (fallback): Sets `AXEnhancedUserInterface` to force the app to build its accessibility tree, then walks it normally
CDP Mode (richest web content)
Launch your browser with CDP enabled for the best results:
```bash
# Launch Chrome with CDP
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Snapshot auto-detects CDP on ports 9222/9229
agent-native snapshot Chrome -i

# Or specify a port explicitly
agent-native snapshot Chrome -i --port 9222
```
Manual Persistent Control
```bash
agent-native ax-enable Arc # Persistently enable AXEnhancedUserInterface
agent-native snapshot Arc -i # Now shows web page elements
agent-native ax-disable Arc # Restore when done
```
Fallback: Keyboard Shortcuts + Screenshots
When the AX tree is still sparse (as with some Electron apps), fall back to keyboard-driven interaction:
```bash
agent-native key Slack cmd+k # Open quick switcher
agent-native key Slack "channel name" return # Type and confirm
agent-native screenshot Slack /tmp/slack.png # Visual confirmation
agent-native get title Slack # Check navigation state
```
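The four fallback steps can be bundled into one reusable function. This is a sketch using only the commands shown above; `keyboard_navigate` and `AGENT_NATIVE` are hypothetical names, and it assumes `get title` prints the window title to stdout.

```bash
# Override AGENT_NATIVE to point at a different binary (hypothetical variable).
AGENT_NATIVE="${AGENT_NATIVE:-agent-native}"

# keyboard_navigate <app> <shortcut> <query> <screenshot_path>
# Sends <shortcut>, types <query> and confirms with return, captures a
# screenshot for visual confirmation, then prints the window title.
keyboard_navigate() (
  app="$1"; shortcut="$2"; query="$3"; shot="$4"
  "$AGENT_NATIVE" key "$app" "$shortcut" || exit 1
  "$AGENT_NATIVE" key "$app" "$query" return || exit 1
  "$AGENT_NATIVE" screenshot "$app" "$shot" >/dev/null || exit 1
  "$AGENT_NATIVE" get title "$app"
)
```

A call like `keyboard_navigate Slack cmd+k "channel name" /tmp/slack.png` leaves the screenshot on disk and returns the title, so the caller can check that navigation landed where expected.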
Pasting files into apps
```bash
agent-native paste Slack /path/to/image.png # Copies to clipboard and pastes in one step
```
Tips
- Always use `-i` with snapshot -- full trees are noisy.
- Re-snapshot after navigation -- old refs may not resolve after UI changes.
- `check`/`uncheck` are idempotent -- they read current state first.
- `fill` clears first, `type` appends.
- AX tree peers into browsers -- Safari/Chrome expose web content as AX nodes.
- Electron apps need keyboard shortcuts -- use System Events keystrokes when AX is sparse.