agent-tui

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

🚨 CRITICAL: macOS Daemon Workaround & Gemini CLI Usage 🚨

🚨 重要提示:macOS守护进程解决方案与Gemini CLI使用说明 🚨

When using
agent-tui
in this macOS environment, the default background daemonization process crashes, causing
Connection refused (os error 61)
errors.
You MUST start the daemon manually shielded from TTY hangups before running any
agent-tui
commands.
Using
nohup
is insufficient; you must use
tmux
to provide a fully isolated pseudo-terminal.
To support parallel runs, only restart the daemon if it is not currently running:
bash
undefined
在macOS环境中使用
agent-tui
时,默认的后台守护进程化进程会崩溃,导致
Connection refused (os error 61)
错误。
**在运行任何
agent-tui
命令之前,您必须手动启动守护进程并避免TTY挂起问题。**仅使用
nohup
是不够的;您必须使用
tmux
提供完全隔离的伪终端。
为了支持并行运行,仅当守护进程未运行时才重启它
bash
undefined

Check if daemon is alive, start it in tmux if it is not

检查守护进程是否存活,若未存活则在tmux中启动

if ! agent-tui sessions >/dev/null 2>&1; then tmux kill-session -t agent-tui 2>/dev/null || true agent-tui daemon stop 2>/dev/null || true rm -f /tmp/agent-tui* tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1' sleep 1 fi
undefined
if ! agent-tui sessions >/dev/null 2>&1; then tmux kill-session -t agent-tui 2>/dev/null || true agent-tui daemon stop 2>/dev/null || true rm -f /tmp/agent-tui* tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1' sleep 1 fi
undefined

Session ID vs PID (Crucial for Reconnection)

会话ID与PID(重连关键)

When
agent-tui run
returns JSON, it includes both a
session_id
and a
pid
. The
pid
is purely informational (the OS process ID of the child command). You do not use the
pid
to reconnect or issue commands. You must always use the
session_id
(e.g.,
--session <id>
).
If the daemon crashes (
os error 61
), the pseudo-terminal is destroyed. Even if the child
pid
survives as an orphan, you cannot reconnect to it. You must restart the daemon using the workaround above and start a completely new session.
agent-tui run
返回JSON时,会包含
session_id
pid
pid
仅用于信息展示(子命令的操作系统进程ID)。您不能使用
pid
进行重连或发送命令。必须始终使用
session_id
(例如:
--session <id>
)。
如果守护进程崩溃(出现
os error 61
),伪终端会被销毁。即使子进程
pid
作为孤儿进程存活,您也无法重连到它。您必须使用上述解决方案重启守护进程,并启动一个全新的会话。

Testing the Gemini CLI

测试Gemini CLI

When testing the Gemini CLI with
agent-tui
, there are several strict requirements to ensure deterministic and accurate behavior:
  1. Build Before Running:
    agent-tui
    runs the built JS files, not TypeScript. You MUST run
    npm run build
    or
    npm run build:all
    after making code changes and before launching the CLI with
    agent-tui
    .
  2. Bypass Trust Modals: Always pass
    GEMINI_CLI_TRUST_WORKSPACE=true
    in the environment. If you don't, any new project-level agents or extensions will trigger a full-screen "Acknowledge and Enable" modal. This modal steals focus, swallows automation keystrokes, and causes
    agent-tui wait
    commands to time out.
  3. Isolated Environments: If you need to test without real user credentials or existing agents interfering, isolate the global settings using
    GEMINI_CLI_HOME=<some-test-dir>
    .
  4. Testing State Deltas (e.g., Reloads): If you are testing features that report deltas (e.g.,
    /agents reload
    outputting "1 new local subagent"), you MUST:
    • Start the CLI first so it establishes its baseline registry.
    • Use a separate shell command (outside of
      agent-tui
      ) to write the new agent
      .md
      /
      .toml
      file.
    • Use
      agent-tui type
      and
      press
      to trigger the
      /agents reload
      command inside the running session.
    • (If you add the files before starting the CLI, they become part of the baseline and won't trigger the delta logic).
bash
undefined
使用
agent-tui
测试Gemini CLI时,有几个严格要求以确保行为的确定性和准确性:
  1. 运行前构建
    agent-tui
    运行的是已构建的JS文件,而非TypeScript文件。在修改代码后、使用
    agent-tui
    启动CLI之前,您必须运行
    npm run build
    npm run build:all
  2. 绕过信任弹窗:始终在环境变量中传入
    GEMINI_CLI_TRUST_WORKSPACE=true
    。如果不设置,任何新的项目级代理或扩展都会触发全屏的“Acknowledge and Enable”弹窗。该弹窗会抢占焦点、拦截自动化按键,并导致
    agent-tui wait
    命令超时。
  3. 隔离环境:如果需要在无真实用户凭据或现有代理干扰的情况下进行测试,请使用
    GEMINI_CLI_HOME=<some-test-dir>
    隔离全局设置。
  4. 测试状态增量(如重载):如果您测试的功能会报告增量(例如
    /agents reload
    输出“1 new local subagent”),您必须
    • 先启动CLI,使其建立基线注册表。
    • 使用单独的shell命令(在
      agent-tui
      外部)写入新的代理
      .md
      /
      .toml
      文件。
    • 使用
      agent-tui type
      press
      在运行的会话中触发
      /agents reload
      命令。
    • (如果在启动CLI之前添加文件,它们会成为基线的一部分,不会触发增量逻辑)。
bash
undefined

Example: Standard isolated run (sandboxed config + bypass trust modals)

示例:标准隔离运行(沙箱配置 + 绕过信任弹窗)

env GEMINI_CLI_TRUST_WORKSPACE=true GEMINI_CLI_HOME=test-gemini-home agent-tui run -d "$(pwd)" node packages/cli/dist/index.js
undefined
env GEMINI_CLI_TRUST_WORKSPACE=true GEMINI_CLI_HOME=test-gemini-home agent-tui run -d "$(pwd)" node packages/cli/dist/index.js
undefined

Terminal Automation Mastery

终端自动化精通指南

Prerequisites

前提条件

  • Supported OS: macOS or Linux (Windows not supported yet).
  • Verify install:
bash
agent-tui --version
If not installed, use one of:
bash
undefined
  • 支持的操作系统:macOS或Linux(暂不支持Windows)。
  • 验证安装
bash
agent-tui --version
如果未安装,可选择以下方式之一:
bash
undefined

Recommended: one-line install (macOS/Linux)

推荐:一键安装(macOS/Linux)

Package manager

包管理器安装

npm i -g agent-tui pnpm add -g agent-tui bun add -g agent-tui

```bash
npm i -g agent-tui pnpm add -g agent-tui bun add -g agent-tui

```bash

Build from source

从源码构建

cargo install --git https://github.com/pproenca/agent-tui.git --path cli/crates/agent-tui

If you used the install script, ensure `~/.local/bin` is on your PATH.
cargo install --git https://github.com/pproenca/agent-tui.git --path cli/crates/agent-tui

如果使用安装脚本,请确保`~/.local/bin`在您的PATH中。

Philosophy: Why Terminal Automation Is Different

理念:终端自动化为何与众不同

Terminal UIs are stateless from the observer's perspective. Unlike web browsers with a persistent DOM, terminal automation works with a constantly-refreshed character grid. This fundamental difference shapes everything:
Web AutomationTerminal Automation
DOM persists across interactionsScreen buffer is redrawn constantly
Selectors are stableText positions may shift
Query once, act many timesMust re-verify before EVERY action
Network events signal completionMust detect visual stability
The Core Insight: agent-tui gives you vision without memory. Each screenshot is a fresh observation. Previous state means nothing after the UI changes. This isn't a limitation—it's the nature of terminal interaction.
终端UI从观察者的角度来看是无状态的。与具有持久DOM的Web浏览器不同,终端自动化处理的是不断刷新的字符网格。这一根本差异决定了所有操作逻辑:
Web自动化终端自动化
DOM在交互过程中持久存在屏幕缓冲区不断重绘
选择器稳定文本位置可能偏移
查询一次,多次操作每次操作前必须重新验证
网络事件标志完成必须检测视觉稳定性
核心洞察:agent-tui为您提供“视觉”但无“记忆”。每个截图都是全新的观察结果。UI变化后,之前的状态毫无意义。这并非局限——而是终端交互的本质。

Mental Model: The Feedback Loop

思维模型:反馈循环

Think of terminal automation as a closed-loop control system:
    ┌──────────────────────────────────────────────┐
    │                                              │
    ▼                                              │
OBSERVE ──► DECIDE ──► ACT ──► WAIT ──► VERIFY ───┘
   │                                        │
   │                                        │
   └─────── NEVER skip ◄────────────────────┘
Each phase is mandatory. Skipping verification is the #1 cause of flaky automation.
将终端自动化视为一个闭环控制系统
    ┌──────────────────────────────────────────────┐
    │                                              │
    ▼                                              │
观察 ──► 决策 ──► 执行 ──► 等待 ──► 验证 ───┘
   │                                        │
   │                                        │
   └─────── 绝不能跳过 ◄────────────────────┘
**每个阶段都是必需的。**跳过验证是自动化不稳定的首要原因。

The "Fresh Eyes" Principle

“全新视角”原则

Every time you need to interact with the UI:
  1. Take a fresh screenshot — your previous one is now stale
  2. Locate your target visually — text positions may have changed
  3. Verify the state — the UI may have changed unexpectedly
  4. Act only when stable — animations and loading states cause failures
This feels slower, but it's the only reliable approach. Optimistic reuse of stale state causes intermittent failures that are painful to debug.
每次需要与UI交互时:
  1. 截取全新截图——之前的截图已过期
  2. 视觉定位目标——文本位置可能已变化
  3. 验证状态——UI可能已意外变化
  4. 仅在稳定时执行操作——动画和加载状态会导致失败
这看起来更慢,但却是唯一可靠的方法。盲目复用过期状态会导致难以调试的间歇性失败。

Critical Rules (Non-Negotiable)

关键规则(不可协商)

RULE 1: Atomic Execution (No Pipelining) You are FORBIDDEN from chaining commands with
&&
(e.g.,
type "x" && press Enter && wait
). Modals or UI updates can intercept your keystrokes. You MUST execute one atomic action, wait, screenshot, and verify before taking the next action in a new turn.
RULE 2: Re-snapshot after EVERY action The UI state is invalidated by any change. Always take a fresh screenshot before acting again.
RULE 3: Never act on unstable UI If the UI is animating, loading, or transitioning,
wait --stable
first. Acting during transitions because race conditions.
RULE 4: Verify before claiming success Use
wait "expected text" --assert
to confirm outcomes. Don't assume an action worked—prove it.
RULE 5: Error Recovery If a
wait
command times out, DO NOT blindly restart or kill the session. Execute
screenshot
to visually diagnose what unexpected UI element (modal, error dialog, lost focus) intercepted the flow.
RULE 6: Clean up sessions Always end with
agent-tui kill
. Orphaned sessions consume resources and can interfere with future runs.
规则1:原子执行(禁止流水线) 禁止使用
&&
链式执行命令(例如:
type "x" && press Enter && wait
)。弹窗或UI更新可能会拦截您的按键。您必须执行一个原子操作,等待、截图、验证,然后在新的步骤中执行下一个操作。
规则2:每次操作后重新截图 任何变化都会使UI状态失效。再次执行操作前务必截取全新截图。
规则3:绝不在不稳定UI上执行操作 如果UI正在动画、加载或过渡,请先执行
wait --stable
。在过渡期间执行操作会导致竞态条件。
规则4:验证后再宣告成功 使用
wait "expected text" --assert
确认结果。不要假设操作成功——要证明它成功。
规则5:错误恢复 如果
wait
命令超时,请勿盲目重启或终止会话。执行
screenshot
以可视化诊断是什么意外UI元素(弹窗、错误对话框、失去焦点)拦截了流程。
规则6:清理会话 始终以
agent-tui kill
结束会话。孤立的会话会消耗资源并可能干扰后续运行。

Decision Framework

决策框架

Which Screenshot Mode?

选择哪种截图模式?

Use
screenshot --format json
when parsing automation output, or plain
screenshot
for human readable text.
解析自动化输出时使用
screenshot --format json
,普通
screenshot
用于人类可读文本。

How to Wait?

如何等待?

What are you waiting for?
├─► Specific text to appear
│   └─► `wait "text" --assert` (fails if not found)
├─► Specific text to disappear
│   └─► `wait "text" --gone --assert`
├─► UI to stop changing (animations, loading)
│   └─► `wait --stable`
└─► Multiple conditions
    └─► Chain waits sequentially
您在等待什么?
├─► 特定文本出现
│   └─► `wait "text" --assert`(未找到则失败)
├─► 特定文本消失
│   └─► `wait "text" --gone --assert`
├─► UI停止变化(动画、加载)
│   └─► `wait --stable`
└─► 多个条件
    └─► 按顺序链式等待

How to Act?

如何执行操作?

What do you need to do?
├─► Type text into the terminal
│   └─► `type "text"`
├─► Send keyboard shortcuts/navigation
│   └─► `press Ctrl+C` or `press ArrowDown Enter`
您需要做什么?
├─► 在终端中输入文本
│   └─► `type "text"`
├─► 发送键盘快捷键/导航
│   └─► `press Ctrl+C` 或 `press ArrowDown Enter`

Core Workflow

核心工作流

The canonical automation loop:
bash
undefined
标准自动化循环:
bash
undefined

1. START: Launch the TUI app

1. 启动:启动TUI应用

agent-tui run <command> [-- args...]
agent-tui run <command> [-- args...]

2. OBSERVE: Get current UI state

2. 观察:获取当前UI状态

agent-tui screenshot --format json
agent-tui screenshot --format json

3. DECIDE: Based on text, determine next action

3. 决策:根据文本确定下一步操作

(This happens in your head/code)

(此步骤在您的头脑/代码中完成)

4. ACT: Execute the action

4. 执行:执行操作

agent-tui type "text" agent-tui press Enter
agent-tui type "text" agent-tui press Enter

5. WAIT: Synchronize with UI changes

5. 等待:与UI变化同步

agent-tui wait "Expected" --assert # or wait --stable
agent-tui wait "Expected" --assert # 或 wait --stable

6. VERIFY: Confirm the outcome (often combined with step 5)

6. 验证:确认结果(通常与步骤5结合)

If verification fails, handle the error

如果验证失败,处理错误

7. REPEAT: Go back to step 2 until done

7. 重复:回到步骤2直到完成

8. CLEANUP: Always clean up

8. 清理:始终进行清理

agent-tui kill
undefined
agent-tui kill
undefined

Anti-Patterns (What NOT to Do)

反模式(切勿这样做)

❌ Acting During Animation/Loading

❌ 在动画/加载期间执行操作

bash
undefined
bash
undefined

WRONG: Acting immediately on dynamic UI

错误:在动态UI上立即执行操作

agent-tui run my-app agent-tui screenshot --format json # UI might still be loading! agent-tui type "value" # ❌ Might miss the input field
agent-tui run my-app agent-tui screenshot --format json # UI可能仍在加载! agent-tui type "value" # ❌ 可能错过输入字段

RIGHT: Wait for stability first

正确:先等待稳定

agent-tui run my-app agent-tui wait --stable # Let UI settle agent-tui screenshot --format json # Now it's reliable agent-tui type "value"
undefined
agent-tui run my-app agent-tui wait --stable # 等待UI稳定 agent-tui screenshot --format json # 现在可靠了 agent-tui type "value"
undefined

❌ Assuming Success Without Verification

❌ 未验证就假设成功

bash
undefined
bash
undefined

WRONG: Assuming the type worked

错误:假设输入成功

agent-tui type "value" agent-tui press Enter
agent-tui type "value" agent-tui press Enter

...proceed as if success... # ❌ What if it failed silently?

...继续操作,仿佛已成功... # ❌ 如果静默失败怎么办?

RIGHT: Verify the outcome

正确:验证结果

agent-tui type "value" agent-tui press Enter agent-tui wait "Success" --assert # ✓ Proves the action worked
undefined
agent-tui type "value" agent-tui press Enter agent-tui wait "Success" --assert # ✓ 证明操作成功
undefined

❌ Skipping Cleanup

❌ 跳过清理

bash
undefined
bash
undefined

WRONG: Forgetting to kill the session

错误:忘记终止会话

agent-tui run my-app
agent-tui run my-app

...do stuff...

...执行操作...

script ends # ❌ Session left running!

脚本结束 # ❌ 会话仍在运行!

RIGHT: Always clean up

正确:始终清理

agent-tui run my-app
agent-tui run my-app

...do stuff...

...执行操作...

agent-tui kill # ✓ Clean exit
undefined
agent-tui kill # ✓ 干净退出
undefined

Before You Start: Clarify Requirements

开始前:明确需求

Before automating any TUI, gather this information:
  1. Command: What exactly to run? (
    my-app --flag
    or
    npm start
    ?)
  2. Success criteria: What text/state indicates success?
  3. Input sequence: What keystrokes/data to enter, in what order?
  4. Safety: Is it safe to submit forms, delete data, etc.?
  5. Auth: Does it need login? Test credentials?
  6. Live preview: Does the user want to watch? (
    agent-tui live start --open
    )
If any of these are unclear, ask before running.
在自动化任何TUI之前,请收集以下信息:
  1. 命令:具体要运行什么?(
    my-app --flag
    还是
    npm start
    ?)
  2. 成功标准:什么文本/状态表示成功?
  3. 输入序列:要输入哪些按键/数据,顺序如何?
  4. 安全性:提交表单、删除数据等操作是否安全?
  5. 认证:是否需要登录?测试凭据?
  6. 实时预览:用户是否希望观看?(
    agent-tui live start --open
如果有任何不清楚的地方,请在运行前询问。

Demo Mode: Showing What agent-tui Can Do

演示模式:展示agent-tui的功能

When a user asks what agent-tui is, wants a demo, or asks "show me how it works":
  1. Don't explain—demonstrate. Actions speak louder than words.
  2. Use the live preview so they can watch in real-time.
  3. Run
    top
    —it's universal and shows dynamic real-time updates.
Quick demo trigger phrases:
  • "What is agent-tui?" / "What does agent-tui do?"
  • "Demo agent-tui" / "Show me agent-tui"
  • "How does agent-tui work?" / "See it in action"
当用户询问agent-tui是什么、想要演示,或询问“show me how it works”时:
  1. **不要解释——直接演示。**行动胜于言语。
  2. 使用实时预览让用户可以实时观看。
  3. 运行
    top
    ——它是通用工具,可展示动态实时更新。
快速演示触发短语:
  • "What is agent-tui?" / "What does agent-tui do?"
  • "Demo agent-tui" / "Show me agent-tui"
  • "How does agent-tui work?" / "See it in action"

Failure Recovery

故障恢复

SymptomDiagnosisSolution
"Text not found"Stale view or text movedRe-snapshot, locate text again
Wait times outUI didn't reach expected stateCheck screenshot, verify expectations
"Daemon not running"Daemon crashed or not started
agent-tui daemon start
Unexpected layoutWrong terminal size
agent-tui resize --cols 120 --rows 40
Session unresponsiveApp crashed or hung
agent-tui kill
, then re-run
Repeated failuresSomething fundamentally wrongStop after 3-5 attempts, ask user
症状诊断解决方案
"Text not found"视图过期或文本位置移动重新截图,重新定位文本
等待超时UI未达到预期状态检查截图,验证预期结果
"Daemon not running"守护进程崩溃或未启动
agent-tui daemon start
布局异常终端尺寸错误
agent-tui resize --cols 120 --rows 40
会话无响应应用崩溃或挂起
agent-tui kill
,然后重新运行
重复失败存在根本性问题尝试3-5次后停止,询问用户

Self-Discovery: Use --help

自我探索:使用--help

You don't need to memorize every flag. The CLI is self-documenting:
bash
agent-tui --help                     # List all commands
agent-tui run --help                 # Options for 'run'
agent-tui screenshot --help          # Options for 'screenshot'
agent-tui wait --help                # Options for 'wait'
When in doubt, ask the CLI. This skill teaches when and why to use commands. For exact flags and syntax,
--help
is authoritative.
您无需记住所有标志。CLI自带文档:
bash
agent-tui --help                     # 列出所有命令
agent-tui run --help                 # 'run'命令的选项
agent-tui screenshot --help          # 'screenshot'命令的选项
agent-tui wait --help                # 'wait'命令的选项
**如有疑问,请询问CLI。**本技能教授何时以及为何使用命令。对于确切的标志和语法,
--help
是权威来源。

Quick Reference

快速参考

bash
undefined
bash
undefined

Start app

启动应用

agent-tui run <cmd> [-- args] # Launch TUI under control
agent-tui run <cmd> [-- args] # 在控制下启动TUI

Observe

观察

agent-tui screenshot # Plain text view agent-tui screenshot --format json # Machine-readable output
agent-tui screenshot # 纯文本视图 agent-tui screenshot --format json # 机器可读输出

Act

执行操作

agent-tui press Enter # Press key(s) agent-tui press Ctrl+C # Keyboard shortcuts agent-tui type "text" # Type text
agent-tui press Enter # 按下按键 agent-tui press Ctrl+C # 键盘快捷键 agent-tui type "text" # 输入文本

Wait/Verify

等待/验证

agent-tui wait "text" --assert # Wait for text, fail if not found agent-tui wait "text" --gone --assert # Wait for text to disappear agent-tui wait --stable # Wait for UI to stop changing
agent-tui wait "text" --assert # 等待文本出现,未找到则失败 agent-tui wait "text" --gone --assert # 等待文本消失,未消失则失败 agent-tui wait --stable # 等待UI停止变化

Manage

管理

agent-tui sessions # List active sessions agent-tui live start --open # Start live preview agent-tui kill # End current session
undefined
agent-tui sessions # 列出活跃会话 agent-tui live start --open # 启动实时预览 agent-tui kill # 结束当前会话
undefined