Terminal Automation with pilotty
CRITICAL: Argument Positioning
All flags (, , , etc.) MUST come BEFORE positional arguments:
bash
# CORRECT - flags before command/arguments
pilotty spawn --name myapp vim file.txt
pilotty key -s myapp Enter
pilotty snapshot -s myapp --format text
# WRONG - flags after command (they get passed to the app, not pilotty!)
pilotty spawn vim file.txt --name myapp # FAILS: --name goes to vim
pilotty key Enter -s myapp # FAILS: -s goes nowhere useful
This is the #1 cause of agent failures. When in doubt: flags first, then command/args.
Quick start
bash
pilotty spawn vim file.txt # Start TUI app in managed session
pilotty wait-for "file.txt" # Wait for app to be ready
pilotty snapshot # Get screen state with UI elements
pilotty key i # Enter insert mode
pilotty type "Hello, World!" # Type text
pilotty key Escape # Exit insert mode
pilotty kill # End session
Core workflow
- Spawn: starts the app in a background PTY
- Wait: ensures the app is ready
- Snapshot: returns screen state with detected UI elements
- Understand: Parse to identify buttons, inputs, toggles
- Interact: Use keyboard commands (, ) to navigate and interact
- Re-snapshot: Check to detect screen changes
Commands
Session management
bash
pilotty spawn <command> # Start TUI app (e.g., pilotty spawn htop)
pilotty spawn --name myapp <cmd> # Start with custom session name (--name before command)
pilotty kill # Kill default session
pilotty kill -s myapp # Kill specific session
pilotty list-sessions # List all active sessions
pilotty daemon # Manually start daemon (usually auto-starts)
pilotty shutdown # Stop daemon and all sessions
pilotty examples # Show end-to-end workflow example
Screen capture
bash
pilotty snapshot # Full JSON with text content and elements
pilotty snapshot --format compact # JSON without text field
pilotty snapshot --format text # Plain text with cursor indicator
pilotty snapshot -s myapp # Snapshot specific session
# Wait for screen to change (eliminates need for sleep!)
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH # Block until screen changes
pilotty snapshot --await-change $HASH --settle 50 # Wait for 50ms stability
Input
bash
pilotty type "hello" # Type text at cursor
pilotty type -s myapp "text" # Type in specific session
pilotty key Enter # Press Enter
pilotty key Ctrl+C # Send interrupt
pilotty key Escape # Send Escape
pilotty key Tab # Send Tab
pilotty key F1 # Function key
pilotty key Alt+F # Alt combination
pilotty key Up # Arrow key
pilotty key -s myapp Ctrl+S # Key in specific session
# Key sequences (space-separated, sent in order)
pilotty key "Ctrl+X m" # Emacs chord: Ctrl+X then m
pilotty key "Escape : w q Enter" # vim :wq sequence
pilotty key "a b c" --delay 50 # Send a, b, c with 50ms delay
pilotty key -s myapp "Tab Tab Enter" # Sequence in specific session
Interaction
bash
pilotty click 5 10 # Click at row 5, col 10
pilotty click -s myapp 10 20 # Click in specific session
pilotty scroll up # Scroll up 1 line
pilotty scroll down 5 # Scroll down 5 lines
pilotty scroll up 10 -s myapp # Scroll in specific session
Terminal control
bash
pilotty resize 120 40 # Resize terminal to 120 cols x 40 rows
pilotty resize 80 24 -s myapp # Resize specific session
pilotty wait-for "Ready" # Wait for text to appear (30s default)
pilotty wait-for "Error" -r # Wait for regex pattern
pilotty wait-for "Done" -t 5000 # Wait with 5s timeout
pilotty wait-for "~" -s editor # Wait in specific session
Global options
| Option | Description |
|---|
| Target specific session (default: "default") |
| Snapshot format: full, compact, text |
| Timeout for wait-for and await-change (default: 30000) |
| Treat wait-for pattern as regex |
| Session name for spawn command |
| Delay between keys in a sequence (default: 0, max: 10000) |
| Block snapshot until content_hash differs |
| Wait for screen to be stable for this many ms (default: 0) |
Environment variables
bash
PILOTTY_SESSION="mysession" # Default session name
PILOTTY_SOCKET_DIR="/tmp/pilotty" # Override socket directory
RUST_LOG="debug" # Enable debug logging
Snapshot Output
The
command returns structured JSON with detected UI elements:
json
{
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
"text": "Settings:\n [x] Notifications [ ] Dark mode\n [Save] [Cancel]",
"elements": [
{ "kind": "toggle", "row": 1, "col": 2, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
{ "kind": "toggle", "row": 1, "col": 20, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
{ "kind": "button", "row": 2, "col": 2, "width": 6, "text": "[Save]", "confidence": 0.8 },
{ "kind": "button", "row": 2, "col": 10, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
],
"content_hash": 12345678901234567890
}
Use
for a plain text view with cursor indicator:
--- Terminal 80x24 | Cursor: (5, 10) ---
bash-3.2$ [_]
The
shows cursor position. Use the text content to understand screen state and navigate with keyboard commands.
Element Detection
pilotty automatically detects interactive UI elements in terminal applications. Elements provide read-only context to help understand UI structure.
Element Kinds
| Kind | Detection Patterns | Confidence | Fields |
|---|
| toggle | , , , , | 1.0 | |
| button | Inverse video, , , | 1.0 / 0.8 | (if true) |
| input | Cursor position, underscores | 1.0 / 0.6 | (if true) |
Element Fields
| Field | Type | Description |
|---|
| string | Element type: , , or |
| number | Row position (0-based from top) |
| number | Column position (0-based from left) |
| number | Width in terminal cells (CJK chars = 2) |
| string | Text content of the element |
| number | Detection confidence (0.0-1.0) |
| bool | Whether element has focus (only present if true) |
| bool | Toggle state (only present for toggles) |
Confidence Levels
| Confidence | Meaning |
|---|
| 1.0 | High confidence: Cursor position, inverse video, checkbox patterns |
| 0.8 | Medium confidence: Bracket patterns , |
| 0.6 | Lower confidence: Underscore input fields |
Wait for Screen Changes (Recommended)
Stop guessing sleep durations! Use
to wait for the screen to actually update:
bash
# Capture baseline hash
HASH=$(pilotty snapshot | jq '.content_hash')
# Perform action
pilotty key Enter
# Wait for screen to change (blocks until hash differs)
pilotty snapshot --await-change $HASH
# Or wait for screen to stabilize (for apps that render progressively)
pilotty snapshot --await-change $HASH --settle 100
Flags:
| Flag | Description |
|---|
| Block until differs from this value |
| After change detected, wait for screen to be stable for MS |
| Maximum wait time (default: 30000) |
Why this is better than sleep:
- is a guess - too short causes race conditions, too long slows automation
- waits exactly as long as needed - no more, no less
- handles apps that render progressively (show partial, then complete)
Waiting for Streaming AI Responses
When interacting with AI-powered TUIs (like opencode, etc.) that stream responses, you need a longer
time since the screen keeps updating as tokens arrive:
bash
# 1. Capture hash before sending prompt
HASH=$(pilotty snapshot -s myapp | jq -r '.content_hash')
# 2. Type prompt and submit
pilotty type -s myapp "write me a poem about ai agents"
pilotty key -s myapp Enter
# 3. Wait for streaming response to complete
# - Use longer settle (2-3s) since AI apps pause between chunks
# - Extend timeout for long responses (60s+)
pilotty snapshot -s myapp --await-change "$HASH" --settle 3000 -t 60000
# 4. Response may be scrolled - scroll up if needed to see full output
pilotty scroll -s myapp up 10
pilotty snapshot -s myapp --format text
Key parameters for streaming:
- : AI responses have pauses between chunks; 2-3 seconds ensures streaming is truly done
- : Extend timeout beyond the 30s default for longer generations
- The settle timer resets on each screen change, so it naturally waits until streaming stops
Manual Change Detection
For manual polling (not recommended), use
directly:
bash
# Get initial state
SNAP1=$(pilotty snapshot)
HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
# Perform action
pilotty key Tab
# Check if screen changed
SNAP2=$(pilotty snapshot)
HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
if [ "$HASH1" != "$HASH2" ]; then
echo "Screen changed - re-analyze elements"
fi
Using Elements Effectively
Elements are read-only context for understanding the UI. Use keyboard navigation for reliable interaction:
bash
# 1. Get snapshot to understand UI structure
pilotty snapshot | jq '.elements'
# Output shows toggles (checked/unchecked) and buttons with positions
# 2. Navigate and interact with keyboard (reliable approach)
pilotty key Tab # Move to next element
pilotty key Space # Toggle checkbox
pilotty key Enter # Activate button
# 3. Verify state changed
pilotty snapshot | jq '.elements[] | select(.kind == "toggle")'
Key insight: Use elements to understand WHAT is on screen, use keyboard to interact with it.
Navigation Approach
pilotty uses keyboard-first navigation, just like a human would:
bash
# 1. Take snapshot to see the screen
pilotty snapshot --format text
# 2. Navigate using keyboard
pilotty key Tab # Move to next element
pilotty key Enter # Activate/select
pilotty key Escape # Cancel/back
pilotty key Up # Move up in list/menu
pilotty key Space # Toggle checkbox
# 3. Type text when needed
pilotty type "search term"
pilotty key Enter
# 4. Click at coordinates for mouse-enabled TUIs
pilotty click 5 10 # Click at row 5, col 10
Key insight: Parse the snapshot text and elements to understand what's on screen, then use keyboard commands to navigate. This works reliably across all TUI applications.
Example: Edit file with vim
bash
# 1. Spawn vim
pilotty spawn --name editor vim /tmp/hello.txt
# 2. Wait for vim to load and capture baseline hash
pilotty wait-for -s editor "hello.txt"
HASH=$(pilotty snapshot -s editor | jq '.content_hash')
# 3. Enter insert mode
pilotty key -s editor i
# 4. Type content
pilotty type -s editor "Hello from pilotty!"
# 5. Wait for screen to update, then exit (no sleep needed!)
pilotty snapshot -s editor --await-change $HASH --settle 50
pilotty key -s editor "Escape : w q Enter"
# 6. Verify session ended
pilotty list-sessions
Alternative using individual keys:
bash
pilotty key -s editor Escape
pilotty type -s editor ":wq"
pilotty key -s editor Enter
Example: Dialog checklist interaction
bash
# 1. Spawn dialog checklist (--name before command)
pilotty spawn --name opts dialog --checklist "Select features:" 12 50 4 \
"notifications" "Push notifications" on \
"darkmode" "Dark mode theme" off \
"autosave" "Auto-save documents" on \
"telemetry" "Usage analytics" off
# 2. Wait for dialog to render (use await-change, not sleep!)
pilotty snapshot -s opts --settle 200 # Wait for initial render to stabilize
# 3. Get snapshot and examine elements, capture hash
SNAP=$(pilotty snapshot -s opts)
echo "$SNAP" | jq '.elements[] | select(.kind == "toggle")'
HASH=$(echo "$SNAP" | jq '.content_hash')
# 4. Navigate to "darkmode" and toggle it
pilotty key -s opts Down # Move to second option
pilotty key -s opts Space # Toggle it on
# 5. Wait for change and verify
pilotty snapshot -s opts --await-change $HASH | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
# 6. Confirm selection
pilotty key -s opts Enter
# 7. Clean up
pilotty kill -s opts
Example: Form filling with elements
bash
# 1. Spawn a form application
pilotty spawn --name form my-form-app
# 2. Get snapshot to understand form structure
pilotty snapshot -s form | jq '.elements'
# Shows inputs, toggles, and buttons with positions for click command
# 3. Tab to first input (likely already focused)
pilotty type -s form "myusername"
# 4. Tab to password field
pilotty key -s form Tab
pilotty type -s form "mypassword"
# 5. Tab to remember me and toggle
pilotty key -s form Tab
pilotty key -s form Space
# 6. Tab to Login and activate
pilotty key -s form Tab
pilotty key -s form Enter
# 7. Check result
pilotty snapshot -s form --format text
Example: Monitor with htop
bash
# 1. Spawn htop
pilotty spawn --name monitor htop
# 2. Wait for display
pilotty wait-for -s monitor "CPU"
# 3. Take snapshot to see current state
pilotty snapshot -s monitor --format text
# 4. Send commands
pilotty key -s monitor F9 # Kill menu
pilotty key -s monitor q # Quit
# 5. Kill session
pilotty kill -s monitor
Example: Interact with AI TUI (opencode, etc.)
AI-powered TUIs stream responses, requiring special handling:
bash
# 1. Spawn the AI app
pilotty spawn --name ai opencode
# 2. Wait for the prompt to be ready
pilotty wait-for -s ai "Ask anything" -t 15000
# 3. Capture baseline hash
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
# 4. Type prompt and submit
pilotty type -s ai "explain the architecture of this codebase"
pilotty key -s ai Enter
# 5. Wait for streaming response to complete
# - settle=3000: Wait 3s of no changes to ensure streaming is done
# - timeout=60000: Allow up to 60s for long responses
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000 --format text
# 6. If response is long and scrolled, scroll up to see full output
pilotty scroll -s ai up 20
pilotty snapshot -s ai --format text
# 7. Clean up
pilotty kill -s ai
Gotchas with AI apps:
- Use because AI responses pause between chunks
- Extend timeout with for complex prompts
- Long responses may scroll the terminal; use to see the beginning
- The settle timer resets on each screen update, so it waits for true completion
Sessions
Each session is isolated with its own:
- PTY (pseudo-terminal)
- Screen buffer
- Child process
bash
# Run multiple apps (--name must come before the command)
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim file.txt
# Target specific session
pilotty snapshot -s monitoring
pilotty key -s editor Ctrl+S
# List all
pilotty list-sessions
# Kill specific
pilotty kill -s editor
The first session spawned without
is automatically named
.
Important: The
flag must come
before the command. Everything after the command is passed as arguments to that command.
Daemon Architecture
pilotty uses a background daemon for session management:
- Auto-start: Daemon starts on first command
- Auto-stop: Shuts down after 5 minutes with no sessions
- Session cleanup: Sessions removed when process exits (within 500ms)
- Shared state: Multiple CLI calls share sessions
You rarely need to manage the daemon manually.
Error Handling
Errors include actionable suggestions:
json
{
"code": "SESSION_NOT_FOUND",
"message": "Session 'abc123' not found",
"suggestion": "Run 'pilotty list-sessions' to see available sessions"
}
json
{
"code": "SPAWN_FAILED",
"message": "Failed to spawn process: command not found",
"suggestion": "Check that the command exists and is in PATH"
}
Common Patterns
Reliable action + wait (recommended)
bash
# The pattern: capture hash, act, await change
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH --settle 50
# This replaces fragile patterns like:
# pilotty key Enter && sleep 1 && pilotty snapshot # BAD: guessing
Wait then act
bash
pilotty spawn my-app
pilotty wait-for "Ready" # Ensure app is ready
pilotty snapshot # Then snapshot
Check state before action
bash
pilotty snapshot --format text | grep "Error" # Check for errors
pilotty key Enter # Then proceed
Check for specific element
bash
# Check if the first toggle is checked
pilotty snapshot | jq '.elements[] | select(.kind == "toggle") | {text, checked}' | head -1
# Find element at specific position
pilotty snapshot | jq '.elements[] | select(.row == 5 and .col == 10)'
Retry on timeout
bash
pilotty wait-for "Ready" -t 5000 || {
pilotty snapshot --format text # Check what's on screen
# Adjust approach based on actual state
}
Deep-dive Documentation
For detailed patterns and edge cases, see:
| Reference | Description |
|---|
| references/session-management.md | Multi-session patterns, isolation, cleanup |
| references/key-input.md | Complete key combinations reference |
| references/element-detection.md | Detection rules, confidence, patterns |
Ready-to-use Templates
Executable workflow scripts:
| Template | Description |
|---|
| templates/vim-workflow.sh | Edit file with vim, save, exit |
| templates/dialog-interaction.sh | Handle dialog/whiptail prompts |
| templates/multi-session.sh | Parallel TUI orchestration |
| templates/element-detection.sh | Element detection demo |
Usage:
bash
./templates/vim-workflow.sh /tmp/myfile.txt "File content here"
./templates/dialog-interaction.sh
./templates/multi-session.sh
./templates/element-detection.sh