droid-control
Control terminal TUIs and web/Electron apps for testing, demos, QA, and computer-use tasks. Use when you need to automate a CLI, drive a browser, record a demo, or capture proof artifacts.
NPX Install
npx skill4agent add factory-ai/factory-plugins droid-control
Droid Control
Automate terminals and browsers. Three routing decisions, then atoms guide you the rest of the way.
Ground rules
- Real apps, real environments. Non-deterministic behavior (LLM responses, network latency, variable output) is expected. Handle it with `wait` / `wait-idle`. Never substitute fixtures or mocked data.
- Commit to execution. Once you've chosen a driver, run the plan. If something fails mid-run, recover and retry -- don't re-evaluate the approach.
- Atoms are self-contained. Load one and follow its mechanics. No cross-referencing needed.
- `tctl` is the ONLY way to launch recorded sessions. `tctl` manages recording by wrapping `asciinema rec` around the PTY; raw `tuistory` has no recording capability and never will. Never call `tuistory launch` directly; unknown flags crash `tuistory-relay`. Always resolve `TCTL` to its absolute filesystem path before use, especially when delegating to workers (they don't inherit `${DROID_PLUGIN_ROOT}`).
- Isolate every run. Multiple droids may be filming simultaneously on the same machine. Session names and output paths share a global namespace (`/tmp/tctl-sessions/`). At the start of every workflow, generate a run ID (`RUN_ID=$(date +%s)-$$` or similar) and use it as a prefix for all session names and a scoped temp directory for all output files:
  bash
  RUN_ID="$(date +%s)-$$"
  RUN_DIR="$(mktemp -d /tmp/droid-run-${RUN_ID}-XXXXXX)"
  # Session names: -s ${RUN_ID}-before, -s ${RUN_ID}-after
  # Output paths: ${RUN_DIR}/before.cast, ${RUN_DIR}/after.cast
  Never use bare session names like `-s demo`, `-s before`, `-s after`; they will collide with concurrent runs.
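The isolation rule can be sketched as a pair of small helpers. The `RUN_ID` and `RUN_DIR` lines follow the rule itself; the helper names `session_name` and `output_path` are assumptions made for illustration and are not part of `tctl`.

```bash
# Generate a per-run ID and a scoped temp directory, as the rule describes.
RUN_ID="$(date +%s)-$$"
RUN_DIR="$(mktemp -d "/tmp/droid-run-${RUN_ID}-XXXXXX")"

# Hypothetical helper: prefix a short label with the run ID so that
# concurrent runs never share a session name.
session_name() {
  printf '%s-%s\n' "$RUN_ID" "$1"
}

# Hypothetical helper: build an output path inside the scoped directory.
output_path() {
  printf '%s/%s\n' "$RUN_DIR" "$1"
}

echo "session: $(session_name before)"
echo "output:  $(output_path before.cast)"
```

Routing every session name and output path through helpers like these makes a bare `-s before` or a hardcoded `/tmp` path stand out in review.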
Routing
Three independent lookups. Do all three, then load the union of skills they produce.
1. Target route — what are you driving?
| Target | Load these skills |
|---|---|
| Droid CLI | droid-cli + tuistory backend via `tctl` |
| Droid CLI (real terminal proof) | true-input + droid-cli |
| Other terminal TUI | tuistory backend via `tctl` |
| Other terminal TUI (real terminal proof) | true-input |
| Web page or Electron app | agent-browser |
| Raw terminal byte sequences | true-input + pty-capture |
tuistory is the default for terminal work. Use true-input only when you need real terminal rendering evidence.
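As a rough illustration, the target route can be read as a lookup. The function name `skills_for_target`, the target keywords (`droid-cli`, `tui`, `web`, `bytes`), and the `yes`/`no` proof flag are all made up for this sketch; only the skill names come from the table, and `tuistory` here stands for the tuistory backend.

```bash
# Hypothetical lookup: map a target keyword to the skills the table lists.
skills_for_target() {
  local target="$1"
  local proof="${2:-no}"   # "yes" when real terminal rendering evidence is needed
  case "${target}:${proof}" in
    droid-cli:yes) echo "true-input droid-cli" ;;
    droid-cli:*)   echo "droid-cli tuistory" ;;
    tui:yes)       echo "true-input" ;;
    tui:*)         echo "tuistory" ;;
    web:*)         echo "agent-browser" ;;
    bytes:*)       echo "true-input pty-capture" ;;
    *)             echo "unknown target: ${target}" >&2; return 1 ;;
  esac
}

skills_for_target droid-cli
skills_for_target tui yes
```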
2. Stage route — what does the workflow need?
Every workflow passes through stages. Load the atoms for each stage you'll use.
| Stage | Skill | When to load |
|---|---|---|
| Capture | capture | Always -- every workflow records or captures something |
| Compose | compose | When the deliverable is a produced artifact (video, annotated screenshots, comparison image) |
| Verify | verify | Always -- every deliverable gets checked against commitments |
3. Artifact route — does compose need polish tools?
Only relevant when compose is loaded.
| Artifact need | Also load |
|---|---|
| Showcase polish (window chrome, branded frame, cinematic background) | showcase |
| Effects and keystroke overlays | (compose handles this — they're fields in the Remotion props JSON) |
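Since the three lookups are independent, the final skill set is simply their union. A minimal sketch, assuming each lookup yields a space-separated list; the helper name `union_skills` and the variable names are hypothetical.

```bash
# Hypothetical helper: merge skill lists into one de-duplicated, sorted
# list, one skill per line.
union_skills() {
  printf '%s\n' "$@" | tr -s ' ' '\n' | sed '/^$/d' | sort -u
}

target_skills="droid-cli tuistory"      # from the target route
stage_skills="capture compose verify"   # from the stage route
artifact_skills="showcase"              # from the artifact route (compose is loaded)

union_skills "$target_skills" "$stage_skills" "$artifact_skills"
```

De-duplicating matters when two routes name the same skill, so each atom is loaded once.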
Workflow shape
Command (intent + commitments)
→ Target route (load driver atoms)
→ Capture (record / screenshot / byte-capture)
→ Compose (assemble deliverable, if needed)
→ Verify (check against commitments)
→ Report
Commands declare what to produce. Atoms own how.
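The shape above can be sketched as a tiny driver script. The stage functions are placeholders named after the stages, not real commands; in practice each stage follows its own atom.

```bash
# Placeholder stage functions; illustrative only.
capture() { echo "capture: record / screenshot / byte-capture"; }
compose() { echo "compose: assemble deliverable"; }
verify()  { echo "verify: check against commitments"; }
report()  { echo "report"; }

# Run the stages in order. Compose is skipped unless the deliverable is a
# produced artifact (video, annotated screenshots, comparison image).
run_workflow() {
  local needs_compose="$1"
  capture || return 1
  if [ "$needs_compose" = "yes" ]; then
    compose || return 1
  fi
  verify || return 1
  report
}

run_workflow yes
```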
Layout default
Default: `single`. One clip showing the target/final state. Pick this unless the deliverable is fundamentally a comparison.
| Case | Layout |
|---|---|
| Brand-new feature (no meaningful prior state) | `single` |
| Bug fix, single-clip proof of the working path | `single` |
| Walkthrough / tutorial / readme hero | `single` |
| Regression proof (broken vs fixed) | `side-by-side` |
| Behavior-preserving refactor (visual parity is the point) | `side-by-side` |
| User explicitly asks for a comparison | `side-by-side` |
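The table can be read as a small decision function. This sketch assumes the three comparison cases map to `side-by-side` and everything else to `single`; the function name `choose_layout`, the case keywords, and the baseline flag are invented for illustration.

```bash
# Hypothetical helper: pick a layout from the case and from whether a real
# baseline recording exists.
choose_layout() {
  local case_name="$1"
  local has_baseline="${2:-no}"
  case "$case_name" in
    regression|refactor|comparison-requested)
      # A comparison needs a genuine "before"; never synthesize one.
      if [ "$has_baseline" = "yes" ]; then
        echo "side-by-side"
      else
        echo "single"
      fi
      ;;
    *)
      # New feature, bug-fix proof, walkthrough, and anything else.
      echo "single"
      ;;
  esac
}

choose_layout new-feature
choose_layout regression yes
```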
Do not synthesize a "before" state to justify `side-by-side`. If there is no real baseline, use `single`.
Delegation
The parent agent plans and orchestrates. Mechanical work runs in worker subagents via the Task tool. This keeps the parent's context clean and enables parallelism.
What to delegate
| Task | Delegate? | Why |
|---|---|---|
| Capture clip (single layout) | YES | Worker runs the interaction script end-to-end and returns the `.cast` path |
| Capture both clips (comparison layout) | YES | Branches are independent; run in parallel |
| Remotion render | YES | Needs only props JSON, clip paths, output path. Runs `render-showcase.sh` |
| Planning, interaction scripting | NO — parent | Requires PR context and editorial judgment |
| Layout and prop construction | NO — parent | Requires editorial decisions about effects, timing, labels |
| Verification | NO — parent | Requires commitment context |
| Single ffprobe / file-existence check | NO — inline | Too trivial for subagent overhead |
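The last row's inline check might look like the sketch below. The function name `check_artifact` is hypothetical; the `ffprobe` step only runs for `.mp4` files and only when `ffprobe` is installed.

```bash
# Hypothetical inline check: succeed only if the artifact exists and is
# non-empty. For .mp4 files, also ask ffprobe for a duration when ffprobe
# is available.
check_artifact() {
  local path="$1"
  if [ ! -s "$path" ]; then
    echo "missing or empty: $path" >&2
    return 1
  fi
  case "$path" in
    *.mp4)
      if command -v ffprobe >/dev/null 2>&1; then
        ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 "$path" || return 1
      fi
      ;;
  esac
  return 0
}
```

A check this small costs less to run in the parent than the overhead of spawning a worker for it.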
How to delegate
Step 0: Resolve paths and generate a run ID. Workers don't inherit `${DROID_PLUGIN_ROOT}`. Resolve once, paste everywhere:
bash
TCTL="$(realpath "${DROID_PLUGIN_ROOT}/bin/tctl")"
RENDER="$(realpath "${DROID_PLUGIN_ROOT}/scripts/render-showcase.sh")"
RUN_ID="$(date +%s)-$$"
RUN_DIR="$(mktemp -d /tmp/droid-run-${RUN_ID}-XXXXXX)"
Use `${RUN_DIR}` for all output files (recordings, props, rendered video). Use `${RUN_ID}-` as a prefix for all session names. Never use bare names like `-s before` or hardcoded paths like `/tmp/before.cast`.
Give workers exact commands with the resolved absolute paths: not abstract instructions, not `tuistory`, not `${DROID_PLUGIN_ROOT}`. The parent does the thinking; the worker executes.
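One way a parent might render such a prompt is a helper that refuses anything but absolute paths, so an unresolved variable or a bare command name cannot leak through. This is a sketch under that assumption: `render_capture_prompt` is a hypothetical name, and the `tctl` flags are copied from the sample prompt in this section rather than from a verified CLI reference.

```bash
# Hypothetical helper: print a capture-worker prompt from resolved values.
render_capture_prompt() {
  local tctl="$1" session="$2" repo_root="$3" cast="$4"
  local p
  # Reject relative paths and unexpanded variables such as ${DROID_PLUGIN_ROOT}.
  for p in "$tctl" "$repo_root" "$cast"; do
    case "$p" in
      /*) ;;
      *) echo "not an absolute path: $p" >&2; return 1 ;;
    esac
  done
  cat <<EOF
Run these commands in order. Report the output file path and any errors.
1. $tctl launch "droid-dev" -s $session --backend tuistory \\
     --repo-root $repo_root \\
     --cols 120 --rows 36 --record $cast \\
     --env FORCE_COLOR=3 --env COLORTERM=truecolor
2. $tctl -s $session wait ">" --timeout 15000
3. $tctl -s $session type "hello world"
4. $tctl -s $session press enter
5. $tctl -s $session wait-idle
6. $tctl -s $session close
EOF
}

render_capture_prompt /abs/path/to/bin/tctl 1712345678-42-before \
  /abs/path/to/baseline/worktree /tmp/droid-run-1712345678-42-xxxx/before.cast
```

The helper only prints text; it never invokes `tctl` itself, which keeps execution with the worker.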
Task prompt for a capture worker:
"Run these commands in order. Report the output file path and any errors.
1. /abs/path/to/bin/tctl launch "droid-dev" -s 1712345678-42-before --backend tuistory \
--repo-root /abs/path/to/baseline/worktree \
--cols 120 --rows 36 --record /tmp/droid-run-1712345678-42-xxxx/before.cast \
--env FORCE_COLOR=3 --env COLORTERM=truecolor
2. /abs/path/to/bin/tctl -s 1712345678-42-before wait ">" --timeout 15000
3. /abs/path/to/bin/tctl -s 1712345678-42-before type "hello world"
4. /abs/path/to/bin/tctl -s 1712345678-42-before press enter
5. /abs/path/to/bin/tctl -s 1712345678-42-before wait-idle
6. /abs/path/to/bin/tctl -s 1712345678-42-before close"
Task prompt for a Remotion render worker:
"Run this command. Report the output file path and any errors.
/abs/path/to/scripts/render-showcase.sh \
--props /tmp/droid-run-1712345678-42-xxxx/showcase-props.json \
--output /tmp/droid-run-1712345678-42-xxxx/demo.mp4 \
/tmp/droid-run-1712345678-42-xxxx/before.cast /tmp/droid-run-1712345678-42-xxxx/after.cast"
Parallel capture pattern (comparison flows only)
Only applicable when the Layout default table above selects `side-by-side`. For a `single` layout, launch one capture worker and skip this section.
For before/after comparison demos, launch both capture workers simultaneously:
1. Parent constructs the interaction script (identical for both branches)
2. Launch worker A: capture the baseline/reference branch with `--repo-root` set to that worktree
3. Launch worker B: capture the candidate/change branch with `--repo-root` set to that worktree
4. Wait for both to complete (TaskOutput)
5. Collect .cast paths from results
6. Continue to compose
Shared tooling
Terminal drivers use the unified `tctl` wrapper. agent-browser has its own CLI and does not use `tctl`.
Drivers can be combined in one workflow, e.g., `tctl` for a CLI and `agent-browser` for a web UI it interacts with.
Prerequisites
| Stage | Platform | Required | Optional |
|---|---|---|---|
| tuistory | All | | |
| true-input | Linux/Wayland | `cage`, `wtype` | `grim`, `wf-recorder` |
| true-input | Windows (KVM) | | |
| true-input | macOS (QEMU) | | — |
| agent-browser | All | | — |
| compose | All | | — |
| showcase | All | Node.js (>= 18), Chrome/Chromium | — |
Install commands
bash
# tuistory driver + recording
npm install -g tuistory # virtual PTY driver
pip install asciinema # terminal recording (tctl wraps this)
cargo install --git https://github.com/asciinema/agg # .cast -> .gif converter (compose needs this)
# true-input driver (Linux/Wayland)
sudo apt-get install -y cage wtype # required: headless compositor + keystroke injection
sudo apt-get install -y grim wf-recorder # optional: screenshots + video recording
# agent-browser driver
agent-browser install # one-time: downloads bundled Chromium
# compose + showcase (video rendering)
sudo apt-get install -y ffmpeg # video processing (includes ffprobe)
cd "${DROID_PLUGIN_ROOT}/remotion" && npm install # Remotion dependencies
# Chrome or Chromium must be installed for Remotion rendering
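Before a run, a quick preflight can report which of these tools are missing from `PATH`. A minimal sketch: `missing_tools` is a hypothetical helper, and the tool list is taken from the install commands above, so trim it to the drivers you actually use.

```bash
# Hypothetical preflight: print each named tool that is not on PATH and
# return non-zero if any are missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example for the tuistory + compose path.
missing_tools tuistory asciinema agg ffmpeg ffprobe node \
  || echo "install the tools listed above before capturing" >&2
```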