task-orchestrator

Task Orchestrator

Autonomous orchestration of multi-agent builds using tmux + Codex with self-healing monitoring.
Load the senior-engineering skill alongside this one for engineering principles.

Core Concepts

1. Task Manifest

A JSON file defining all tasks, their dependencies, files touched, and status.
json
{
  "project": "project-name",
  "repo": "owner/repo",
  "workdir": "/path/to/worktrees",
  "created": "2026-01-17T00:00:00Z",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {
          "id": "t1",
          "issue": 1,
          "title": "Fix X",
          "files": ["src/foo.js"],
          "dependsOn": [],
          "status": "pending",
          "worktree": null,
          "tmuxSession": null,
          "startedAt": null,
          "lastProgress": null,
          "completedAt": null,
          "prNumber": null
        }
      ]
    }
  ]
}

2. Dependency Rules

  • Same file = sequential: tasks touching the same file must run in order or be merged into one task
  • Different files = parallel: independent tasks can run simultaneously
  • Explicit depends = wait: the dependsOn array enforces ordering
  • Phase gates: the next phase waits until the current phase completes
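
The same-file rule can be checked mechanically. The following is a minimal sketch under stated assumptions, not part of the original skill: files_overlap and schedule_hint are hypothetical helper names, and each task's files array is assumed to have been flattened into a comma-separated string beforehand.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original skill).

# Return 0 when two comma-separated file lists share at least one path.
files_overlap() {
  local rest="$1," file
  while [ -n "$rest" ]; do
    file="${rest%%,*}"     # first entry of the remaining list
    rest="${rest#*,}"      # drop that entry and its comma
    case ",$2," in
      *",$file,"*) [ -n "$file" ] && return 0 ;;
    esac
  done
  return 1
}

# Print "serial" when the lists overlap, "parallel" otherwise.
schedule_hint() {
  if files_overlap "$1" "$2"; then
    echo "serial"
  else
    echo "parallel"
  fi
}
```

For example, schedule_hint "src/foo.js,src/bar.js" "src/bar.js" reports that the two tasks should be serialized. Explicit dependsOn entries and phase gates still apply on top of this check.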

3. Execution Model

  • Each task gets its own git worktree (isolated branch)
  • Each task runs in its own tmux session
  • Use Codex with --yolo for autonomous execution
  • Model: GPT-5.2-codex high (configurable)

Setup Commands

Initialize Orchestration

bash
# 1. Create working directory
WORKDIR="${TMPDIR:-/tmp}/orchestrator-$(date +%s)"
mkdir -p "$WORKDIR"

# 2. Clone repo for worktrees
git clone https://github.com/OWNER/REPO.git "$WORKDIR/repo"
cd "$WORKDIR/repo"

# 3. Create tmux socket
SOCKET="$WORKDIR/orchestrator.sock"

# 4. Initialize manifest
# (the quoted 'EOF' keeps the placeholders literal; replace them with real values)
cat > "$WORKDIR/manifest.json" << 'EOF'
{
  "project": "PROJECT_NAME",
  "repo": "OWNER/REPO",
  "workdir": "WORKDIR_PATH",
  "socket": "SOCKET_PATH",
  "created": "TIMESTAMP",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
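
The manifest template uses literal placeholders. One way to fill them in automatically is an unquoted heredoc, sketched below. write_manifest is a hypothetical helper name, and the values are interpolated without JSON escaping, so this assumes project names and paths contain no quotes or backslashes.

```shell
#!/bin/bash
# Hypothetical helper (not from the original skill): write an empty manifest
# with the placeholders filled in from arguments.
write_manifest() {
  local workdir="$1" project="$2" repo="$3"
  local created
  created=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # UTC timestamp, same shape as the example manifest
  cat > "$workdir/manifest.json" <<EOF
{
  "project": "$project",
  "repo": "$repo",
  "workdir": "$workdir",
  "socket": "$workdir/orchestrator.sock",
  "created": "$created",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
}
```

A call such as write_manifest "$WORKDIR" "project-name" "owner/repo" would produce the same fields as the template, with phases left empty for the planning step.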

Analyze GitHub Issues for Dependencies

bash
# Fetch all open issues
gh issue list --repo OWNER/REPO --state open --json number,title,body,labels > issues.json

# Group by files mentioned in issue body
# Tasks touching same files should serialize

Create Worktrees

bash
# For each task, create isolated worktree
cd "$WORKDIR/repo"
git worktree add -b fix/issue-N "$WORKDIR/task-tN" main

Launch Tmux Sessions

bash
SOCKET="$WORKDIR/orchestrator.sock"

# Create session for task
tmux -S "$SOCKET" new-session -d -s "task-tN"

# Launch Codex (uses gpt-5.2-codex with reasoning_effort=high from ~/.codex/config.toml)
# Note: Model config is in ~/.codex/config.toml, not CLI flag
tmux -S "$SOCKET" send-keys -t "task-tN" \
  "cd $WORKDIR/task-tN && codex --yolo 'Fix issue #N: DESCRIPTION. Run tests, commit with good message, push to origin.'" Enter

---

Monitoring & Self-Healing

Progress Check Script

bash
#!/bin/bash
# check_progress.sh - Run via heartbeat

WORKDIR="$1"
SOCKET="$WORKDIR/orchestrator.sock"
MANIFEST="$WORKDIR/manifest.json"
STALL_THRESHOLD_MINS=20

check_session() {
  local session="$1"
  local task_id="$2"

  # Capture recent output
  local output
  output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50 2>/dev/null)

  # Check for completion indicators
  if echo "$output" | grep -qE "(All tests passed|Successfully pushed|❯ $)"; then
    echo "DONE:$task_id"
    return 0
  fi

  # Check for errors
  if echo "$output" | grep -qiE "(error:|failed:|FATAL|panic)"; then
    echo "ERROR:$task_id"
    return 1
  fi

  # Check for stall (prompt waiting for input)
  # The question marks are escaped so they match literally
  if echo "$output" | grep -qE "(\? |Continue\?|y/n|Press any key)"; then
    echo "STUCK:$task_id:waiting_for_input"
    return 2
  fi

  echo "RUNNING:$task_id"
  return 0
}

# Check all active sessions
for session in $(tmux -S "$SOCKET" list-sessions -F "#{session_name}" 2>/dev/null); do
  check_session "$session" "$session"
done

Self-Healing Actions

When a task is stuck, the orchestrator should:
  1. Waiting for input → Send appropriate response
    bash
    tmux -S "$SOCKET" send-keys -t "$session" "y" Enter
  2. Error/failure → Capture logs, analyze, retry with fixes
    bash
    # Capture error context
    mkdir -p "$WORKDIR/logs"
    tmux -S "$SOCKET" capture-pane -p -t "$session" -S -100 > "$WORKDIR/logs/$task_id-error.log"
    
    # Kill and restart with error context (reads back the log written above)
    tmux -S "$SOCKET" kill-session -t "$session"
    tmux -S "$SOCKET" new-session -d -s "$session"
    tmux -S "$SOCKET" send-keys -t "$session" \
      "cd $WORKDIR/$task_id && codex --model gpt-5.2-codex-high --yolo 'Previous attempt failed with: $(tail -20 "$WORKDIR/logs/$task_id-error.log"). Fix the issue and retry.'" Enter
  3. No progress for 20+ mins → Nudge or restart
    bash
    # Check git log for recent commits
    cd "$WORKDIR/$task_id"
    LAST_COMMIT=$(git log -1 --format="%ar" 2>/dev/null)
    
    # If no commits in threshold, restart
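
The threshold comparison in step 3 is left open above. One possible way to express it is sketched below. is_stalled is a hypothetical helper name, and the sketch assumes the last commit time is read as a Unix epoch with git log -1 --format=%ct rather than the human-readable %ar form, since an epoch is easier to compare.

```shell
#!/bin/bash
# Hypothetical helper (not from the original skill).
# Return 0 (stalled) when the last commit is at least threshold_mins old.
# Arguments: last commit time and current time as Unix epochs, threshold in minutes.
is_stalled() {
  local last_commit_epoch="$1" now_epoch="$2" threshold_mins="$3"
  local age_secs=$(( now_epoch - last_commit_epoch ))
  [ "$age_secs" -ge $(( threshold_mins * 60 )) ]
}

# Possible use inside a task worktree (assumes $task_id is set by the caller):
#   last=$(git log -1 --format=%ct 2>/dev/null || echo 0)
#   if is_stalled "$last" "$(date +%s)" 20; then
#     echo "STUCK:$task_id:no_progress"
#   fi
```

A task that has not committed yet still reports the age of the commit it branched from, so a fuller check would probably also compare against the task's startedAt value from the manifest.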

Heartbeat Cron Setup

bash
# Add to cron (every 15 minutes)
cron action:add job:{
  "label": "orchestrator-heartbeat",
  "schedule": "*/15 * * * *",
  "prompt": "Check orchestration progress at WORKDIR. Read manifest, check all tmux sessions, self-heal any stuck tasks, advance to next phase if current is complete. Do NOT ping human - fix issues yourself."
}

---

Workflow: Full Orchestration Run

Step 1: Analyze & Plan

bash
# 1. Fetch issues
gh issue list --repo OWNER/REPO --state open --json number,title,body > /tmp/issues.json

# 2. Analyze for dependencies (files mentioned, explicit deps)
# Group into phases:
# - Phase 1: Critical/blocking issues (no deps)
# - Phase 2: High priority (may depend on Phase 1)
# - Phase 3: Medium/low (depends on earlier phases)

# 3. Within each phase, identify:
# - Parallel batch: Different files, no deps → run simultaneously
# - Serial batch: Same files or explicit deps → run in order

Step 2: Create Manifest

Write manifest.json with all tasks, dependencies, file mappings.

Step 3: Launch Phase 1

bash
# The loops below are pseudocode: phase1_tasks and phase1_parallel_batch stand for
# the Phase 1 entries in manifest.json, and $id, $issue, $PROMPT come from each entry.

# Create worktrees for Phase 1 tasks
cd "$WORKDIR/repo"
for task in phase1_tasks; do
  git worktree add -b "fix/issue-$issue" "$WORKDIR/task-$id" main
done

# Launch tmux sessions
for task in phase1_parallel_batch; do
  tmux -S "$SOCKET" new-session -d -s "task-$id"
  tmux -S "$SOCKET" send-keys -t "task-$id" \
    "cd $WORKDIR/task-$id && codex --model gpt-5.2-codex-high --yolo '$PROMPT'" Enter
done

Step 4: Monitor & Self-Heal

Heartbeat checks every 15 mins:
  1. Poll all sessions
  2. Update manifest with progress
  3. Self-heal stuck tasks
  4. When all Phase N tasks complete → launch Phase N+1
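
The phase gate in item 4 amounts to checking that every task in the phase has reached a finished status. A minimal sketch follows. phase_complete is a hypothetical helper name, and treating complete, pr_open, and merged as finished is an assumption about how the status values are meant to be read.

```shell
#!/bin/bash
# Hypothetical helper (not from the original skill).
# Return 0 when every status passed as an argument counts as finished.
phase_complete() {
  local status
  # Assumption: a phase with no tasks is reported as not complete.
  [ "$#" -gt 0 ] || return 1
  for status in "$@"; do
    case "$status" in
      complete|pr_open|merged) ;;   # finished states (assumed)
      *) return 1 ;;                # pending, blocked, running, stuck, error
    esac
  done
  return 0
}
```

The heartbeat could pass the statuses of all Phase N tasks, read from the manifest, and launch Phase N+1 only when the function succeeds.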

Step 5: Create PRs

bash
# When task completes successfully
cd "$WORKDIR/task-$id"
git push -u origin "fix/issue-$issue"
gh pr create --repo OWNER/REPO \
  --head "fix/issue-$issue" \
  --title "fix: Issue #$issue - $TITLE" \
  --body "Closes #$issue

## Changes
[Auto-generated by Codex orchestrator]

## Testing
- Unit tests pass
- Manual verification"

Step 6: Cleanup

bash
# After all PRs merged or work complete
tmux -S "$SOCKET" kill-server
cd "$WORKDIR/repo"
# Remove every task worktree created under $WORKDIR
for worktree in "$WORKDIR"/task-*; do
  git worktree remove --force "$worktree"
done
# Leave the directory before deleting it
cd /
rm -rf "$WORKDIR"

---

Manifest Status Values

| Status | Meaning |
|--------|---------|
| pending | Not started yet |
| blocked | Waiting on dependency |
| running | Codex session active |
| stuck | Needs intervention (auto-heal) |
| error | Failed, needs retry |
| complete | Done, ready for PR |
| pr_open | PR created |
| merged | PR merged |
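
The progress check script prints signals such as DONE:task-t1, while the manifest stores the status values listed here. One possible mapping between the two is sketched below. status_for_signal is a hypothetical helper name and the mapping itself is an assumption, since the source does not spell it out.

```shell
#!/bin/bash
# Hypothetical helper (not from the original skill).
# Translate a check_progress.sh signal into a manifest status value.
status_for_signal() {
  case "$1" in
    DONE:*)    echo "complete" ;;
    ERROR:*)   echo "error" ;;
    STUCK:*)   echo "stuck" ;;
    RUNNING:*) echo "running" ;;
    *)
      # "unknown" is not a manifest status; it only flags unexpected input.
      echo "unknown"
      return 1
      ;;
  esac
}
```

The remaining values (pending, blocked, pr_open, merged) are not produced by the progress check and would be set by the planning and PR steps instead.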

Example: Security Framework Orchestration

json
{
  "project": "nuri-security-framework",
  "repo": "jdrhyne/nuri-security-framework",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {"id": "t1", "issue": 1, "files": ["ceo_root_manager.js"], "dependsOn": []},
        {"id": "t2", "issue": 2, "files": ["ceo_root_manager.js"], "dependsOn": ["t1"]},
        {"id": "t3", "issue": 3, "files": ["workspace_validator.js"], "dependsOn": []}
      ]
    },
    {
      "name": "Phase 2: High",
      "tasks": [
        {"id": "t4", "issue": 4, "files": ["kill_switch.js", "container_executor.js"], "dependsOn": []},
        {"id": "t5", "issue": 5, "files": ["kill_switch.js"], "dependsOn": ["t4"]},
        {"id": "t6", "issue": 6, "files": ["ceo_root_manager.js"], "dependsOn": ["t2"]},
        {"id": "t7", "issue": 7, "files": ["container_executor.js"], "dependsOn": []},
        {"id": "t8", "issue": 8, "files": ["container_executor.js", "egress_proxy.js"], "dependsOn": ["t7"]}
      ]
    }
  ]
}
Parallel execution in Phase 1:
  • t1 and t3 run in parallel (different files)
  • t2 waits for t1 (same file)
Parallel execution in Phase 2:
  • t4, t6, t7 have no unmet dependsOn entries, so they can start together (t6's dependency t2 finished in Phase 1)
  • t5 waits for t4, t8 waits for t7
  • Caveat: t4 and t7 both list container_executor.js, so the same-file rule calls for serializing them (for example by adding t4 to t7's dependsOn) or merging their changes
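
The start order described above can be derived from the dependsOn arrays alone. The sketch below is illustrative rather than the skill's own logic: ready_tasks is a hypothetical helper name, and each task is assumed to be passed as an "id:dep1,dep2" string instead of being read from the manifest.

```shell
#!/bin/bash
# Hypothetical helper (not from the original skill).
# Print the ids of tasks whose dependencies are all in the done list.
# $1 is a comma-separated list of finished task ids; the remaining
# arguments are "id:dep1,dep2" (nothing after the colon means no deps).
ready_tasks() {
  local done_list=",$1," spec id rest dep ok
  shift
  for spec in "$@"; do
    id="${spec%%:*}"
    # Skip tasks that are already finished.
    case "$done_list" in *",$id,"*) continue ;; esac
    ok=1
    rest="${spec#*:},"
    while [ -n "$rest" ]; do
      dep="${rest%%,*}"
      rest="${rest#*,}"
      [ -z "$dep" ] && continue
      case "$done_list" in
        *",$dep,"*) ;;   # dependency satisfied
        *) ok=0 ;;       # dependency still outstanding
      esac
    done
    [ "$ok" -eq 1 ] && printf '%s\n' "$id"
  done
  return 0
}
```

Calling ready_tasks "t1,t2,t3" "t4:" "t5:t4" "t6:t2" "t7:" "t8:t7" prints t4, t6, and t7, one per line. The sketch only looks at dependsOn, so file overlaps have to be checked separately.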

Tips

  1. Always use GPT-5.2-codex high for complex work: --model gpt-5.2-codex-high
  2. Clear prompts: include issue number, description, expected outcome, test instructions
  3. Atomic commits: tell Codex to commit after each logical change
  4. Push early: push to the remote branch so progress isn't lost if the session dies
  5. Checkpoint logs: capture tmux output periodically to files
  6. Phase gates: don't start Phase N+1 until Phase N is 100% complete
  7. Self-heal aggressively: if stuck >10 mins, intervene automatically
  8. Browser relay limits: if CDP automation is blocked, use iframe batch scraping or manual browser steps


Integration with Other Skills

  • senior-engineering: Load for build principles and quality gates
  • coding-agent: Reference for Codex CLI patterns
  • github: Use for PR creation, issue management

Lessons Learned (2026-01-17)

Codex Sandbox Limitations

When using codex exec --full-auto, the sandbox has:
  • No network access: git push fails with "Could not resolve host"
  • Limited filesystem: can't write to paths like ~/nuri_workspace

Heartbeat Detection Improvements

The heartbeat should check for:
  1. Shell prompt idle: if the tmux pane shows username@hostname path %, the worker is done
  2. Unpushed commits: git log @{u}.. --oneline shows commits not on remote
  3. Push failures: look for "Could not resolve host" in output
When detected, the orchestrator (not the worker) should:
  1. Push the commit from outside the sandbox
  2. Create the PR via gh pr create
  3. Update manifest and notify

Recommended Pattern

bash
# In heartbeat, for each task:
cd /tmp/orchestrator-*/task-tN
# Worker is idle when the last non-blank pane line ends in a shell prompt
# (the % and $ prompt characters are an assumption about the shell in use)
if tmux -S "$SOCKET" capture-pane -p -t "task-tN" | grep -v '^$' | tail -n 1 | grep -qE '[%$] ?$'; then
  # Worker finished, check for unpushed work.
  # A branch that was never pushed has no upstream, so @{u} would fail here;
  # compare HEAD against all origin remote-tracking refs instead.
  if git log --oneline HEAD --not --remotes=origin | grep -q .; then
    git push -u origin HEAD
    gh pr create --title "$(git log --format=%s -1)" --body "Closes #N" --base main
  fi
fi
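
The two text-based detections, an idle shell prompt and a failed push, can be separated from tmux so they are easy to test on captured output. The following is a sketch under stated assumptions: pane_is_idle and push_failed are hypothetical helper names, and the prompt characters %, $, and ❯ are guesses about which shells and themes the workers use.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original skill). Both take captured
# pane text as their only argument.

# Return 0 when the last non-blank line ends in a shell prompt character.
pane_is_idle() {
  local last_line
  last_line=$(printf '%s\n' "$1" | grep -v '^[[:space:]]*$' | tail -n 1)
  case "$last_line" in
    *'%'|*'% '|*'$'|*'$ '|*'❯'|*'❯ ') return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 when the output contains the sandbox push failure message.
push_failed() {
  printf '%s\n' "$1" | grep -q "Could not resolve host"
}

# Possible use in the heartbeat (assumes $SOCKET and $session are set):
#   output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50)
#   if pane_is_idle "$output" || push_failed "$output"; then
#     echo "worker finished or blocked; orchestrator should push and open the PR"
#   fi
```

Matching on prompt characters is a heuristic: program output that happens to end in one of these characters would also be read as idle, so it is worth combining with the unpushed-commit check.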