task-orchestrator

Task Orchestrator

Autonomous orchestration of multi-agent builds using tmux + Codex with self-healing monitoring.

Load the senior-engineering skill alongside this one for engineering principles.

Core Concepts

1. Task Manifest

A JSON file defining all tasks, their dependencies, files touched, and status.

```json
{
  "project": "project-name",
  "repo": "owner/repo",
  "workdir": "/path/to/worktrees",
  "created": "2026-01-17T00:00:00Z",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {
          "id": "t1",
          "issue": 1,
          "title": "Fix X",
          "files": ["src/foo.js"],
          "dependsOn": [],
          "status": "pending",
          "worktree": null,
          "tmuxSession": null,
          "startedAt": null,
          "lastProgress": null,
          "completedAt": null,
          "prNumber": null
        }
      ]
    }
  ]
}
```

2. Dependency Rules

Same file = sequential: tasks touching the same file must run in order (or be combined into one task)
Different files = parallel: independent tasks can run simultaneously
Explicit depends = wait: the `dependsOn` array enforces ordering
Phase gates: the next phase waits until the current phase is complete
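
The rules above can be sketched as a small scheduler. This is a minimal sketch, assuming bash 4+ (associative arrays) and a hypothetical `ready_tasks` helper that reads one task per line as `id|status|files|dependsOn` (lists comma-separated) instead of parsing manifest.json directly. A task is picked only when every `dependsOn` entry is `complete`, `pr_open` or `merged`, and none of its files is held by a running task or by a task already picked in the same round. Phase gating is left to the caller, which should pass only the current phase's tasks plus the already finished ones.

```bash
#!/usr/bin/env bash
# Hypothetical scheduler sketch for the dependency rules (needs bash 4+).
# stdin:  one task per line in manifest order, "id|status|file1,file2|dep1,dep2"
# stdout: space-separated ids that may start now
ready_tasks() {
  local -A status=() busy=()
  local -a lines=() picked=() fs=() ds=()
  local line id st files deps f d ok

  # Pass 1: record every task's status (earlier phases included).
  while IFS= read -r line; do
    [[ -z $line ]] && continue
    lines+=("$line")
    IFS='|' read -r id st files deps <<< "$line"
    status[$id]=$st
  done

  # Pass 2: files held by running tasks are busy (same file = sequential).
  for line in "${lines[@]}"; do
    IFS='|' read -r id st files deps <<< "$line"
    [[ $st == running ]] || continue
    IFS=',' read -ra fs <<< "$files"
    for f in "${fs[@]}"; do busy[$f]=1; done
  done

  # Pass 3: pick pending tasks whose deps are done and whose files are free.
  for line in "${lines[@]}"; do
    IFS='|' read -r id st files deps <<< "$line"
    [[ $st == pending ]] || continue
    ok=1
    IFS=',' read -ra ds <<< "$deps"
    for d in "${ds[@]}"; do
      case ${status[$d]:-missing} in
        complete|pr_open|merged) ;;   # explicit depends = wait until done
        *) ok=0 ;;
      esac
    done
    IFS=',' read -ra fs <<< "$files"
    for f in "${fs[@]}"; do
      [[ -n ${busy[$f]:-} ]] && ok=0
    done
    if (( ok )); then
      picked+=("$id")
      for f in "${fs[@]}"; do busy[$f]=1; done   # reserve files for this round
    fi
  done
  echo "${picked[*]}"
}
```

Fed the Phase 2 tasks of the security framework example with Phase 1 marked complete, this sketch picks only t4 and t6: t7 has no explicit dependency, but it shares container_executor.js with t4.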

3. Execution Model

Each task gets its own git worktree (isolated branch)
Each task runs in its own tmux session
Use Codex with --yolo for autonomous execution
Model: GPT-5.2-codex high (configurable)
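
Putting the execution model together, each task gets one worktree and one detached tmux session that starts inside it. This is a minimal sketch: `launch_task` and the `DRY_RUN` switch are hypothetical, `WORKDIR` and `SOCKET` are the variables from the setup commands, and model settings are assumed to come from `~/.codex/config.toml`. `printf %q` quotes the prompt so the shell inside the tmux pane receives it as a single argument.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: one git worktree (own branch) + one tmux session per task.
# Expects WORKDIR and SOCKET to be set as in the setup commands.

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# launch_task ID ISSUE PROMPT
launch_task() {
  local id="$1" issue="$2" prompt="$3"
  local dir="$WORKDIR/task-$id"
  # Isolated branch + worktree for this task
  run git -C "$WORKDIR/repo" worktree add -b "fix/issue-$issue" "$dir" main
  # Detached session whose shell starts inside the worktree
  run tmux -S "$SOCKET" new-session -d -s "task-$id" -c "$dir"
  # %q escapes the prompt so the shell in the pane sees a single argument
  run tmux -S "$SOCKET" send-keys -t "task-$id" "codex --yolo $(printf '%q' "$prompt")" Enter
}
```

For example, `DRY_RUN=1 launch_task t1 1 "Fix issue #1: DESCRIPTION. Run tests, commit, push to origin."` prints the three commands without running them.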

Setup Commands

Initialize Orchestration

```bash
# 1. Create working directory
WORKDIR="${TMPDIR:-/tmp}/orchestrator-$(date +%s)"
mkdir -p "$WORKDIR"

# 2. Clone repo for worktrees
git clone https://github.com/OWNER/REPO.git "$WORKDIR/repo"
cd "$WORKDIR/repo"

# 3. Choose the tmux socket path (tmux creates the socket on the first new-session)
SOCKET="$WORKDIR/orchestrator.sock"

# 4. Initialize manifest (unquoted EOF so $WORKDIR, $SOCKET and the timestamp expand)
cat > "$WORKDIR/manifest.json" << EOF
{
  "project": "PROJECT_NAME",
  "repo": "OWNER/REPO",
  "workdir": "$WORKDIR",
  "socket": "$SOCKET",
  "created": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
```

Analyze GitHub Issues for Dependencies

```bash
# Fetch all open issues (gh returns only 30 by default, so raise --limit)
gh issue list --repo OWNER/REPO --state open --limit 1000 --json number,title,body,labels > issues.json

# Group by files mentioned in issue body:
# tasks touching the same files should serialize
```
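
For the grouping step, a rough extractor can pull path-like tokens out of each issue body, and a second pass can report files claimed by more than one issue. This is a sketch under stated assumptions: `files_in_issue_body` and `shared_files` are hypothetical helpers, files are assumed to be mentioned as `path/name.ext` with one of the listed extensions, and a longer extension such as `.jsx` would be cut short to `.js`, so extend the list for your repo.

```bash
# Hypothetical helper: print file paths mentioned in an issue body (stdin),
# one per line, de-duplicated. Only the extensions listed are recognized.
files_in_issue_body() {
  grep -oE '[A-Za-z0-9_./-]+\.(js|ts|py|go|rs|json|md|sh)' | sort -u
}

# Hypothetical helper: stdin is "issue_number path" pairs;
# stdout lists paths claimed by more than one issue (candidates to serialize).
shared_files() {
  sort -u | awk '{ n[$2]++ } END { for (f in n) if (n[f] > 1) print f }' | sort
}
```

A possible use is `gh issue view N --repo OWNER/REPO --json body --jq .body | files_in_issue_body`, prefixing each result with the issue number before piping all pairs into `shared_files`.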

Create Worktrees

```bash
# For each task, create an isolated worktree on its own branch
cd "$WORKDIR/repo"
git worktree add -b fix/issue-N "$WORKDIR/task-tN" main
```

Launch Tmux Sessions

```bash
SOCKET="$WORKDIR/orchestrator.sock"

# Create session for task
tmux -S "$SOCKET" new-session -d -s "task-tN"

# Launch Codex (uses gpt-5.2-codex with model_reasoning_effort=high from ~/.codex/config.toml)
# Note: model config is in ~/.codex/config.toml, not a CLI flag
tmux -S "$SOCKET" send-keys -t "task-tN" \
  "cd $WORKDIR/task-tN && codex --yolo 'Fix issue #N: DESCRIPTION. Run tests, commit with good message, push to origin.'" Enter
```

---

Monitoring & Self-Healing

Progress Check Script

```bash
#!/bin/bash
# check_progress.sh - Run via heartbeat
# Usage: check_progress.sh WORKDIR
WORKDIR="$1"
SOCKET="$WORKDIR/orchestrator.sock"
MANIFEST="$WORKDIR/manifest.json"
STALL_THRESHOLD_MINS=20

check_session() {
  local session="$1"
  local task_id="$2"
  local output

  # Capture recent output (last 50 lines of scrollback)
  output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50 2>/dev/null)

  # Check for completion indicators
  if echo "$output" | grep -qE '(All tests passed|Successfully pushed|❯ $)'; then
    echo "DONE:$task_id"
    return 0
  fi

  # Check for errors
  if echo "$output" | grep -qiE '(error:|failed:|FATAL|panic)'; then
    echo "ERROR:$task_id"
    return 1
  fi

  # Check for stall (prompt waiting for input); "?" is escaped because it is an ERE operator
  if echo "$output" | grep -qE '(\? |Continue\?|y/n|Press any key)'; then
    echo "STUCK:$task_id:waiting_for_input"
    return 2
  fi

  echo "RUNNING:$task_id"
  return 0
}

# Check all active sessions
for session in $(tmux -S "$SOCKET" list-sessions -F "#{session_name}" 2>/dev/null); do
  check_session "$session" "$session"
done
```

Self-Healing Actions

When a task is stuck, the orchestrator should:

Waiting for input → Send appropriate response

```bash
tmux -S "$SOCKET" send-keys -t "$session" "y" Enter
```

Error/failure → Capture logs, analyze, retry with fixes

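```bash
# Capture error context (create the logs directory first)
mkdir -p "$WORKDIR/logs"
tmux -S "$SOCKET" capture-pane -p -t "$session" -S -100 > "$WORKDIR/logs/$task_id-error.log"

# Last 20 lines of that log on one line; single quotes become double quotes
# so they cannot break the single-quoted Codex prompt below
ERROR_TAIL=$(tail -n 20 "$WORKDIR/logs/$task_id-error.log" | tr "\n'" ' "')

# Kill and restart with error context (model settings come from ~/.codex/config.toml)
tmux -S "$SOCKET" kill-session -t "$session"
tmux -S "$SOCKET" new-session -d -s "$session"
tmux -S "$SOCKET" send-keys -t "$session" \
  "cd $WORKDIR/$task_id && codex --yolo 'Previous attempt failed with: $ERROR_TAIL. Fix the issue and retry.'" Enter
```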

No progress for 20+ mins → Nudge or restart

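```bash
# Check git log for recent commits (%ct = commit time as a Unix timestamp;
# %ar gives relative text such as "5 minutes ago", which can't be compared)
cd "$WORKDIR/$task_id"
LAST_COMMIT=$(git log -1 --format="%ct" 2>/dev/null)

# If no commits within STALL_THRESHOLD_MINS, restart
```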
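
The threshold test itself can be a plain comparison of Unix timestamps. This sketch uses a hypothetical `is_stalled` helper and measures progress from the later of the last commit and the task's start time, because a fresh worktree's newest commit is usually the old `main` tip and would otherwise look stalled immediately. The 20-minute default mirrors `STALL_THRESHOLD_MINS`.

```bash
# Hypothetical stall check. Arguments are Unix timestamps (seconds):
#   is_stalled LAST_COMMIT_EPOCH STARTED_EPOCH NOW_EPOCH [THRESHOLD_MINS]
# Progress is measured from the later of the last commit and the task start.
is_stalled() {
  local last_commit="$1" started="$2" now="$3" threshold_mins="${4:-20}"
  local last=$(( last_commit > started ? last_commit : started ))
  (( now - last > threshold_mins * 60 ))
}
```

Inside a worktree it might be called as `is_stalled "$(git log -1 --format=%ct)" "$STARTED_EPOCH" "$(date +%s)" "$STALL_THRESHOLD_MINS"`, where `STARTED_EPOCH` is a hypothetical variable holding the manifest's `startedAt` as a Unix timestamp (for example via GNU `date -d "$startedAt" +%s`).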

Heartbeat Cron Setup

```
# Add to cron (every 15 minutes)
cron action:add job:{
  "label": "orchestrator-heartbeat",
  "schedule": "*/15 * * * *",
  "prompt": "Check orchestration progress at WORKDIR. Read manifest, check all tmux sessions, self-heal any stuck tasks, advance to next phase if current is complete. Do NOT ping human - fix issues yourself."
}
```
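
The `cron` call above uses the agent's own scheduler. If you also want the system crontab to run check_progress.sh every 15 minutes, the entry can be generated as sketched here; `heartbeat_cron_line` and the `heartbeat.log` location are hypothetical. Note that check_progress.sh only reports DONE/ERROR/STUCK/RUNNING lines, so self-healing still has to happen in the agent heartbeat or in extra logic.

```bash
# Hypothetical helper: print a system crontab line that runs check_progress.sh
# every 15 minutes for one orchestration directory and appends output to a log.
heartbeat_cron_line() {
  local script="$1" workdir="$2"
  printf '*/15 * * * * %s %s >> %s/heartbeat.log 2>&1\n' "$script" "$workdir" "$workdir"
}
```

To install it without dropping existing entries: `( crontab -l 2>/dev/null; heartbeat_cron_line "$HOME/bin/check_progress.sh" "$WORKDIR" ) | crontab -` (the script path is an assumption). Paths must not contain spaces or `%`, which cron treats as a newline.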

---

Workflow: Full Orchestration Run

Step 1: Analyze & Plan

```bash
# 1. Fetch issues (gh returns only 30 by default, so raise --limit)
gh issue list --repo OWNER/REPO --state open --limit 1000 --json number,title,body > /tmp/issues.json

# 2. Analyze for dependencies (files mentioned, explicit deps)
# Group into phases:
# - Phase 1: Critical/blocking issues (no deps)
# - Phase 2: High priority (may depend on Phase 1)
# - Phase 3: Medium/low (depends on earlier phases)

# 3. Within each phase, identify:
# - Parallel batch: Different files, no deps → run simultaneously
# - Serial batch: Same files or explicit deps → run in order
```
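
Priority usually lives in issue labels, so the phase grouping can start from them. A minimal sketch, assuming labels literally named `critical`/`blocking`, `high`, and `medium`/`low` (adjust to your repository's label scheme) and a hypothetical `phase_for_labels` helper; unlabeled issues fall through to Phase 3.

```bash
# Hypothetical mapping from an issue's labels (comma-separated names) to a phase.
phase_for_labels() {
  local labels=",$1,"                      # pad with commas to match whole names only
  case $labels in
    *,critical,*|*,blocking,*) echo 1 ;;   # Phase 1: critical/blocking
    *,high,*)                  echo 2 ;;   # Phase 2: high priority
    *)                         echo 3 ;;   # Phase 3: medium/low or unlabeled
  esac
}
```

One way to feed it: `gh issue list --repo OWNER/REPO --state open --limit 1000 --json number,labels --jq '.[] | "\(.number) \([.labels[].name] | join(","))"' | while read -r num labels; do echo "#$num -> Phase $(phase_for_labels "$labels")"; done`.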

Step 2: Create Manifest

Write manifest.json with all tasks, dependencies, file mappings.

Step 3: Launch Phase 1

```bash
# Create worktrees for Phase 1 tasks (id/issue pairs read from the manifest with jq)
cd "$WORKDIR/repo"
jq -r '.phases[0].tasks[] | "\(.id) \(.issue)"' "$WORKDIR/manifest.json" |
while read -r id issue; do
  git worktree add -b "fix/issue-$issue" "$WORKDIR/task-$id" main
done

# Launch tmux sessions for the parallel batch
# (PHASE1_PARALLEL_BATCH is a placeholder list of task ids, e.g. "t1 t3")
for id in $PHASE1_PARALLEL_BATCH; do
  tmux -S "$SOCKET" new-session -d -s "task-$id"
  tmux -S "$SOCKET" send-keys -t "task-$id" \
    "cd $WORKDIR/task-$id && codex --yolo '$PROMPT'" Enter
done
```

Step 4: Monitor & Self-Heal

Heartbeat checks every 15 mins:

Poll all sessions
Update manifest with progress
Self-heal stuck tasks
When all Phase N tasks complete → launch Phase N+1
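
The phase gate in the last step reduces to checking that every task status in the current phase is finished. This sketch uses a hypothetical `phase_complete` helper; counting `pr_open` and `merged` as finished (not just `complete`) is an assumption based on the status values listed in this document, and an empty phase never opens the gate.

```bash
# Hypothetical phase-gate check: succeeds only if every status argument is finished.
phase_complete() {
  local s
  (( $# > 0 )) || return 1         # an empty phase never opens the gate
  for s in "$@"; do
    case $s in
      complete|pr_open|merged) ;;   # finished (pr_open/merged counting is an assumption)
      *) return 1 ;;                # pending, blocked, running, stuck, error
    esac
  done
  return 0
}
```

For example: `phase_complete $(jq -r '.phases[0].tasks[].status' "$WORKDIR/manifest.json") && echo "Phase 1 done, launch Phase 2"`.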

Step 5: Create PRs

```bash
# When task completes successfully
cd "$WORKDIR/task-$id"
git push -u origin "fix/issue-$issue"
gh pr create --repo OWNER/REPO \
  --head "fix/issue-$issue" \
  --title "fix: Issue #$issue - $TITLE" \
  --body "Closes #$issue

## Changes
[Auto-generated by Codex orchestrator]

## Testing
- Unit tests pass
- Manual verification"
```

Step 6: Cleanup

```bash
# After all PRs merged or work complete
tmux -S "$SOCKET" kill-server
cd "$WORKDIR/repo"
for dir in "$WORKDIR"/task-*; do
  [ -d "$dir" ] && git worktree remove --force "$dir"
done
cd /   # leave the directory before deleting it
rm -rf "$WORKDIR"
```

---

Manifest Status Values

Status	Meaning
`pending`	Not started yet
`blocked`	Waiting on dependency
`running`	Codex session active
`stuck`	Needs intervention (auto-heal)
`error`	Failed, needs retry
`complete`	Done, ready for PR
`pr_open`	PR created
`merged`	PR merged
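
A heartbeat can dispatch on these values. The mapping below is a hypothetical `next_action` helper that restates the Meaning column as the orchestrator's next step; the action wording is illustrative and not part of any tool.

```bash
# Hypothetical mapping from a manifest status to the orchestrator's next step.
next_action() {
  case $1 in
    pending)        echo "launch when dependsOn tasks are done" ;;
    blocked)        echo "wait for dependency" ;;
    running)        echo "check progress" ;;
    stuck)          echo "self-heal" ;;
    error)          echo "capture logs and retry" ;;
    complete)       echo "open PR" ;;
    pr_open|merged) echo "none" ;;
    *)              echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```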

Example: Security Framework Orchestration

```json
{
  "project": "nuri-security-framework",
  "repo": "jdrhyne/nuri-security-framework",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {"id": "t1", "issue": 1, "files": ["ceo_root_manager.js"], "dependsOn": []},
        {"id": "t2", "issue": 2, "files": ["ceo_root_manager.js"], "dependsOn": ["t1"]},
        {"id": "t3", "issue": 3, "files": ["workspace_validator.js"], "dependsOn": []}
      ]
    },
    {
      "name": "Phase 2: High",
      "tasks": [
        {"id": "t4", "issue": 4, "files": ["kill_switch.js", "container_executor.js"], "dependsOn": []},
        {"id": "t5", "issue": 5, "files": ["kill_switch.js"], "dependsOn": ["t4"]},
        {"id": "t6", "issue": 6, "files": ["ceo_root_manager.js"], "dependsOn": ["t2"]},
        {"id": "t7", "issue": 7, "files": ["container_executor.js"], "dependsOn": []},
        {"id": "t8", "issue": 8, "files": ["container_executor.js", "egress_proxy.js"], "dependsOn": ["t7"]}
      ]
    }
  ]
}
```

Parallel execution in Phase 1:

t1 and t3 run in parallel (different files)
t2 waits for t1 (same file)

Parallel execution in Phase 2:

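t4 and t6 can start together (t6's dependency t2 finished in Phase 1)
t7 has no `dependsOn`, but it touches container_executor.js like t4, so by the same-file rule it waits for t4
t5 waits for t4, t8 waits for t7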

Tips

Always use GPT-5.2-codex high for complex work: set it in `~/.codex/config.toml` (the model config lives there, not in a `--model` CLI flag)
Clear prompts — Include issue number, description, expected outcome, test instructions
Atomic commits — Tell Codex to commit after each logical change
Push early — Push to remote branch so progress isn't lost if session dies
Checkpoint logs — Capture tmux output periodically to files
Phase gates — Don't start Phase N+1 until Phase N is 100% complete
Self-heal aggressively — If stuck >10 mins, intervene automatically
Browser relay limits — If CDP automation is blocked, use iframe batch scraping or manual browser steps

Integration with Other Skills

senior-engineering: Load for build principles and quality gates
coding-agent: Reference for Codex CLI patterns
github: Use for PR creation, issue management

Lessons Learned (2026-01-17)

Codex Sandbox Limitations

When using `codex exec --full-auto`, the sandbox has these limits:

No network access: `git push` fails with "Could not resolve host"
Limited filesystem: it can't write to paths like `~/nuri_workspace`

Heartbeat Detection Improvements

The heartbeat should check for:

Shell prompt idle: if the tmux pane shows `username@hostname path %`, the worker is done
Unpushed commits: `git log @{u}.. --oneline` shows commits not on the remote (this needs an upstream branch; a branch that was never pushed has none, so compare against `origin/main` instead)
Push failures: look for "Could not resolve host" in the output

When any of these is detected, the orchestrator (not the worker) should:

Push the commit from outside the sandbox
Create the PR via `gh pr create`
Update the manifest and notify
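
The "shell prompt idle" check can be sketched as a function that reads captured pane text on stdin and looks only at the last non-blank line. The accepted prompt endings (`%`, `$`, `#`, and the `❯` used by the progress script) are assumptions, and `pane_is_idle` is a hypothetical name; match the endings to your worker's shell.

```bash
# Hypothetical idle check: stdin is captured pane text (e.g. tmux capture-pane -p).
# Succeeds when the last non-blank line ends with a prompt character.
pane_is_idle() {
  local last re='(%|\$|#|❯)[[:space:]]*$'
  last=$(grep -v '^[[:space:]]*$' | tail -n 1)
  [[ $last =~ $re ]]
}
```

For example: `tmux -S "$SOCKET" capture-pane -p -t task-t1 | pane_is_idle && echo "task-t1 finished"`.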

Recommended Pattern

In heartbeat, for each task:

```bash
for dir in /tmp/orchestrator-*/task-*; do
  socket="$(dirname "$dir")/orchestrator.sock"
  session="$(basename "$dir")"
  # Worker finished if the last non-blank pane line ends with the shell prompt's "%"
  last_line=$(tmux -S "$socket" capture-pane -p -t "$session" | grep -v '^[[:space:]]*$' | tail -n 1)
  [[ $last_line == *% ]] || continue
  cd "$dir" || continue
  # Check for unpushed work; a never-pushed branch has no @{u}, so fall back to origin/main
  if git rev-parse --verify --quiet '@{u}' >/dev/null 2>&1; then base='@{u}'; else base='origin/main'; fi
  if git log "$base.." --oneline | grep -q .; then
    git push -u origin HEAD
    gh pr create --title "$(git log --format=%s -1)" --body "Closes #N" --base main
  fi
done
```