task-orchestrator
Task Orchestrator
Autonomous orchestration of multi-agent builds using tmux + Codex with self-healing monitoring.
Load the senior-engineering skill alongside this one for engineering principles.
Core Concepts
1. Task Manifest
A JSON file defining all tasks, their dependencies, files touched, and status.
```json
{
  "project": "project-name",
  "repo": "owner/repo",
  "workdir": "/path/to/worktrees",
  "created": "2026-01-17T00:00:00Z",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {
          "id": "t1",
          "issue": 1,
          "title": "Fix X",
          "files": ["src/foo.js"],
          "dependsOn": [],
          "status": "pending",
          "worktree": null,
          "tmuxSession": null,
          "startedAt": null,
          "lastProgress": null,
          "completedAt": null,
          "prNumber": null
        }
      ]
    }
  ]
}
```
2. Dependency Rules
- Same file = sequential: tasks touching the same file must run in order or be merged into one task
- Different files = parallel: independent tasks can run simultaneously
- Explicit depends = wait: the `dependsOn` array enforces ordering
- Phase gates: the next phase waits for the current phase to complete
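The same-file rule can be checked mechanically. The sketch below is one possible way to do it, assuming each task's `files` list is available as a space-separated string of paths without spaces. The function name `files_overlap` is hypothetical and not part of the skill.

```bash
# files_overlap "<files of task A>" "<files of task B>"
# Returns 0 (true) if the two space-separated lists share a path, else 1.
# Hypothetical helper; assumes file paths contain no spaces or glob characters.
files_overlap() {
  local a b
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then
        return 0
      fi
    done
  done
  return 1
}

# Example: decide whether two tasks may share a parallel batch.
if files_overlap "src/foo.js src/bar.js" "src/bar.js"; then
  echo "serialize"
else
  echo "parallel"
fi
```

A pairwise check like this is quadratic in the number of files, which should be acceptable for the handful of files a typical issue touches.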
3. Execution Model
- Each task gets its own git worktree (isolated branch)
- Each task runs in its own tmux session
- Use Codex with --yolo for autonomous execution
- Model: GPT-5.2-codex high (configurable)
Setup Commands
Initialize Orchestration
```bash
# 1. Create working directory
WORKDIR="${TMPDIR:-/tmp}/orchestrator-$(date +%s)"
mkdir -p "$WORKDIR"

# 2. Clone repo for worktrees
git clone https://github.com/OWNER/REPO.git "$WORKDIR/repo"
cd "$WORKDIR/repo"

# 3. Choose the tmux socket path (tmux creates the socket when the first session starts)
SOCKET="$WORKDIR/orchestrator.sock"

# 4. Initialize manifest (quoted 'EOF' keeps the placeholders literal; fill them in afterwards)
cat > "$WORKDIR/manifest.json" << 'EOF'
{
  "project": "PROJECT_NAME",
  "repo": "OWNER/REPO",
  "workdir": "WORKDIR_PATH",
  "socket": "SOCKET_PATH",
  "created": "TIMESTAMP",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
```
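Because the heredoc delimiter in the manifest step is quoted, `WORKDIR_PATH`, `SOCKET_PATH`, and `TIMESTAMP` are written literally and have to be replaced afterwards. A possible variant, sketched here under assumptions, uses an unquoted delimiter so the shell expands real values in one step. The name `write_manifest` and its arguments are hypothetical, and the sketch assumes the values contain no characters that need JSON escaping.

```bash
# write_manifest <workdir> <project> <owner/repo>
# Hypothetical helper: writes manifest.json with expanded values.
# Assumes the arguments contain no double quotes or backslashes.
write_manifest() {
  local workdir="$1" project="$2" repo="$3"
  local created
  # UTC timestamp in the same shape as the example manifest
  created="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # Unquoted EOF: the shell expands $project, $repo, $workdir, and $created.
  cat > "$workdir/manifest.json" << EOF
{
  "project": "$project",
  "repo": "$repo",
  "workdir": "$workdir",
  "socket": "$workdir/orchestrator.sock",
  "created": "$created",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
}
```

It could be called as `write_manifest "$WORKDIR" "project-name" "owner/repo"` right after the clone step.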
Analyze GitHub Issues for Dependencies
```bash
# Fetch all open issues
gh issue list --repo OWNER/REPO --state open --json number,title,body,labels > issues.json

# Group by files mentioned in issue body
# Tasks touching same files should serialize
```
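Grouping by file first needs the list of files each issue mentions. One rough way to get it is to pull path-like tokens out of the issue body with `grep`. The helper name `files_in_text` and the extension list are assumptions for illustration; real issue bodies may refer to files in ways this pattern misses.

```bash
# files_in_text: read text on stdin, print the unique path-like tokens it mentions.
# Hypothetical helper. The extension list is an assumption; adjust it per repo.
files_in_text() {
  grep -oE '[A-Za-z0-9_./-]+\.(js|ts|py|go|rs|json|md)' | sort -u
}

# Example with a made-up issue body.
echo "Crash in src/foo.js when lib/util.ts is loaded; see src/foo.js:42" | files_in_text
```

Two issues whose extracted lists share an entry are candidates for the same serial batch.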
Create Worktrees
```bash
# For each task, create isolated worktree
cd "$WORKDIR/repo"
git worktree add -b fix/issue-N "$WORKDIR/task-tN" main
```
Launch Tmux Sessions
```bash
SOCKET="$WORKDIR/orchestrator.sock"

# Create session for task
tmux -S "$SOCKET" new-session -d -s "task-tN"

# Launch Codex (uses gpt-5.2-codex with reasoning_effort=high from ~/.codex/config.toml)
# Note: Model config is in ~/.codex/config.toml, not CLI flag
tmux -S "$SOCKET" send-keys -t "task-tN" \
  "cd $WORKDIR/task-tN && codex --yolo 'Fix issue #N: DESCRIPTION. Run tests, commit with good message, push to origin.'" Enter
```
---
Monitoring & Self-Healing
Progress Check Script
```bash
#!/bin/bash
# check_progress.sh - Run via heartbeat

WORKDIR="$1"
SOCKET="$WORKDIR/orchestrator.sock"
MANIFEST="$WORKDIR/manifest.json"
STALL_THRESHOLD_MINS=20

check_session() {
  local session="$1"
  local task_id="$2"

  # Capture recent output (last 50 lines of scrollback)
  local output
  output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50 2>/dev/null)

  # Check for completion indicators
  if echo "$output" | grep -qE "(All tests passed|Successfully pushed|❯ $)"; then
    echo "DONE:$task_id"
    return 0
  fi

  # Check for errors
  if echo "$output" | grep -qiE "(error:|failed:|FATAL|panic)"; then
    echo "ERROR:$task_id"
    return 1
  fi

  # Check for stall (prompt waiting for input)
  # Question marks are escaped so they match literally
  if echo "$output" | grep -qE "(\? |Continue\?|y/n|Press any key)"; then
    echo "STUCK:$task_id:waiting_for_input"
    return 2
  fi

  echo "RUNNING:$task_id"
  return 0
}

# Check all active sessions
for session in $(tmux -S "$SOCKET" list-sessions -F "#{session_name}" 2>/dev/null); do
  check_session "$session" "$session"
done
```
Self-Healing Actions
When a task is stuck, the orchestrator should:
- Waiting for input → Send appropriate response
  ```bash
  tmux -S "$SOCKET" send-keys -t "$session" "y" Enter
  ```
- Error/failure → Capture logs, analyze, retry with fixes
  ```bash
  # Capture error context
  mkdir -p "$WORKDIR/logs"
  tmux -S "$SOCKET" capture-pane -p -t "$session" -S -100 > "$WORKDIR/logs/$task_id-error.log"

  # Kill and restart with error context
  tmux -S "$SOCKET" kill-session -t "$session"
  tmux -S "$SOCKET" new-session -d -s "$session"
  tmux -S "$SOCKET" send-keys -t "$session" \
    "cd $WORKDIR/$task_id && codex --model gpt-5.2-codex-high --yolo 'Previous attempt failed with: $(tail -20 "$WORKDIR/logs/$task_id-error.log"). Fix the issue and retry.'" Enter
  ```
- No progress for 20+ mins → Nudge or restart
  ```bash
  # Check git log for recent commits
  cd "$WORKDIR/$task_id"
  LAST_COMMIT=$(git log -1 --format="%ar" 2>/dev/null)
  # If no commits in threshold, restart
  ```
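The `%ar` format yields a human-readable string such as "25 minutes ago", which is awkward to compare against a threshold. A possible alternative, sketched below, compares epoch seconds instead; `git log -1 --format=%ct` prints the committer date as a Unix timestamp. The function name `is_stalled` is hypothetical, and its default threshold mirrors `STALL_THRESHOLD_MINS=20` from the progress check script.

```bash
# is_stalled <last_commit_epoch> <now_epoch> [threshold_minutes]
# Returns 0 (true) if at least threshold minutes have passed since the last commit.
# Hypothetical helper; the default of 20 mirrors STALL_THRESHOLD_MINS.
is_stalled() {
  local last="$1" now="$2" threshold="${3:-20}"
  local elapsed_mins=$(( (now - last) / 60 ))
  [ "$elapsed_mins" -ge "$threshold" ]
}

# Example wiring (needs a git worktree, so it is shown as a comment):
#   last=$(git -C "$WORKDIR/$task_id" log -1 --format=%ct)
#   if is_stalled "$last" "$(date +%s)"; then echo "STUCK:$task_id:no_commits"; fi
```

Commit age is only a proxy for progress: a worker that is busy in a long test run also produces no commits, so a nudge may be a safer first reaction than a restart.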
Heartbeat Cron Setup
```bash
# Add to cron (every 15 minutes)
cron action:add job:{
  "label": "orchestrator-heartbeat",
  "schedule": "*/15 * * * *",
  "prompt": "Check orchestration progress at WORKDIR. Read manifest, check all tmux sessions, self-heal any stuck tasks, advance to next phase if current is complete. Do NOT ping human - fix issues yourself."
}
```
---
Workflow: Full Orchestration Run
Step 1: Analyze & Plan
```bash
# 1. Fetch issues
gh issue list --repo OWNER/REPO --state open --json number,title,body > /tmp/issues.json

# 2. Analyze for dependencies (files mentioned, explicit deps)
# Group into phases:
# - Phase 1: Critical/blocking issues (no deps)
# - Phase 2: High priority (may depend on Phase 1)
# - Phase 3: Medium/low (depends on earlier phases)

# 3. Within each phase, identify:
# - Parallel batch: Different files, no deps → run simultaneously
# - Serial batch: Same files or explicit deps → run in order
```
Step 2: Create Manifest
Write manifest.json with all tasks, dependencies, file mappings.
Step 3: Launch Phase 1
```bash
# Create worktrees for Phase 1 tasks
# PHASE1_TASKS (assumed variable): "id:issue" pairs from the manifest, e.g. "t1:1 t3:3"
cd "$WORKDIR/repo"
for task in $PHASE1_TASKS; do
  id="${task%%:*}"
  issue="${task##*:}"
  git worktree add -b "fix/issue-$issue" "$WORKDIR/task-$id" main
done

# Launch tmux sessions
# PHASE1_PARALLEL_BATCH (assumed variable): ids that can start now, e.g. "t1 t3"
for id in $PHASE1_PARALLEL_BATCH; do
  tmux -S "$SOCKET" new-session -d -s "task-$id"
  tmux -S "$SOCKET" send-keys -t "task-$id" \
    "cd $WORKDIR/task-$id && codex --model gpt-5.2-codex-high --yolo '$PROMPT'" Enter
done
```
Step 4: Monitor & Self-Heal
Heartbeat checks every 15 mins:
- Poll all sessions
- Update manifest with progress
- Self-heal stuck tasks
- When all Phase N tasks complete → launch Phase N+1
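The phase gate in the last step can be reduced to a small predicate over the statuses of the current phase. The sketch below assumes the statuses have already been read from the manifest and passed as arguments. The function name `phase_complete` and the status names `complete`, `pr_open`, and `merged` are assumptions for illustration; only `pending` appears in the manifest example.

```bash
# phase_complete <status>...
# Returns 0 only if at least one status is given and every one counts as finished.
# The names "complete", "pr_open", and "merged" are assumed, not taken from the skill.
phase_complete() {
  [ "$#" -gt 0 ] || return 1
  local s
  for s in "$@"; do
    case "$s" in
      complete|pr_open|merged) ;;   # finished states (assumed names)
      *) return 1 ;;                # anything else blocks the gate
    esac
  done
  return 0
}

# Example: statuses of the current phase, as read from the manifest.
if phase_complete complete merged pending; then
  echo "launch next phase"
else
  echo "wait"
fi
```

Treating an empty phase as not complete is a deliberate choice here, so that a manifest read error does not open the gate by accident.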
Step 5: Create PRs
```bash
# When task completes successfully
cd "$WORKDIR/task-$id"
git push -u origin "fix/issue-$issue"
gh pr create --repo OWNER/REPO \
  --head "fix/issue-$issue" \
  --title "fix: Issue #$issue - $TITLE" \
  --body "Closes #$issue

Changes
[Auto-generated by Codex orchestrator]

Testing
- Unit tests pass
- Manual verification"
```
Step 6: Cleanup
```bash
# After all PRs merged or work complete
tmux -S "$SOCKET" kill-server
cd "$WORKDIR/repo"
# ALL_TASK_IDS (assumed variable): space-separated task ids from the manifest, e.g. "t1 t2 t3"
for id in $ALL_TASK_IDS; do
  git worktree remove --force "$WORKDIR/task-$id"
done
cd /   # leave $WORKDIR before deleting it
rm -rf "$WORKDIR"
```
---
Manifest Status Values
| Status | Meaning |
|---|---|
| `pending` | Not started yet |
|  | Waiting on dependency |
|  | Codex session active |
|  | Needs intervention (auto-heal) |
|  | Failed, needs retry |
|  | Done, ready for PR |
|  | PR created |
|  | PR merged |
Example: Security Framework Orchestration
```json
{
  "project": "nuri-security-framework",
  "repo": "jdrhyne/nuri-security-framework",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {"id": "t1", "issue": 1, "files": ["ceo_root_manager.js"], "dependsOn": []},
        {"id": "t2", "issue": 2, "files": ["ceo_root_manager.js"], "dependsOn": ["t1"]},
        {"id": "t3", "issue": 3, "files": ["workspace_validator.js"], "dependsOn": []}
      ]
    },
    {
      "name": "Phase 2: High",
      "tasks": [
        {"id": "t4", "issue": 4, "files": ["kill_switch.js", "container_executor.js"], "dependsOn": []},
        {"id": "t5", "issue": 5, "files": ["kill_switch.js"], "dependsOn": ["t4"]},
        {"id": "t6", "issue": 6, "files": ["ceo_root_manager.js"], "dependsOn": ["t2"]},
        {"id": "t7", "issue": 7, "files": ["container_executor.js"], "dependsOn": []},
        {"id": "t8", "issue": 8, "files": ["container_executor.js", "egress_proxy.js"], "dependsOn": ["t7"]}
      ]
    }
  ]
}
```
Parallel execution in Phase 1:
- t1 and t3 run in parallel (different files)
- t2 waits for t1 (same file)
Parallel execution in Phase 2:
- By `dependsOn` alone, t4, t6, and t7 can start together (t6 depends on t2, which completed in Phase 1)
- t5 waits for t4, t8 waits for t7
- t4 and t7 both touch container_executor.js, so the same-file rule additionally requires running them in order, for example by adding t4 to t7's `dependsOn`
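The scheduling decisions listed for the example manifest can be derived from `dependsOn` with a short loop. The sketch below is one possible approach, assuming the tasks have been flattened into `id:dep,dep` entries. The helper name `ready_tasks` and this input format are hypothetical. It considers only `dependsOn`; file overlap is not checked.

```bash
# ready_tasks "<completed ids>" "<id:dep,dep ...>"
# Prints the ids whose dependencies are all completed and that are not
# completed themselves. Hypothetical helper; file overlap is not checked here.
ready_tasks() {
  local completed=" $1 "
  local entry id deps dep ok
  for entry in $2; do
    id="${entry%%:*}"
    deps="${entry#*:}"
    # Skip tasks that are already done
    case "$completed" in *" $id "*) continue ;; esac
    ok=1
    for dep in $(echo "$deps" | tr ',' ' '); do
      case "$completed" in
        *" $dep "*) ;;
        *) ok=0 ;;
      esac
    done
    if [ "$ok" -eq 1 ]; then
      echo "$id"
    fi
  done
  return 0
}

# Phase 2 of the example manifest, after Phase 1 (t1 t2 t3) has finished.
ready_tasks "t1 t2 t3" "t4: t5:t4 t6:t2 t7: t8:t7"
```

Calling it again after each task completes, with the updated completed list, yields the next batch to launch.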
Tips
- Always use GPT-5.2-codex high for complex work: `--model gpt-5.2-codex-high`
- Clear prompts: include issue number, description, expected outcome, test instructions
- Atomic commits: tell Codex to commit after each logical change
- Push early: push to the remote branch so progress isn't lost if the session dies
- Checkpoint logs: capture tmux output periodically to files
- Phase gates: don't start Phase N+1 until Phase N is 100% complete
- Self-heal aggressively: if stuck >10 mins, intervene automatically
- Browser relay limits: if CDP automation is blocked, use iframe batch scraping or manual browser steps
Integration with Other Skills
- senior-engineering: Load for build principles and quality gates
- coding-agent: Reference for Codex CLI patterns
- github: Use for PR creation, issue management
Lessons Learned (2026-01-17)
Codex Sandbox Limitations
When using `codex exec --full-auto`, the sandbox has two limits:
- No network access: `git push` fails with "Could not resolve host"
- Limited filesystem: can't write to paths like `~/nuri_workspace`
Heartbeat Detection Improvements
The heartbeat should check for:
- Shell prompt idle: if the tmux pane shows `username@hostname path %`, the worker is done
- Unpushed commits: `git log @{u}.. --oneline` shows commits not on the remote
- Push failures: look for "Could not resolve host" in the output
When detected, the orchestrator (not the worker) should:
- Push the commit from outside the sandbox
- Create the PR via `gh pr create`
- Update the manifest and notify
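The idle-prompt check can be expressed as a filter over captured pane text. The sketch below is a heuristic under assumptions: it treats a last non-blank line ending in `%`, `$`, or `#` (optionally followed by one space) as an idle shell prompt. The function name `pane_is_idle` is hypothetical, and the pattern should be adapted to the prompt the workers actually use.

```bash
# pane_is_idle: read captured pane text on stdin; return 0 if the last
# non-blank line looks like an idle shell prompt.
# The prompt shape (ends in "%", "$", or "#", optional trailing space) is an assumption.
pane_is_idle() {
  grep -v '^[[:space:]]*$' | tail -n 1 | grep -qE '[%$#] ?$'
}

# Example wiring (needs tmux, so it is shown as a comment):
#   tmux -S "$SOCKET" capture-pane -p -t "$session" | pane_is_idle && echo "DONE:$session"
```

Program output that happens to end in one of these characters would be misread as a prompt, so combining this check with the unpushed-commit check reduces false positives.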
Recommended Pattern
```bash
# In heartbeat, for each task:
cd /tmp/orchestrator-*/task-tN
# Treat a last line ending in "%" as an idle shell prompt (assumed prompt shape)
if tmux -S "$SOCKET" capture-pane -p -t "task-tN" | grep -v '^[[:space:]]*$' | tail -n 1 | grep -qE '% ?$'; then
  # Worker finished, check for unpushed work
  # Fall back to origin/main when the branch has no upstream yet
  if { git log '@{u}..' --oneline 2>/dev/null || git log origin/main..HEAD --oneline; } | grep -q .; then
    git push -u origin HEAD
    gh pr create --title "$(git log --format=%s -1)" --body "Closes #N" --base main
  fi
fi
```