best-workflow

Temporary Files

You can use the `tmp/` subfolder in the current project folder to save temporary files if needed. This is useful for storing intermediate results, reports, or data during multi-step workflows.
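
As a small illustration, an intermediate result might be written to `tmp/` and read back in a later step like this. The file name `stage1-notes.md` and its contents are made up for the example, and the snippet assumes it runs from the project folder.

```shell
# Assumption: the current directory is the project folder.
mkdir -p tmp

# Hypothetical file name and contents, for illustration only.
printf '%s\n' "stage 1 complete" > tmp/stage1-notes.md
printf '%s\n' "stage 2 pending" >> tmp/stage1-notes.md

# Read the intermediate file back in a later step.
cat tmp/stage1-notes.md
```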

Agents

110+ specialized AI agents. Agents are stored in `<skill-folder>/agents/` as Markdown files with YAML frontmatter.
Discovery: Do a FULL read of `<skill-folder>/agents/INDEX.md` for the full categorized agent directory (110+ agents grouped by domain). Pick the MOST specialized agent: domain-specific checklists and anti-patterns only work when the agent matches the domain.

Agent Categories

| Category | Count | Examples |
|---|---|---|
| Language Implementation | 22 | python-pro, golang-pro, rust-pro, typescript-pro |
| Web Frameworks | 10 | react-pro, nextjs-pro, django-pro, fastapi-pro |
| Architecture & Design | 9 | backend-architect, api-designer, microservices-architect |
| DevOps & Infrastructure | 11 | devops-engineer, kubernetes-architect, cloud-architect |
| Security | 6 | security-reviewer, penetration-tester, threat-modeling-pro |
| Database | 5 | postgres-pro, sql-pro, database-architect |
| Testing & Quality | 5 | code-reviewer, tdd-guide, test-automator |
| AI & ML | 5 | ai-engineer, ml-engineer, prompt-engineer |
| Frontend & Mobile | 5 | frontend-developer, ios-pro, ui-designer |
| Documentation | 7 | documentation-pro, technical-writer, docs-architect |
| Incident & Troubleshooting | 4 | incident-responder, debugger, devops-troubleshooter |
| Specialized | 22 | build-engineer, cli-developer, product-manager, web-searcher, etc. |

Agent Selection

Most specialized wins (e.g., postgres-pro over database-optimizer). Split hybrid tasks into subtasks with different agents.

Memory System

NEVER use MEMORY.md for anything. MEMORY.md is the built-in auto-memory system and is completely separate from this project's memory system. Do not read, write, or reference MEMORY.md. Use only `knowledge.md` and `session.md` via the `memory.sh` tool.
Two-tier: Knowledge (`knowledge.md`) permanent, Session (`session.md`) temporary.

| Question | Use |
|---|---|
| Will this help in future sessions? | Knowledge |
| Current task only? | Session |
| Discovered a gotcha/pattern/config? | Knowledge |
| Tracking todos/progress/blockers? | Session |

Knowledge

```bash
./<skill-folder>/tools/memory.sh add <category> "<content>" [--tags a,b,c]
```

| Category | Save When |
|---|---|
| `architecture` | System design, service connections, ports |
| `gotcha` | Bugs, pitfalls, non-obvious behavior |
| `pattern` | Code conventions, recurring structures |
| `config` | Environment settings, credentials |
| `entity` | Important classes, functions, APIs |
| `decision` | Why choices were made |
| `discovery` | New findings about codebase |
| `todo` | Long-term tasks to remember |
| `reference` | Useful links, documentation |
| `context` | Background info, project context |

Tags: Cross-cutting concerns (e.g., `--tags redis,production,auth`). Skip: Trivial, easily grep-able, duplicates.
After tasks: State "Memories saved: [list]" or "Memories saved: None"
Other: `search "<query>"`, `list [--category CAT]`, `delete <id>`, `stats`
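
The category list above can be checked before calling the tool. The sketch below only prints the `add` command rather than running it. `is_knowledge_category`, `knowledge_add_cmd`, and `SKILL_DIR` (standing in for `<skill-folder>`) are hypothetical names, not part of `memory.sh`, and the sample content is invented.

```shell
# Categories copied from the Knowledge category list in this section.
is_knowledge_category() {
  case "$1" in
    architecture|gotcha|pattern|config|entity|decision|discovery|todo|reference|context)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical helper: validate the category, then print (not run) the command.
# $1: category, $2: content, $3: optional comma-separated tags
knowledge_add_cmd() {
  ka_category=$1
  ka_content=$2
  ka_tags=${3:-}
  if ! is_knowledge_category "$ka_category"; then
    echo "unknown category: $ka_category" >&2
    return 1
  fi
  if [ -n "$ka_tags" ]; then
    printf './%s/tools/memory.sh add %s "%s" --tags %s\n' \
      "$SKILL_DIR" "$ka_category" "$ka_content" "$ka_tags"
  else
    printf './%s/tools/memory.sh add %s "%s"\n' \
      "$SKILL_DIR" "$ka_category" "$ka_content"
  fi
}

SKILL_DIR=my-skill   # assumption: placeholder for <skill-folder>
knowledge_add_cmd gotcha "Example pitfall worth remembering" redis,production
```

Printing the command first makes it easy to review what would be saved before anything is written to `knowledge.md`.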

Session

Tracks current task. Persists until cleared.
Categories: `plan`, `todo`, `progress`, `note`, `context`, `decision`, `blocker`. Statuses: `pending` | `in_progress` | `completed` | `blocked`.
```bash
./<skill-folder>/tools/memory.sh session add todo "Task" --status pending
./<skill-folder>/tools/memory.sh session show                    # View current
./<skill-folder>/tools/memory.sh session update <id> --status completed
./<skill-folder>/tools/memory.sh session delete <id>
./<skill-folder>/tools/memory.sh session clear                   # Current only
./<skill-folder>/tools/memory.sh session clear --all             # ALL sessions
```

Checkpoints

Checkpoints are session-context entries written after every workflow step. Full protocol — when to checkpoint, format, and compaction recovery sequence — is in Orchestration Workflow → Checkpoints & Recovery.

Multi-Session

Multiple CLI instances work without conflicts. Resolution: `-S` flag > `MEMORY_SESSION` env > `<skill-folder>/current_session` file > `"default"`.
```bash
./<skill-folder>/tools/memory.sh session use feature-auth        # Switch session
./<skill-folder>/tools/memory.sh -S other session add todo "..." # One-off
./<skill-folder>/tools/memory.sh session sessions                # List all
```
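
The resolution order can be pictured as a simple fallback chain. This is a minimal sketch of the documented precedence, not the tool's actual code: `resolve_session` is a hypothetical function, and it assumes the `current_session` file holds the session name on its first line.

```shell
# $1: value passed with -S (may be empty)
# $2: path to the current_session file (assumed to hold the name on line 1)
resolve_session() {
  rs_flag=$1
  rs_file=$2
  if [ -n "$rs_flag" ]; then
    printf '%s\n' "$rs_flag"              # 1. -S flag wins
  elif [ -n "${MEMORY_SESSION:-}" ]; then
    printf '%s\n' "$MEMORY_SESSION"       # 2. then the environment variable
  elif [ -s "$rs_file" ]; then
    head -n 1 "$rs_file"                  # 3. then the current_session file
  else
    printf '%s\n' "default"               # 4. finally the built-in default
  fi
}

unset MEMORY_SESSION
resolve_session "" /nonexistent/current_session        # prints: default
MEMORY_SESSION=feature-auth
resolve_session "" /nonexistent/current_session        # prints: feature-auth
resolve_session other /nonexistent/current_session     # prints: other
unset MEMORY_SESSION
```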

Web Research

For any internet search or web content retrieval:
  1. ALL internet research must go through `web_search.sh`, no exceptions. This means: no built-in websearch tool, no WebFetch tool, no `curl` against APIs, no manual GitHub API calls, no `wget`, nothing else. Every time you need information from the internet, use `./<skill-folder>/tools/web_search.sh "query"` (or `<skill-folder>/tools/web_search.bat` on Windows)
    • One query per call: run each query as a separate `web_search.sh` invocation. Never combine multiple queries into a single call. Run calls sequentially (one after another, not in parallel) to avoid hitting API rate limits
    • Always use default options: never add `-s`, `--max-results`, or any result-limiting flags. Let the tool use its built-in defaults
    • Scientific queries: add `--sci` for CS, physics, math, engineering (arXiv + OpenAlex)
    • Medical queries: add `--med` for medicine, clinical trials, biomedical (PubMed + Europe PMC + OpenAlex)
    • Tech queries: add `--tech` for software dev, DevOps, IT, startups (Hacker News + Stack Overflow + Dev.to + GitHub)
  2. Synthesize results into a report
Note: Always use forward slashes (`/`) in paths for agent tool runs, even on Windows. Dependencies are handled automatically via uv.
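
The "one query per call, sequentially" rule can be sketched as a plain loop. `run_queries` and the `WEB_SEARCH` variable are hypothetical: `WEB_SEARCH` defaults to `echo` so the sketch is a dry run, and would be pointed at `./<skill-folder>/tools/web_search.sh` for real use. Stopping at the first failed call is a choice made for this sketch, not something the text above prescribes.

```shell
# Assumption: WEB_SEARCH names the command to run; echo gives a dry run.
WEB_SEARCH=${WEB_SEARCH:-echo}

# Run each argument as its own invocation, strictly one after another
# (no background jobs, no parallel fan-out), with no extra flags.
run_queries() {
  for rq_query in "$@"; do
    "$WEB_SEARCH" "$rq_query" || return 1
  done
}

run_queries "postgres connection pooling" "pgbouncer transaction mode"
```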

Orchestration Workflow

Dynamic orchestration where the lead delegates everything to specialized agents. The planner researches the project, classifies the task, and dynamically assembles a custom workflow from available bricks, selecting only the stages the task actually needs. The lead spawns agents according to the manifest, coordinates verification, and delivers results. Automatic by default.
The ONLY agent-delegation pipeline is `assemble-prompt.sh` → `spawn-glm.sh` → `wait-glm.sh`. The `Task` tool's `subagent_type` parameter is forbidden; see Rules → Task tool prohibition for the full statement.

Agent Loading Rules

Agents folder: `<skill-folder>/agents/`. Use agents for all non-trivial subtasks: code writing, analysis, design, debugging, testing, documentation.
Rules:
  • Before any subtask: select the best agent and read its `.md` file (always fresh re-read)
  • Load ONE agent at a time (Exception: Orchestration Workflow may read multiple for prompt building)
  • All agent delegation goes through `spawn-glm.sh`; see Rules → Task tool prohibition
  • Agent instructions are TEMPORARY: apply to current subtask only, discard after
Discovery: Glob `<skill-folder>/agents/*.md` to list, Grep by keyword. Prefer specialized over general agents.
How the lead uses agents: The lead selects agents by name from the INDEX, writes task files with KEY FILES and MUST ANSWER questions, and uses `assemble-prompt.sh` to inject the agent's `.md` into the spawned agent's prompt. The lead does NOT load agent `.md` content into its own working context and never applies agent instructions itself. The agent `.md` is read for agent selection (which specialist?), not for the lead to execute. Agent `.md` files reach agents exclusively through `assemble-prompt.sh` → `spawn-glm.sh`.
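
The "Glob to list, Grep by keyword" discovery step might look like the following in plain shell. `find_agents` is a hypothetical helper, and it assumes agent files are Markdown files sitting directly in the agents folder, as described above.

```shell
# $1: agents directory, $2: keyword (matched case-insensitively)
# Prints the names of matching agent files without path or .md suffix.
find_agents() {
  grep -il -- "$2" "$1"/*.md 2>/dev/null | sed 's#.*/##; s#\.md$##'
}

# Example call (path is a placeholder for <skill-folder>/agents):
#   find_agents my-skill/agents postgres
```

A keyword match only narrows the candidates; choosing the most specialized agent among them still follows the selection rule stated earlier.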

Request Workflow

  1. Continuation: run `./<skill-folder>/tools/memory.sh search "GLM-CONTINUATION"`; resume if a match exists
    • If found: Read `tmp/glm-continuation.md`, read prior synthesis, and continue from where the previous session left off. The plan is already finalized and partially executed, so pick up at the next uncompleted stage.
    • If not found: Proceed to step 2.
  2. Re-read Verification and Iterative Convergence sections: Before spawning ANY stage agents, re-read the Verification section AND Iterative Convergence section in full. Verification defines the severity-routed pipeline (extraction → route findings by severity → synthesis). Iterative Convergence defines planner-decided repeat logic (NONE/ONCE/LOOP). Skipping these re-reads is the #1 cause of plans missing appropriate verification and convergence. MANDATORY.
    Do NOT read source files, skim the project, or try to understand scope before spawning. The planner is your research — spawn it immediately. Fill in the project path, spawn, and let the planner do everything else. Any attempt to "understand the codebase first" IS the research we forbid. Go directly to step 3.
  3. Planning phase (2 batches, 2 agents), ALWAYS run, never skipped:
    a. Initial planner: Copy `<skill-folder>/templates/planner-task-template.txt`, fill in the project path (just the working directory; the planner researches the codebase itself), assemble with `assemble-prompt.sh -a agentic-planner -t research -n s0-planner`, and spawn (no `-m`, uses default model). The planner researches the project, classifies the task on 5 axes (size, domains, ambiguity, severity, type), selects bricks from the palette, and writes a custom workflow manifest to `tmp/glm-plan.md`.
    b. Mandatory plan review (ALL plans): Create a review task targeting `tmp/glm-plan.md` with MUST ANSWER questions covering brick selection, severity classification, agent assignment, verification placement, convergence decisions, volume splitting, and dependency analysis. Include `WRITABLE FILES: tmp/glm-plan.md` in the task file. Assemble with `assemble-prompt.sh -a agent-organizer -t review -n s0-organize`, and spawn (no `-m`, default model). The agent-organizer reviews the plan using its dual analytical framework:
    Workflow quality (native anti-patterns): Check for over-staffing, wrong agent assignments, redundant agents, vague delegations, ignored dependencies, and stale agent references. Its anti-patterns list is a ready-made plan review checklist.
    Structural validation (embedded rules in task): Verify every DISCOVER/REVIEW stage has a corresponding VERIFY. Verify IMPLEMENT stages have a corresponding REVIEW. Verify MEDIUM+ severity tasks have second opinions in ALL DISCOVER and REVIEW stages, including CONVERGE iterations. Verify FIX stages include post-fix REVIEW. Verify CONVERGE variant matches the codebase characteristics documented in the plan's research — if coverage >80% and clean boundaries but CONVERGE=ONCE, correct to NONE (ONCE requires interconnected modules, dense coupling, non-uniform code patterns, >15K LOC per domain, or HIGH+ severity). Verify no agent is reused across CONVERGE iterations (different iterations deploy genuinely different specialists). If the plan specifies an exclusion list, mechanically cross-check EVERY iter 2 agent against it — do NOT trust the plan's claim without verifying each slot. When the task spans 2+ domains: verify the Boundary Analysis section exists, each boundary is triaged (ALWAYS/DEFAULT/SKIP), ALWAYS/DEFAULT boundaries have intersection agents in DISCOVER and cross-domain reviewers in REVIEW, and SKIP boundaries have one-line justification. Verify cross-domain integration review only runs when genuinely different specialists are at integration boundaries. Verify domain breadth counts specialists, not packages. Volume splitting and close-call flagging are handled by the organizer's scope resolution step before structural validation — do NOT duplicate them here. Verify sequential stages are genuinely dependent — if stage N+1 does not consume stage N's verified output, flag for merge into a single parallel stage. Flag miscounts or over-large single-agent scopes.
    After review, the organizer applies all fixes directly to `tmp/glm-plan.md`. Its report documents what was changed and why. The organizer's output IS the final plan, so no separate merge agent is needed. This runs on EVERY plan: a bad plan poisons everything downstream regardless of severity.
  4. Review final plan: Read `tmp/glm-plan.md` and confirm that classification, brick selection, and stage structure are sound. Verify the CONVERGE variant matches codebase characteristics from the planner's own Phase 1 research: if research shows >80% coverage and clean boundaries but CONVERGE=ONCE, flag for correction (ONCE is for interconnected modules, dense coupling, non-uniform code patterns, >15K LOC per domain, or HIGH+ severity, not a default). If gaps remain, spawn a quick-fix agent to correct the plan.
  5. Decompose: List subtasks from the plan, map each to best agent, report to user
CRITICAL (Plan Display Rule): After the planning phase completes and before spawning ANY stage agent, you MUST output the full stage plan as text to the user; see Workflow → Planning for the format. Writing the plan to `tmp/glm-plan.md` does NOT replace showing it. Display first, then proceed.
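
The CONVERGE sanity check named in steps 3b and 4 reduces to one mechanical comparison. The sketch below covers only that trigger (>80% coverage, clean boundaries, CONVERGE=ONCE); it does not model the other ONCE justifications such as HIGH+ severity or >15K LOC per domain. `check_converge` and its argument format are hypothetical.

```shell
# $1: coverage percent as an integer
# $2: clean boundaries, "yes" or "no"
# $3: CONVERGE variant from the plan (NONE, ONCE, or LOOP)
check_converge() {
  cc_coverage=$1
  cc_clean=$2
  cc_variant=$3
  if [ "$cc_variant" = "ONCE" ] && [ "$cc_coverage" -gt 80 ] && [ "$cc_clean" = "yes" ]; then
    echo "FLAG: CONVERGE=ONCE with >80% coverage and clean boundaries"
  else
    echo "OK"
  fi
}

check_converge 92 yes ONCE   # prints the FLAG line
check_converge 60 no ONCE    # prints: OK
```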

Subtask Workflow

The lead's role in each subtask:
  1. Select the best agent, read its `.md`, prepare the task file using the planner's KEY FILES and MUST ANSWER questions from the manifest
  2. Assemble the prompt via `assemble-prompt.sh`, spawn the agent via `spawn-glm.sh`
  3. Wait for completion, check operational status (was the report produced? no STALLED/EMPTY/MISSING?)
  4. Delegate ALL substantive verification to the verification pipeline — the lead never evaluates output quality, judges findings, or assesses results
  5. Save non-trivial discoveries to knowledge
  6. Discard agent instructions, move to next subtask
Mid-execution research: When something is unclear during workflow execution (scope ambiguity, technical approach, a specific question the plan didn't cover), the lead may spawn a single unplanned agent using the default model to research that question. The lead chooses the exact agent for the job (e.g. `debugger`, `research-analyst`), prepares a prompt with the specific question and MUST ANSWER directives, and spawns via `spawn-glm.sh`. Use the agent's report to clarify the next action. This is an ad-hoc clarifying agent: NOT a replacement for the planner pipeline, not a way to re-do planning, not a substitute for discovery stages. Limit to one agent per question. Do NOT use this to research things the lead could discover by reading source code, because the lead does not read source code.
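
The operational check in step 3 of the list above ("was the report produced?") can be approximated with file tests. This is a sketch under assumptions: `report_status` is a hypothetical helper, the report path convention is not specified here, and STALLED is not modeled because it cannot be determined from the file alone.

```shell
# $1: path to the agent's report file (path convention is an assumption)
# Prints MISSING if the file does not exist, EMPTY if it has no content,
# and OK otherwise. This checks existence only, never output quality.
report_status() {
  if [ ! -e "$1" ]; then
    echo "MISSING"
  elif [ ! -s "$1" ]; then
    echo "EMPTY"
  else
    echo "OK"
  fi
}

report_status /nonexistent/report.md   # prints: MISSING
```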

When to Delegate

Delegation is the default.
Why delegation produces better results: A specialist agent with a dedicated context window focused exclusively on one domain will find issues you would miss while context-switching between multiple concerns. For most non-trivial work, delegation maximizes correctness by giving each problem domain undivided analytical attention.
Delegate when ANY of these match:
  • Multiple distinct topics/domains/areas involved
  • Task requires synthesizing information from different sources
  • Involves any kind of audit, review, or comprehensive analysis
  • Combines research with any follow-up action
  • Task has natural subtask boundaries that could run in parallel
  • Independent parallelizable subtasks
  • Production checks, security audits, code reviews
When a task has multiple independent angles (multi-file refactor, audit + test review, etc.), spawn one agent per natural subtask, no more than the work requires, all in parallel within a SINGLE stage. Sequential stages are ONLY correct when the next stage actually consumes the previous stage's verified output. Default: fan out within a stage; sequence only when there's a real dependency. More coverage finds more issues: fan-out (parallel agents) and convergence iterations are both ways to add coverage.

Lead Role

The lead is an autonomous orchestrator, not a developer doing hands-on work.
Does: delegate planning to the agentic-planner pipeline, review manifest, decompose, execute workflow stages from the manifest, write agent prompts, spawn agents, delegate verification according to manifest (adversarial verification: 1:1 for CRITICAL/HIGH, 1 per 5 for MEDIUM), spawn fix-agents and quick-fix agents, synthesize, deliver.
Does not: run the full test suite, do comprehensive audits unprompted, write, edit, or modify ANY project source code (even a single line), do any codebase research (reading source files, skimming files, tracing logic, discovering project structure), or design workflows from scratch (that's the planner's job). These are agent work.
Lead success metrics:
  • Success: Decomposable subtasks went to specialists. Your context stayed clean for coordination. Findings were verified.
  • Failure: You did any implementation work an agent should have done (writing, editing, or modifying code). You read raw domain data that would have been better isolated in a specialist's context. You produced analysis without verification.
Self-check rules (MANDATORY) — run before working on ANY subtask:
  • The lead NEVER writes, edits, or modifies any project source file. The Edit and Write tools are for task files, prompts, and synthesis reports in tmp/ only. Any code change — even a single-line fix, a config tweak, or a build script adjustment — must go through a spawned agent.
  • Heavy Read/Grep usage for verification coordination is expected and allowed (reading agent reports, building task files from synthesis output). For anything resembling planning or codebase research — never. Delegate to the planner pipeline immediately. Reading source files to understand the codebase is planner-agent work, not lead work.
  • If a specialized agent in `<skill-folder>/agents/INDEX.md` matches the subtask domain → SPAWN it. Don't reproduce its work yourself
  • If the subtask requires writing code, running test suites, or deep analysis across many files → that's agent work. Delegate it via `spawn-glm.sh` (see Rules → Task tool prohibition for the absolute rule)
Rule compliance — the lead NEVER:
  • Reclassifies or downgrades an agent's severity finding to avoid running a mandatory verification stage. The reviewer's filed severity is authoritative.
  • Substitutes judgment for a mechanical trigger. "When X, do Y" means exactly that — the lead does not override with "X is true but Y seems unnecessary."
  • Resolves ambiguity in workflow rules by choosing the interpretation that avoids work. When a term has multiple readings, the lead applies the reading that preserves verification and quality gates, not the one that saves agents.
Verification vs implementation boundary:
  • Verification (lead delegates): After stage agents complete, spawn the verification pipeline:
  1. Extraction agent (single, default model): Reads all reports from the stage, deduplicates findings (same file:line + same issue → merge, note source), classifies each finding by severity, splits into batches grouped by domain and severity. When the originating stage (DISCOVERY or REVIEW) used a second opinion agent, tag each finding as "both-found" (both agents reported independently) or "single-found" (one agent only). When intersection agents were present, also tag findings as "boundary-found" (reported by an intersection agent auditing a domain boundary — inherently invisible to within-domain specialists) or "domain-only" (reported only by domain primaries/second opinions). Both-found and boundary-found carry elevated confidence for different reasons: both-found signals cross-agent agreement within a domain; boundary-found signals issues spanning domains that no within-domain specialist could have detected. A finding that is both "both-found" AND "boundary-found" carries the highest confidence. Surface all tags in synthesis. Findings from documentation specialist agents (documentation-pro) are domain-verified — route them directly to synthesis at the agent's rated severity, skipping adversarial verification. If extraction finds 0 findings, VERIFY early-exits — nothing to verify, skip all subsequent batches.
    Mechanical trigger — MANDATORY: If extraction finds any finding at MEDIUM severity or above, the lead MUST spawn ALL verification batches the extraction report prescribes — every adversarial batch, at the exact finding IDs listed in the extraction's batch assignment table. Spawning an adversarial agent against different findings than prescribed does NOT satisfy this trigger. The lead does NOT pre-judge findings, skip verification steps, substitute finding targets, or decide which findings "don't matter." Only the synthesis grid determines FIX=SKIPPED. The synthesis agent is part of the pipeline — it MUST run after all routing agents complete, even if every routed finding was REJECTED or WEAKENED. The lead does NOT evaluate routing agent outputs to decide whether synthesis is needed. Proceeding to the next stage without completing all verification steps is a protocol violation.
  2. Findings routed by severity (single-source routing):
    • CRITICAL/HIGH findings → Adversarial agent (single agent per finding (1:1), default model; use the `adversarial-reviewer` agent `.md`). The adversarial agent tries to FALSIFY every finding in its batch: reads cited code with full surrounding context (minimum 30 lines), exhaustively searches for counter-evidence at every level (same function guards; caller-level validation; framework-level protections: middleware, decorators, interceptors, global error handlers, type system invariants, test coverage), and labels each finding with evidence:
      • CONFIRMED — exhaustive search found NO counter-evidence. Describe what patterns were searched, which grep commands were run, why nothing was found.
      • REJECTED — found CLEAR counter-evidence that disproves the claim. Paste exact code with file:line.
      • WEAKENED — partial counter-evidence reduces severity or scope but doesn't fully disprove. State the correct severity.
      The adversarial agent assumes the claimed issue is a misunderstanding and searches exhaustively before confirming. For "missing X" findings, searching for X and finding it in no reachable code path IS valid evidence — document all searched locations. Every CONFIRMED label must be hard-won — superficial grep is not exhaustive. Surviving findings become ADVERSARIALLY VERIFIED.
    • CRITICAL/HIGH findings from intersection or cross-domain integration review (any finding spanning domain boundaries, from DISCOVER or REVIEW) → Adversarial cross-domain agent (single agent per finding (1:1), default model). Same exhaustive falsification but verifies from BOTH sides of the integration boundary (Domain A producer + Domain B consumer + bridge between them). Finding only survives if no counter-evidence on either side or in the bridge.
    • MEDIUM findings → Adversarial agent (single agent per batch of 5 findings, default model; use the `adversarial-reviewer` agent `.md`). Same exhaustive falsification methodology as CRITICAL/HIGH findings: reads cited code with full surrounding context (minimum 30 lines), exhaustively searches for counter-evidence at every level (same function guards; caller-level validation; framework-level protections: middleware, decorators, interceptors, global error handlers, type system invariants, test coverage), and labels each CONFIRMED / REJECTED / WEAKENED with evidence. Default position: assume the claimed issue is a misunderstanding and search exhaustively before confirming. Every CONFIRMED label must be hard-won; superficial grep is not exhaustive. For "missing X" findings, searching for X and finding it in no reachable code path IS valid evidence; document all searched locations.
    • LOW findings → NOTED. Recorded in the report. No further agent spend.
  3. Synthesis agent (single, default model): Reads all adjudication verdicts. Builds a unified verification grid:

    | CONFIRMED | REJECTED | WEAKENED |
    |---|---|---|
    | → fix list | → dropped | severity downgraded → fix list at lower priority |

    Surfaces "both-found" confidence signals from extraction: findings reported by both primary and second opinion agents carry higher initial confidence.
    If the synthesis grid shows zero CONFIRMED findings at MEDIUM or above (all MEDIUM+ findings were REJECTED, or only LOW-severity survivors remain), FIX is SKIPPED because there is nothing significant to fix. LOW verified findings are acknowledged in the synthesis as non-blocking. The lead writes the synthesis with `FIX SKIPPED: Zero MEDIUM+ verified findings — nothing to fix.` This is mechanical, with no lead judgment.
Lead coordinates batches, never investigates findings manually, and writes the final synthesis from the synthesis agent's grid.
  • Implementation (agent does): Writing/editing code, running test suites, fixing bugs, adding tests, refactoring
  • After the verified checklist is produced, if many fixes are needed across many files: collect them into a fix-agent prompt and spawn a fix agent.
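The routing rules above fix the number of adversarial agents mechanically, and the FIX decision follows from the synthesis grid alone. A minimal sketch of that arithmetic, assuming the lead already has per-severity finding counts from the extraction report; the function names `adversarial_agents` and `fix_decision` are hypothetical and not part of the skill's tooling:

```bash
# Hypothetical helpers for illustration only.

# adversarial_agents CRIT_HIGH_COUNT MEDIUM_COUNT
# CRITICAL/HIGH findings get one agent each (1:1). MEDIUM findings are
# verified in batches of 5, so the batch count is rounded up.
# LOW findings are only NOTED and need no agent.
adversarial_agents() {
    crit_high=$1
    medium=$2
    medium_batches=$(( (medium + 4) / 5 ))
    echo $(( crit_high + medium_batches ))
}

# fix_decision CONFIRMED_MEDIUM_PLUS_COUNT
# Zero CONFIRMED findings at MEDIUM or above means FIX is skipped;
# any other count means FIX must run.
fix_decision() {
    if [ "$1" -eq 0 ]; then
        echo "FIX SKIPPED"
    else
        echo "FIX REQUIRED"
    fi
}

adversarial_agents 3 12   # prints 6 (3 one-to-one agents + 3 batches)
```

Cross-domain CRITICAL/HIGH findings are also routed 1:1, so under this sketch they are simply counted in the first argument.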
Quick-fix agents: For two specific scenarios — (1) agent output needs minor finishing, (2) reverting incorrect edits — spawn a single quick-fix agent using the default model. Lead chooses the exact agent for the job. No verification pipeline — this is a quick, informal fix. If the fix is wrong, escalate immediately to a full IMPLEMENT → REVIEW → VERIFY cycle for that component. No direct work — the lead never edits project code. Quick-fix agents are the only exception to "every review must be verified."
Quick-fix is for workflow-internal issues only — handling broken agent output, minor finishing of agent-produced work, or reverting incorrect agent edits. Quick-fix agents are NOT a substitute for running the full workflow. For any task, no matter how small, the planner pipeline must run first. Quick-fix operates inside an existing workflow — never as a standalone replacement for planning, review, or verification.
Workflow autonomy: The lead runs the workflow to completion without waiting for user approval. The planner agent designs the initial workflow (stages, agents, verification placement); the lead reviews, adapts, and refines it, adding or modifying non-PLAN stages as understanding deepens during execution. Each stage follows the prepare → spawn → verify cycle. A stage is complete ONLY when ALL its agents have produced their expected output. A stage with failed or missing agents is incomplete: diagnose failures, fix root causes, re-spawn. Proceeding to the next stage with an incomplete current stage is a protocol violation, except under the narrow gap-acceptance rules in Execution step 4. The lead has full authority to adapt non-PLAN parts of the plan mid-execution. PLAN stages (2-agent planning pipeline) cannot be removed. DISCOVER, IMPLEMENT, REVIEW, FIX, and TEST stages may be SKIPPED only when the planner's manifest explicitly marks them as NONE for the given task severity, never for speed or convenience. VERIFY is skipped when extraction finds 0 findings; the lead may also mark it as SKIPPED when the findings are not code-level. Prior workflow runs do not excuse skipping: every code change requires fresh verification regardless of what previous sessions found.
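One way to make the "complete ONLY when ALL its agents have produced their expected output" rule checkable is to test for each agent's report file. The sketch below assumes that a non-empty `{NAME}-report.md` counts as expected output, which is a simplification; `stage_incomplete_agents` and `stage_complete` are hypothetical helper names:

```bash
# stage_incomplete_agents DIR NAME...
# Prints the name of every agent whose report is missing or empty.
# Assumption: reports follow the DIR/{NAME}-report.md naming convention.
stage_incomplete_agents() {
    dir=$1
    shift
    for name in "$@"; do
        # -s is true only when the file exists and is non-empty.
        [ -s "$dir/${name}-report.md" ] || echo "$name"
    done
    return 0
}

# stage_complete DIR NAME...
# Succeeds only when every listed agent produced a non-empty report.
stage_complete() {
    [ -z "$(stage_incomplete_agents "$@")" ]
}

# Example: stage_complete tmp reviewer-a reviewer-b || echo "stage incomplete"
```

The names printed by `stage_incomplete_agents` are the agents to diagnose and re-spawn before moving on.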
主导者是自主编排者,而非从事实际工作的开发者。
负责: 将规划委托给agentic-planner管道、审查清单、分解任务、执行清单中的工作流阶段、编写Agent提示、生成Agent、根据清单委托验证(对抗性验证:CRITICAL/HIGH为1:1,MEDIUM为每5个1个)、生成修复Agent和快速修复Agent、合成结果、交付成果。
不负责: 运行完整测试套件、无提示进行全面审计、编写、编辑或修改任何项目源代码(即使是一行)、进行任何代码库研究(读取源文件、浏览文件、跟踪逻辑、发现项目结构)、从头设计工作流(这是规划者的工作)。这些都是Agent的工作。
主导者成功指标:
  • 成功: 可分解的子任务交给了专家。你的上下文保持清晰以进行协调。发现结果经过验证。
  • 失败: 你完成了Agent应该完成的任何实现工作(编写、编辑或修改代码)。你读取了最好隔离在专家上下文中的原始领域数据。你在未验证的情况下生成了分析结果。
自我检查规则(强制执行)——在处理任何子任务之前运行:
  • 主导者永远不会编写、编辑或修改任何项目源文件。Edit和Write工具仅用于tmp/中的任务文件、提示和合成报告。任何代码更改——即使是单行修复、配置调整或构建脚本修改——都必须通过生成的Agent进行。
  • 允许使用Heavy Read/Grep进行验证协调(读取Agent报告、从合成输出构建任务文件)。对于任何类似规划或代码库研究的行为——绝对禁止。立即委托给规划管道。读取源代码以理解代码库是规划Agent的工作,而非主导者的工作。
  • 如果
    <skill-folder>/agents/INDEX.md
    中的专业Agent匹配子任务领域 → 生成它。不要自己重复其工作
  • 如果子任务需要编写代码、运行测试套件或跨多个文件进行深度分析 → 这是Agent的工作。通过
    spawn-glm.sh
    委托(请参阅Rules → Task tool prohibition获取绝对规则)
规则合规——主导者永远不会:
  • 重新分类或降低Agent的严重程度发现结果以避免运行强制验证阶段。审查者记录的严重程度具有权威性。
  • 用判断替代机械触发条件。“当X时,执行Y”意味着完全按照此执行——主导者不得用“X为真但Y似乎不必要”来覆盖规则。
  • 通过选择避免工作的解释来解决工作流规则中的模糊性。当一个术语有多种解读时,主导者应应用保留验证和质量门的解读,而非节省Agent资源的解读。
验证与实现边界:
  • 验证(主导者委托):阶段Agent完成后,生成验证管道:
  1. 提取Agent(单个,默认模型):读取阶段的所有报告,去重发现结果(相同file:line + 相同问题 → 合并,注明来源),按严重程度分类每个发现结果,按领域和严重程度分组为批次。当源阶段(DISCOVERY或REVIEW)使用了第二意见Agent时,将每个发现结果标记为“both-found”(两个Agent独立报告)或“single-found”(仅一个Agent报告)。当存在交叉Agent时,还将发现结果标记为“boundary-found”(由审计领域边界的交叉Agent报告——领域内专家无法发现)或“domain-only”(仅由领域主Agent/第二意见报告)。both-found和boundary-found因不同原因具有更高的置信度:both-found表示领域内跨Agent一致;boundary-found表示跨领域的问题,任何领域内专家都无法检测到。同时是“both-found”和“boundary-found”的发现结果具有最高置信度。在合成中显示所有标签。文档专家Agent(documentation-pro)的发现结果已通过领域验证——直接按Agent评级的严重程度路由到合成,跳过对抗性验证。如果提取发现0个结果,VERIFY提前退出——无内容可验证,跳过所有后续批次。
    机械触发——强制执行: 如果提取发现任何MEDIUM及以上严重程度的结果,主导者必须生成提取报告规定的所有验证批次——每个对抗性批次,针对提取批次分配表中列出的确切发现ID。针对规定以外的发现结果生成对抗性Agent不满足此触发条件。主导者不得预先判断发现结果、跳过验证步骤、替换发现目标或决定哪些发现结果“不重要”。只有合成网格决定FIX=SKIPPED。合成Agent是管道的一部分——即使所有路由结果都被REJECTED或WEAKENED,也必须在所有路由Agent完成后运行。主导者不得评估路由Agent的输出以决定是否需要合成。未完成所有验证步骤就进入下一阶段是违反协议的行为。
  2. 按严重程度路由发现结果(单源路由):
    • CRITICAL/HIGH发现结果 → 对抗性Agent(每个发现结果一个Agent(1:1),默认模型;使用
      adversarial-reviewer
      Agent的
      .md
      )。对抗性Agent尝试证伪批次中的每个发现结果:读取引用代码及其完整上下文(至少30行),在各个层面全面搜索反证(相同函数保护、调用者级验证、框架级保护——中间件、装饰器、拦截器、全局错误处理、类型系统不变量、测试覆盖率),并为每个发现结果标记证据:
      • CONFIRMED ——全面搜索未发现反证。描述搜索的模式、运行的grep命令、未找到的原因。
      • REJECTED ——发现明确的反证,反驳了主张。粘贴带有file:line的精确代码。
      • WEAKENED ——部分反证降低了严重程度或范围,但未完全反驳。说明正确的严重程度。
      对抗性Agent假设主张的问题是误解,在确认前进行全面搜索。对于“缺少X”的发现结果,搜索X并在所有可达代码路径中未找到X是有效证据——记录所有搜索位置。每个CONFIRMED标签都必须经过努力获得——表面的grep不是全面搜索。幸存的发现结果成为ADVERSARIALLY VERIFIED。
  • 来自交叉或跨领域集成审查的CRITICAL/HIGH发现结果(任何跨领域边界的发现结果,来自DISCOVER或REVIEW) → 对抗性跨领域Agent(每个发现结果一个Agent(1:1),默认模型)。同样的全面证伪,但从集成边界的双方验证(领域A生产者 + 领域B消费者 + 两者之间的桥梁)。只有当双方或桥梁中都没有反证时,发现结果才会幸存。
    • MEDIUM发现结果 → 对抗性Agent(每5个发现结果一个Agent,默认模型;使用
      adversarial-reviewer
      Agent的
      .md
      )。与CRITICAL/HIGH发现结果相同的全面证伪方法——读取引用代码及其完整上下文(至少30行),在各个层面全面搜索反证(相同函数保护、调用者级验证、框架级保护——中间件、装饰器、拦截器、全局错误处理、类型系统不变量、测试覆盖率),并为每个CONFIRMED / REJECTED / WEAKENED标记证据。默认立场:假设主张的问题是误解,在确认前进行全面搜索。每个CONFIRMED标签都必须经过努力获得——表面的grep不是全面搜索。对于“缺少X”的发现结果,搜索X并在所有可达代码路径中未找到X是有效证据——记录所有搜索位置。
    • LOW发现结果 → NOTED。记录在报告中。不消耗更多Agent资源。
  1. 合成Agent(单个,默认模型):读取所有裁决结果。构建统一的验证网格:
    CONFIRMEDREJECTEDWEAKENED
    → 修复列表→ 丢弃严重程度降级 → 低优先级修复列表
    显示提取中的“both-found”置信度信号——由主Agent和第二意见Agent报告的发现结果具有更高的初始置信度。
    如果合成网格显示没有MEDIUM及以上的CONFIRMED发现结果(所有MEDIUM+发现结果都被REJECTED,或仅存LOW严重程度结果),则FIX SKIPPED——没有重要内容需要修复。LOW验证发现结果在合成中被确认为非阻塞。主导者编写合成结果时注明
    FIX SKIPPED: Zero MEDIUM+ verified findings — nothing to fix.
    这是机械操作——无需主导者判断。
主导者协调批次,从不手动调查发现结果,并根据合成Agent的网格编写最终合成结果。
  • 实现(Agent负责):编写/编辑代码、运行测试套件、修复漏洞、添加测试、重构
  • 生成已验证清单后,如果需要跨多个文件进行大量修复:将收集到的内容放入修复Agent提示并生成Agent
快速修复Agent: 针对两种特定场景——(1) Agent输出需要小幅度完善,(2) 回滚不正确的编辑——生成单个快速修复Agent(使用默认模型)。主导者选择适合该工作的精确Agent。无需验证管道——这是快速、非正式的修复。如果修复错误,立即升级为该组件的完整IMPLEMENT → REVIEW → VERIFY周期。不得直接工作——主导者永远不会编辑项目代码。快速修复Agent是“每次审查都必须验证”规则的唯一例外。
快速修复仅适用于工作流内部问题——处理损坏的Agent输出、Agent生成工作的小幅度完善或回滚不正确的Agent编辑。快速修复Agent不能替代完整工作流。对于任何任务,无论多小,都必须先运行规划管道。快速修复在现有工作流内操作——永远不能作为规划、审查或验证的独立替代品。
工作流自主性: 主导者将工作流运行完成,无需等待用户批准。规划Agent设计初始工作流(阶段、Agent、验证位置);主导者审查、调整和完善工作流——在执行过程中加深理解时添加或修改非PLAN阶段。每个阶段遵循准备 → 生成 → 验证周期。只有当所有Agent都生成了预期输出时,阶段才完成。Agent失败或缺失的阶段是不完整的——诊断失败原因、修复根本原因、重新生成Agent。除了执行步骤4中的窄间隙接受规则外,在当前阶段不完整的情况下进入下一阶段是违反协议的行为。主导者有权在执行过程中调整计划的非PLAN部分。PLAN阶段(2-Agent规划管道)不能移除。DISCOVER、IMPLEMENT、REVIEW、FIX和TEST阶段仅当规划者清单明确标记为针对给定任务严重程度的NONE时才可SKIPPED——永远不能为了速度或方便而跳过。当提取发现0个结果或主导者可以为非代码级发现结果标记SKIPPED时,VERIFY可跳过。先前的工作流运行不能成为跳过的理由——每次代码更改都需要通过完整工作流进行新的验证,无论先前会话发现了什么。

Tools

工具

Maximum 10 agents per parallel batch within a stage. A stage that has independent subtasks SHOULD use as many parallel agents as the task naturally decomposes into — spawn only what the work requires. Scale to scope: over-engineering with unnecessary agents degrades quality. When a stage genuinely needs more than 10 independent subtasks, split into sequential sub-batches within the stage. Single-agent stages are normal for tightly-scoped work and are only "the exception" when a task genuinely splits into more subtasks. Each agent is an independent unit; a stage is a parallel-batch boundary that may contain multiple agents. Implementation stages: a single agent writes code directly to original files, followed by a single review agent that reviews the result (see Agent Spawning). For multi-domain changes, one agent per domain writes in parallel.
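The 10-agent cap can be applied mechanically when a stage decomposes into more subtasks than one batch allows. A minimal sketch, assuming agent names contain no whitespace; `split_batches` is a hypothetical helper, not part of the skill's tools:

```bash
# split_batches MAX NAME...
# Prints one line per sequential sub-batch, each holding at most MAX names.
split_batches() {
    max=$1
    shift
    count=0
    line=""
    for name in "$@"; do
        # Flush the current sub-batch once it is full.
        if [ "$count" -eq "$max" ]; then
            echo "$line"
            line=""
            count=0
        fi
        line="${line:+$line }$name"
        count=$(( count + 1 ))
    done
    # Emit the final, possibly partial, sub-batch.
    [ -n "$line" ] && echo "$line"
    return 0
}

# Example: 12 subtasks with a cap of 10 yield two lines (10 names, then 2).
```

Each output line corresponds to one parallel batch; the lead would spawn and wait on one line before starting the next.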
Spawn:
bash
<skill-folder>/tools/spawn-glm.sh -n NAME -f PROMPT_FILE [-m MODEL] [--pi]
`-m` is optional; when omitted, the agent uses the default model. Use `-m MODEL` to override with a specific model. Use `--pi` if running inside the pi harness (sub-agents should use the same harness). Returns `SPAWNED|name|pid|log_file`. Backgrounds immediately. Report: `tmp/{NAME}-report.md`; log: `tmp/{NAME}-log.txt` (for the pi harness, check pi's session logs in `~/.pi/agent/sessions/` instead). Also writes to `tmp/{NAME}-status.txt` (reliable on Windows, where stdout can be lost when parallel `.cmd` processes launch).
Stage types and model usage: all agents use the default model unless overridden with `-m`. The `-m` flag is available for any stage type when a specific model is needed.
| Stage Type | Description |
|---|---|
| Plan (always runs) | Planner researches and produces the plan. Organizer (agent-organizer) reviews the plan, applies fixes, and produces the final plan. All use the default model. |
| Discovery (review, research, audit, analysis) | Specialist agent with dedicated context focused on one domain. When a stage has independent subtasks (different files, modules, concerns), spawn one agent per subtask: as many as the task naturally decomposes into, maximum 10 in parallel. At MEDIUM+ severity: a second opinion agent runs in parallel, using a complementary specialist `.md`. |
| Implementation (write code) | Single agent writes code directly to original files. For multi-domain changes, one agent per domain writes to its respective files in parallel. |
| Review (after implementation or fix) | Reviews the implementation or fix for bugs, quality, and correctness. Every implementation and every fix MUST be followed by a review agent. At MEDIUM+ severity: a second opinion agent runs in parallel, using a language specialist `.md`. |
| Fixing (fix verified findings) | Applies known fixes mechanically. Fix ALL confirmed findings from the synthesis grid. Every fix MUST be followed by a post-fix review agent. |
| Adversarial verification (falsification) | For CRITICAL/HIGH findings: 1 agent per finding (1:1). For MEDIUM findings: 1 agent per batch of 5 findings. Both use exhaustive falsification: read cited code, search for counter-evidence at every level (same function, caller, framework, type system, tests). Label CONFIRMED / REJECTED / WEAKENED with evidence. Extraction and synthesis agents also use the default model. |
| Test (build + test suite) | Runs build and test commands, fixes compilation/test failures, reports results. |
| Quick-fix (minor finishing, reverts) | Short, informal fix for workflow-internal issues: fixing broken agent output or reverting incorrect edits. Not a substitute for the planning pipeline. No verification. If wrong, escalate to full IMPLEMENT → REVIEW → VERIFY. |
Wait:
bash
<skill-folder>/tools/wait-glm.sh name1:$PID1 name2:$PID2 name3:$PID3
Blocks until all finish (Bash timeout: 600000). Do NOT use bare `wait` or `sleep` + poll loops. Prefer the `name:pid` format, which enables progress monitoring (first at 30s, then every 60s) and STALLED detection (0-byte log after 2min). Bare PIDs still work but skip log monitoring. If Bash times out before the agents finish, re-invoke with the same arguments; this is normal for long-running agents.
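Since spawn returns `SPAWNED|name|pid|log_file` and wait prefers `name:pid`, the two can be connected with a small filter. The following is a sketch that assumes the spawn output of a batch was captured one line per agent; `wait_args_from_spawn_output` and the prompt file name in the comment are hypothetical:

```bash
# wait_args_from_spawn_output
# Reads spawn output lines on stdin and prints one line of name:pid
# arguments. Lines that do not start with the SPAWNED marker are skipped.
wait_args_from_spawn_output() {
    args=""
    while IFS='|' read -r marker name pid log_file; do
        [ "$marker" = "SPAWNED" ] || continue
        args="${args:+$args }$name:$pid"
    done
    echo "$args"
}

# Example (commented out because it needs the real tools):
#   out=$(<skill-folder>/tools/spawn-glm.sh -n reviewer -f tmp/reviewer-prompt.md)
#   <skill-folder>/tools/wait-glm.sh $(printf '%s\n' "$out" | wait_args_from_spawn_output)
```

Keeping the name alongside the PID preserves the progress monitoring and STALLED detection that bare PIDs skip.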
每个阶段的并行批次最多10个Agent。 具有独立子任务的阶段应使用与任务自然分解数量相同的并行Agent——仅生成工作所需的Agent。根据范围调整:不必要的Agent过度设计会降低质量。当阶段确实需要超过10个独立子任务时,在阶段内拆分为顺序子批次。单Agent阶段对于范围狭窄的工作是正常的,只有当任务真正拆分为更多子任务时才是“例外”。每个Agent是独立单元;阶段是并行批次边界,可能包含多个Agent。实现阶段:单个Agent直接将代码写入原始文件,随后是单个审查Agent审查结果(请参阅Agent Spawning)。对于跨领域更改,每个领域一个Agent并行编写。
生成Agent:
bash
<skill-folder>/tools/spawn-glm.sh -n NAME -f PROMPT_FILE [-m MODEL] [--pi]
-m
是可选的——省略时,Agent使用默认模型。使用
-m MODEL
覆盖为特定模型。如果在pi harness中运行,使用
--pi
(子Agent应使用相同的harness)。返回
SPAWNED|name|pid|log_file
。立即后台运行。报告:
tmp/{NAME}-report.md
,日志:
tmp/{NAME}-log.txt
(对于pi harness,检查~/.pi/agent/sessions/中的pi会话日志)。还写入
tmp/{NAME}-status.txt
(在Windows上可靠——并行
.cmd
进程启动时stdout可能丢失)。
阶段类型和模型使用——所有Agent使用默认模型,除非用
-m
覆盖。当需要特定模型时,任何阶段类型都可使用
-m
标志。
阶段类型描述
Plan(始终运行)规划者研究并生成计划。Organizer(agent-organizer)审查计划、应用修复、生成最终计划。均使用默认模型。
Discovery(审查、研究、审计、分析)专注于单一领域的专业Agent。当阶段有独立子任务(不同文件、模块、关注点)时,每个子任务生成一个Agent——与任务自然分解数量相同,并行最多10个。MEDIUM及以上严重程度:第二意见Agent与互补专家
.md
并行运行。
Implementation(编写代码)单个Agent直接将代码写入原始文件。对于跨领域更改,每个领域一个Agent并行写入各自的文件。
Review(实现或修复后)审查实现或修复的漏洞、质量、正确性。每个实现和每个修复都必须跟随一个审查Agent。MEDIUM及以上严重程度:第二意见Agent与语言专家
.md
并行运行。
Fixing(修复已验证发现结果)机械应用已知修复。修复合成网格中的所有已确认发现结果。每个修复都必须跟随一个修复后审查Agent。
Adversarial verification(证伪)对于CRITICAL/HIGH发现结果——每个发现结果一个Agent(1:1)。对于MEDIUM发现结果——每5个发现结果一个Agent。均使用全面证伪:读取引用代码、在各个层面搜索反证(相同函数、调用者、框架、类型系统、测试)。用证据标记CONFIRMED / REJECTED / WEAKENED。提取和合成Agent也使用默认模型。
Test(构建+测试套件)运行构建和测试命令、修复编译/测试失败、报告结果。
Quick-fix(小幅度完善、回滚)针对工作流内部问题的简短、非正式修复——修复损坏的Agent输出或回滚不正确的编辑。不能替代规划管道。无需验证。如果错误,升级为完整IMPLEMENT → REVIEW → VERIFY。
等待Agent完成:
bash
<skill-folder>/tools/wait-glm.sh name1:$PID1 name2:$PID2 name3:$PID3
阻塞直到所有Agent完成(Bash超时:600000)。请勿使用裸
wait
sleep
+ 轮询循环。首选
name:pid
格式——支持进度监控(30秒首次,随后每60秒)和STALLED检测(2分钟后日志为0字节)。裸PID仍然有效,但跳过日志监控。如果Bash在Agent完成前超时,使用相同参数重新调用——这对于长时间运行的Agent是正常的。

Workflow

工作流

The planner designs the initial workflow, the lead reviews and adapts it. Typical flow: delegate to planner → review plan → for each stage in the manifest: prepare → spawn → wait → verify (severity-routed pipeline) → between stages → next stage. Stages may be iterative (see Iterative Convergence). The lead refines the plan and decides stage adjustments mid-execution.
规划者设计初始工作流,主导者审查并调整。典型流程:委托给规划者 → 审查计划 → 对于清单中的每个阶段:准备 → 生成 → 等待 → 验证(按严重程度路由的管道) → 阶段间处理 → 下一阶段。阶段可能是迭代的(请参阅Iterative Convergence)。 主导者在执行过程中完善计划并决定阶段调整。

Planning

规划

MANDATORY: Planner first, always. The planning pipeline runs in full before any workflow begins. The lead does NOT research the codebase — the planner agent researches and produces the plan.
Plan Display Rule: After the planning phase completes and before spawning ANY stage agent, you MUST output the full stage plan as text to the user. Writing to `tmp/glm-plan.md` does NOT replace showing it. Display first, then proceed.
The lead's role in preparation:
  0. If the user's request is vague, ask clarifying questions to narrow scope, but do NO codebase research. Clarifying the user's intent (what they want) is fine; reading source files (how to do it) is the planner's job.
  1. Pass the user's request as-is and the current working directory to the planner — no summarization or research, the planner reads the codebase itself
  2. Review the planner-generated manifest for classification accuracy, brick selection, severity justification, and agent assignments
  3. If the manifest reveals scope ambiguity, add discovery/research stages. These are agent work, not lead work. Never open source files to fill gaps yourself
  4. Write well-scoped prompts using the manifest's context, KEY FILES, and MUST ANSWER questions (provided by the planner per stage). The lead may add 1-2 supplementary questions about workflow concerns (e.g., "Was the linter run?") but does not write code-level technical questions.
  5. If the plan is insufficiently informed, re-run the planner with more specific questions or add a discovery stage. Under no circumstances does the lead read source files to research gaps directly
Spawning research agents (even iteratively to convergence) is encouraged when scope is unclear — thorough research almost always produces better results in later stages. Decompose into stages. ALWAYS output the full plan to the user before spawning any agents:
强制执行:始终先运行规划者。 规划管道在任何工作流开始前完整运行。主导者不研究代码库——规划Agent研究并生成计划。
计划显示规则: 规划阶段完成后,在生成任何阶段Agent之前,必须将完整的阶段计划以文本形式输出给用户。写入
tmp/glm-plan.md
并不替代显示计划。先显示,再继续。
主导者在准备中的角色: 0. 如果用户请求模糊,询问澄清问题以缩小范围——但不进行任何代码库研究。澄清用户意图(他们想要什么)是可以的;读取源文件(如何做)是规划者的工作。
  1. 将用户请求原样和当前工作目录传递给规划者——不总结或研究,规划者自行读取代码库
  2. 审查规划者生成的清单,确认分类准确性、模块选择、严重程度理由和Agent分配
  3. 如果清单发现范围模糊,添加发现/研究阶段——这些是Agent的工作,而非主导者的工作。永远不要打开源文件自己填补空白
  4. 使用清单的上下文、KEY FILES和MUST ANSWER问题(规划者为每个阶段提供)编写范围明确的提示。主导者可以添加1-2个关于工作流关注点的补充问题(例如“是否运行了linter?”),但不编写代码级技术问题。
  5. 如果计划信息不足,使用更具体的问题重新运行规划者或添加发现阶段。在任何情况下,主导者都不得读取源文件直接研究空白
生成研究Agent(即使迭代到收敛)在范围不明确时是鼓励的——彻底的研究几乎总是在后期阶段产生更好的结果。分解为阶段。在生成任何Agent之前,始终向用户输出完整计划:

DYNAMIC BRICK MANIFEST — planner selects bricks per task.

动态组件清单 —— 规划者根据任务选择组件。

No fixed skeleton. Each task gets a custom workflow.

无固定框架。每个任务都有自定义工作流。

Plan: [N stages, M total agents]

Stage 0: Plan — 2 agents (planner + organizer)
  Classification: size=[], domains=[], ambiguity=[], severity=[], type=[]

Stage 1: [Brick name] — [Variant] — N agents
  Justification: [why this brick, why this variant]
  Agent: [specialist name]
  Second Opinion: [agent name if MEDIUM+; "N/A (severity < MEDIUM)" otherwise]
  KEY FILES: [list]
  MUST ANSWER:
    1. [technical question from planner's codebase research]
    2. [...]
  ...

Total agents: M

The planner selects from the following bricks. Skipped bricks are noted as `SKIPPED: [reason]`. **Do NOT wait for user approval — output the plan and proceed immediately.**
计划: [N个阶段,M个总Agent]
阶段0: Plan —— 2个Agent(规划者 + 组织者) 分类: size=[], domains=[], ambiguity=[], severity=[], type=[]
阶段1: [组件名称] —— [变体] —— N个Agent 理由: [为什么选择此组件,为什么选择此变体] Agent: [专家名称] 第二意见: [如果MEDIUM+则为Agent名称;否则为"N/A (severity < MEDIUM)"] KEY FILES: [列表] MUST ANSWER: 1. [规划者代码库研究中的技术问题] 2. [...] ...
总Agent数: M

规划者从以下组件中选择。跳过的组件标记为`SKIPPED: [理由]`。**不要等待用户批准——输出计划并立即继续。**
Brick Catalog
组件目录
The planner assembles a custom workflow by selecting from these bricks. Each has variants. Not all bricks are needed for every task.
PLAN            Always FULL (2 agents: planner + organizer, both default model).
                No variants. Never skipped. Bad plan poisons everything downstream.
                Planner (agentic-planner) researches and produces the plan. Organizer (agent-organizer) reviews and fixes in-place — the organizer's output IS the final plan.

DISCOVER        Pre-change analysis — review/audit existing code before making changes.
├── NONE        Required for size=tiny — nothing to discover on changes this small.
│               Required for size=small when the planner traced the complete code
│               path and identified the exact fix location with file:line citations
│               — no open questions remain. Justify with specific research findings.
│               If the planner cannot state "Root cause at [file:line], fix is
│               [approach]" with concrete evidence, the NONE bar is not met.
├── SINGLE      1 agent per domain. Use for medium+ tasks, or small tasks
│               where open questions remain after planning research.
│               At MEDIUM+ severity: +1 second opinion agent per domain (parallel).
│               Default pair: domain specialist (primary) + code-reviewer (second opinion) — planner may override based on task context.
└── MULTI       N agents, one per domain. Split by specialist → volume.
                At MEDIUM+: each domain gets a second opinion agent.

                When the task spans 2+ domains with non-trivial coupling (see
                Boundary Selection Criteria below), the planner adds intersection
                discovery agents to the DISCOVER batch. An intersection agent
                audits the integration boundary between two adjacent domains —
                tracing the full data/error/call flow across the divide,
                verifying contracts hold at the boundary, and identifying
                mismatches in data format, error semantics, or transactional
                consistency. This is distinct from second opinions: second
                opinions apply a different analytical lens to the SAME domain;
                intersection agents trace the boundary BETWEEN different domains
                where coupling creates defect-prone blind spots invisible to
                either domain specialist alone. Intersection findings are tagged
                "boundary-found" in extraction — signaling issues no within-domain
                specialist could have detected. CRITICAL/HIGH findings from
                intersection discovery are routed through cross-domain adversarial
                verification (1:1 per finding, verifying from both sides of the
                boundary). Intersection agents MUST be placed in the first DISCOVER
                stage — never deferred to CONVERGE iterations. CONVERGE inherits
                the intersection requirement but adds ADDITIONAL agents with
                different specialists, not replacements for the first-stage ones.
                Intersection agents run in parallel with domain primaries and
                second opinions within the same stage. At MEDIUM+ severity: each
                intersection agent gets its own second opinion (a different
                specialist from the INDEX, not the same type as the intersection
                agent). Intersection agents audit gaps between domains — second
                opinions audit the intersection audit itself for missed concerns.

                The planner selects the best agent for each boundary based on
                domain context. Suggested defaults (planner's selection is
                authoritative — these are starting points, not mandates):
                `backend-architect` for data flow and contract tracing;
                `security-reviewer` for crypto/auth boundaries. The planner
                may choose any agent from the INDEX that fits the boundary.

IMPLEMENT       Write or modify code.
├── NONE        No code change (analysis-only, cosmetic-only).
├── SINGLE      1 agent per domain. Writes code directly to original files.
│               Standard for all code changes.
└── MULTI       N agents, one per domain. Split by specialist → volume.

                SINGLE for narrow single-domain changes; MULTI for changes
                spanning multiple specialists. Line count is not the measure —
                split by domain diversity, not file count.

REVIEW          Review code changes.
├── NONE        Skip: change type=cosmetic AND severity=none.
│               Or: IMPLEMENT=NONE.
├── SINGLE      1 agent per domain. Standard.
│               At MEDIUM+ severity: +1 second opinion agent per domain (parallel).
│               Default pair: code-reviewer (primary) + language specialist (second opinion) — planner may override based on task context.
│               When the task spans 2+ domains using DIFFERENT specialists,
│               the planner adds cross-domain integration reviewers to the
│               REVIEW batch (see Boundary Selection Criteria for triage —
│               same ALWAYS/DEFAULT/SKIP tiers apply). These agents focus
│               ONLY on integration points: API contracts, shared types,
│               data flow between domains, and regressions at boundaries from
│               implementation changes. Do NOT re-review domain-internal logic.
│               Post-implementation intersection review is critical: domain
│               reviewers see new methods as correct within their context;
│               only tracing the full boundary reveals regressions where error
│               contracts, data formats, or transactional ordering differ from
│               what the caller expects. Findings from cross-domain integration
│               review are routed through adversarial cross-verification (1:1
│               per CRITICAL/HIGH finding, verifying from both sides).
└── MULTI       N agents, one per domain.

VERIFY          Verify findings from DISCOVER, REVIEW, or post-fix review.
                Always includes extraction (1 agent, default model). Tags findings
                "both-found"/"single-found" when originating stage had second opinion,
                and "boundary-found"/"domain-only" when intersection agents were present.
                Routes findings by severity:
                
                CRITICAL/HIGH → ADVERSARIAL AGENT (1 agent per finding — 1:1)
                  Adversarial agent tries to FALSIFY every finding: reads cited code
                  with full surrounding context (minimum 30 lines), exhaustively
                  searches for counter-evidence at every level (same function guards,
                  caller-level validation, framework-level protections — middleware,
                  decorators, interceptors, global error handlers — type system
                  invariants, test coverage). Labels each CONFIRMED / REJECTED /
                  WEAKENED with evidence. For CONFIRMED: describe what patterns
                  were searched, which grep commands were run, why nothing was found.
                  For REJECTED: paste exact counter-evidence code with file:line.
                  For WEAKENED: paste partial counter-evidence AND explain what
                  portion of the original claim still stands.
                  Default position: assume the claimed issue is a misunderstanding
                  and search exhaustively before confirming. For "missing X"
                  findings, searching for X and finding it in no reachable code
                  path IS valid evidence; document all searched locations.
                  Findings that survive exhaustive falsification become
                  ADVERSARIALLY VERIFIED.
                
                CRITICAL/HIGH from intersection or cross-domain integration review
                  (any finding spanning domain boundaries, regardless of whether
                  it originated in DISCOVER or REVIEW) → ADVERSARIAL CROSS AGENT
                  (1 agent per finding — 1:1). Same exhaustive falsification but verifies
                  from BOTH sides of the integration boundary (Domain A producer +
                  Domain B consumer + bridge between them). Finding only survives
                  if no counter-evidence on either side or in the bridge.
                
                MEDIUM → ADVERSARIAL AGENT (1 agent per batch of 5 findings)
                  Same exhaustive falsification methodology as CRITICAL/HIGH —
                  reads cited code with full surrounding context (minimum 30
                  lines), exhaustively searches for counter-evidence at every
                  level (same function guards, caller-level validation,
                  framework-level protections, type system invariants, test
                  coverage). Labels each CONFIRMED / REJECTED / WEAKENED with
                  evidence. Default position: assume the claimed issue is a
                  misunderstanding and search exhaustively before confirming.
                  Every CONFIRMED label must be hard-won with grep evidence.
                
                LOW → NOTED. Recorded in report. No further agent spend.
                
                After routing: SYNTHESIS (1 agent, default model) compiles all
                verdicts into unified grid. Surfaces "both-found" confidence signals.
                Unified vocabulary: CONFIRMED / REJECTED / WEAKENED.
                Also sanity-checks severity assignments against the severity
                classification criteria — if a finding's severity appears mismatched
                (e.g., "SQL injection" labeled MEDIUM), flag it as CHALLENGED.
                Challenged findings are re-routed through adversarial verification.
                Exception: documentation-domain challenged findings skip
                adversarial — documentation severity is inherently subjective
                (is "10 missing API docs" HIGH or MEDIUM?) and adversarial
                review of severity ratings adds no meaningful verification.
                Documentation-domain challenged findings stay at their
                challenged severity; the lead accepts the downgrade directly.
                Early-exit: 0 findings after extraction → skip synthesis.
                Always runs when DISCOVER, REVIEW, or post-fix review produced findings with code-level references.
                When CONFIRMED findings exist at MEDIUM+, FIX=DOMAINS must follow.

CONVERGE        Repeat DISCOVER or REVIEW for additional passes. Planner decides variant.
                Factors: ambiguity, codebase complexity, finding volume, production impact,
                change type, time sensitivity.
                NONE: One pass. For well-understood, narrow work. Also appropriate
                      for codebases with comprehensive test coverage (>80%) and
                      clean module boundaries — first pass is unlikely to miss
                      meaningful issues.
                ONCE: One extra iteration if first pass found anything ("found
                      anything" means any iter 1 agent reported at least one
                      finding — regardless of whether it survived adversarial
                      verification; the point is different iter 2 specialists
                      re-examine what iter 1 noticed). Use when
                      the planner's Phase 1 research reveals interconnected modules,
                      dense coupling, non-uniform code patterns, or >15K LOC per
                      domain — characteristics suggesting a first pass may miss
                      issues. Also used when severity is HIGH/CRITICAL regardless
                      of codebase quality (missed findings are expensive). ONCE is
                      NOT the universal default — well-tested, cleanly-structured
                      codebases should use NONE.
                LOOP: Up to 3 iterations, stop on empty report. For highly ambiguous
                      or production-critical work where missed findings would be
                      unacceptable.
                Iterations inherit ALL mandatory rules from the parent stage type
                (second opinions at MEDIUM+, intersection agents at triaged boundaries,
                DISCOVER/REVIEW → VERIFY pipeline, etc.). Intersection agents inherited
                by CONVERGE are ADDITIONAL agents, not replacements — the first DISCOVER
                stage must have its own intersection agents for ALWAYS/DEFAULT boundaries;
                CONVERGE iter 2 adds fresh intersection agents with different specialists.
                
                Each iteration gets its own VERIFY stage. Iter 1's VERIFY runs BEFORE
                iter 2 spawns — the synthesis grid from iter 1's VERIFY determines
                whether iter 2 spawns (any finding = spawn) AND provides PRIOR CONTEXT
                for iter 2 agents. Do NOT merge both iterations' verification into a
                single stage after both iterations complete. The plan structure must be:
                  Stage N:   DISCOVER iter 1
                  Stage N+1: VERIFY iter 1
                  Stage N+2: DISCOVER iter 2 (conditional, PRIOR CONTEXT from N+1)
                  Stage N+3: VERIFY iter 2
                
                The planner must list all agents per iteration with different
                specialists from the previous iteration — the lead spawns
                whatever the plan lists. Before writing iter 2, the planner MUST
                list every agent `.md` file used in iter 1 and exclude them all
                from iter 2 — no agent may appear in any role in both iterations.
                Swapping primary and second opinion roles between iterations does
                NOT count as different specialists. Using the same pair of agent
                `.md` files in opposite roles is still the same analytical
                framework. The exclusion list must be explicit in the plan.

FIX             Apply verified findings. Always 2-3 sequential stages — includes post-fix review.
                When DOMAINS: 1 fix agent per domain → post-fix REVIEW
                (same variant/domain split as the REVIEW stage), then VERIFY
                if any post-fix review report contains at least one finding at
                MEDIUM severity or above. A finding is any numbered item with a
                severity label and code reference (file:line, function, or block)
                in a reviewer's report. The lead does NOT re-classify, downgrade,
                or exclude findings — the reviewer's filed severity is authoritative.
                VERIFY is skipped ONLY when ALL post-fix review reports contain
                zero MEDIUM+ findings. Mechanical trigger, no judgment.

                CONVERGENCE: If post-fix VERIFY produces CONFIRMED MEDIUM+
                findings in the synthesis grid, the fix is incomplete. Spawn a new
                fix pass (fix agents → post-fix review → conditional verify) for
                the confirmed findings. This repeats until post-fix review
                produces zero MEDIUM+ findings and VERIFY is skipped. The FIX
                brick is a convergence loop — one pass is never final when
                MEDIUM+ findings survive verification. Documented findings marked
                "for follow-up action" are still unfixed MEDIUM+ findings — fix
                them now, not later.
├── NONE        No verified findings.
└── DOMAINS     1 fix agent per domain → post-fix REVIEW → conditional VERIFY.

TEST            Run build + test suite. Always single agent, default model (mechanical).
├── NONE        IMPLEMENT=NONE. Or planner skips with justification (no test infra).
└── FULL        1 agent. Runs build + tests, fixes failures.
规划者通过选择这些组件组装自定义工作流。每个组件有变体。并非所有组件都适用于每个任务。
PLAN            始终完整运行(2个Agent:规划者 + 组织者,均使用默认模型)。
                无变体。永远不能跳过。糟糕的计划会影响下游所有环节。
                规划者(agentic-planner)研究并生成计划。组织者(agent-organizer)审查并就地修复——组织者的输出即为最终计划。

DISCOVER        变更前分析——在进行变更前审查/审计现有代码。
├── NONE        当size=tiny时必需——如此小的变更无需发现。
│               当size=small且规划者追踪了完整代码路径并确定了确切的修复位置(带file:line引用)时必需——无未解决问题。用具体研究结果证明。
│               如果规划者不能用具体证据说明“根本原因在[file:line],修复方法是[方案]”,则不满足NONE条件。
├── SINGLE      每个领域1个Agent。用于中等及以上任务,或规划研究后仍有未解决问题的小型任务。
│               MEDIUM及以上严重程度:每个领域+1个第二意见Agent(并行)。
│               默认组合:领域专家(主) + code-reviewer(第二意见)——规划者可根据任务上下文覆盖。
└── MULTI       N个Agent,每个领域一个。按专家 → 容量拆分。
                MEDIUM及以上:每个领域有一个第二意见Agent。

                当任务跨越2个及以上具有非琐碎耦合的领域(请参阅下文的Boundary Selection Criteria)时,规划者在DISCOVER批次中添加交叉发现Agent。交叉Agent审计两个相邻领域之间的集成边界——追踪跨边界的完整数据/错误/调用流,验证边界处的契约是否成立,并识别数据格式、错误语义或事务一致性的不匹配。这与第二意见不同:第二意见对**同一**领域应用不同的分析视角;交叉Agent追踪**不同**领域之间的边界,耦合在此处创建了领域专家单独无法发现的易缺陷盲点。交叉发现结果在提取中标记为“boundary-found”——表示领域专家无法检测到的问题。交叉发现的CRITICAL/HIGH结果通过跨领域对抗性验证路由(每个发现结果1:1,从边界两侧验证)。交叉Agent必须放在第一个DISCOVER阶段——永远不要推迟到CONVERGE迭代。CONVERGE继承交叉要求,但添加**额外**的不同专家Agent,而非替换第一阶段的Agent。
                交叉Agent与领域主Agent和第二意见在同一阶段并行运行。MEDIUM及以上严重程度:每个交叉Agent有自己的第二意见(来自INDEX的不同专家,而非与交叉Agent相同类型)。交叉Agent审计领域之间的差距——第二意见审计交叉审计本身是否遗漏了关注点。

                规划者根据领域上下文为每个边界选择最佳Agent。建议默认值(规划者的选择具有权威性——这些是起点,而非强制要求):
                `backend-architect`用于数据流和契约追踪;
                `security-reviewer`用于加密/认证边界。规划者可以选择INDEX中适合边界的任何Agent。

IMPLEMENT       编写或修改代码。
├── NONE        无代码变更(仅分析、仅 cosmetic)。
├── SINGLE      每个领域1个Agent。直接将代码写入原始文件。
│               所有代码变更的标准方式。
└── MULTI       N个Agent,每个领域一个。按专家 → 容量拆分。

                SINGLE用于狭窄的单领域变更;MULTI用于跨越多个专家的变更。行数不是衡量标准——按领域多样性拆分,而非文件数量。

REVIEW          审查代码变更。
├── NONE        跳过:变更类型=cosmetic且severity=none。
│               或:IMPLEMENT=NONE。
├── SINGLE      每个领域1个Agent。标准方式。
│               MEDIUM及以上严重程度:每个领域+1个第二意见Agent(并行)。
│               默认组合:code-reviewer(主) + 语言专家(第二意见)——规划者可根据任务上下文覆盖。
│               当任务跨越2个及以上使用**不同**专家的领域时,规划者在REVIEW批次中添加跨领域集成审查者(请参阅Boundary Selection Criteria进行分类——相同的ALWAYS/DEFAULT/SKIP层级适用)。这些Agent仅关注集成点:API契约、共享类型、领域间数据流以及实现变更在边界处的回归。不要重新审查领域内部逻辑。
│               实现后交叉审查至关重要:领域审查者认为新方法在其上下文中是正确的;只有追踪完整边界才能发现错误契约、数据格式或事务顺序与调用者预期不同的回归。跨领域集成审查的发现结果通过对抗性跨验证路由(每个CRITICAL/HIGH发现结果1:1,从两侧验证)。
└── MULTI       N个Agent,每个领域一个。

VERIFY          验证DISCOVER、REVIEW或修复后审查的发现结果。
                始终包括提取(1个Agent,默认模型)。当源阶段有第二意见Agent时,标记发现结果为"both-found"/"single-found";当存在交叉Agent时,标记为"boundary-found"/"domain-only"。
                按严重程度路由发现结果:
                
                CRITICAL/HIGH → 对抗性Agent(每个发现结果1个Agent —— 1:1)
                  对抗性Agent尝试证伪每个发现结果:读取引用代码及其完整上下文(至少30行),在各个层面全面搜索反证(相同函数保护、调用者级验证、框架级保护——中间件、装饰器、拦截器、全局错误处理、类型系统不变量、测试覆盖率)。用证据标记每个CONFIRMED / REJECTED /
                  WEAKENED。对于CONFIRMED:描述搜索的模式、运行的grep命令、未找到的原因。
                  对于REJECTED:粘贴带有file:line的确切反证代码。
                  对于WEAKENED:粘贴部分反证并解释原始主张的哪些部分仍然成立。
                  默认立场:假设主张的问题是误解,在确认前进行全面搜索。对于“缺少X”的发现结果,搜索X并在所有可达代码路径中未找到X是有效证据——记录所有搜索位置。通过全面证伪幸存的发现结果成为ADVERSARIAL VERIFIED。
                
                来自交叉或跨领域集成审查的CRITICAL/HIGH结果
                  (任何跨领域边界的发现结果,无论来自DISCOVER还是REVIEW) → 对抗性跨领域Agent
                  (每个发现结果1个Agent —— 1:1)。相同的全面证伪,但从集成边界的**双方**验证(领域A生产者 +
                  领域B消费者 + 两者之间的桥梁)。只有当双方或桥梁中都没有反证时,发现结果才会幸存。
                
                MEDIUM → 对抗性Agent(每5个发现结果1个Agent)
                  与CRITICAL/HIGH相同的全面证伪方法——读取引用代码及其完整上下文(至少30
                  行),在各个层面全面搜索反证(相同函数保护、调用者级验证、框架级保护、类型系统不变量、测试覆盖率)。用证据标记每个CONFIRMED / REJECTED / WEAKENED。默认立场:假设主张的问题是误解,在确认前进行全面搜索。
                  每个CONFIRMED标签都必须通过grep证据努力获得。
                
                LOW → NOTED。记录在报告中。不消耗更多Agent资源。
                
                路由后:合成(1个Agent,默认模型)将所有裁决编译为统一网格。显示"both-found"置信度信号。
                统一词汇:CONFIRMED / REJECTED / WEAKENED。
                还根据严重程度分类标准检查严重程度分配是否合理——如果发现结果的严重程度不匹配(例如“SQL注入”标记为MEDIUM),则标记为CHALLENGED。
                CHALLENGED发现结果重新通过对抗性验证路由。例外:文档领域的CHALLENGED发现结果跳过对抗性验证——文档严重程度本质上是主观的(“10个缺失的API文档”是HIGH还是MEDIUM?),对严重程度评级的对抗性审查不会增加有意义的验证。
                文档领域的CHALLENGED发现结果保持其挑战后的严重程度;主导者直接接受降级。
                提前退出:提取后0个结果 → 跳过合成。
                当DISCOVER、REVIEW或修复后审查产生带代码引用的发现结果时,始终运行。
                当存在MEDIUM及以上的CONFIRMED发现结果时,必须跟随FIX=DOMAINS。

CONVERGE        重复DISCOVER或REVIEW以进行额外的检查。规划者决定变体。
                因素:模糊性、代码库复杂性、发现结果数量、生产影响、变更类型、时间敏感性。
                NONE:一次检查。用于理解充分、范围狭窄的工作。也适用于测试覆盖率全面(>80%)且模块边界清晰的代码库——首次检查不太可能遗漏有意义的问题。
                ONCE:如果首次检查发现任何问题,则进行一次额外迭代(“发现任何问题”指任何迭代1的Agent报告了至少一个发现结果——无论是否通过对抗性验证;重点是不同的迭代2专家重新检查迭代1注意到的内容)。当规划者第一阶段研究发现互连模块、紧密耦合、非统一代码模式或每个领域>15K LOC时使用——这些特征表明首次检查可能遗漏问题。也用于严重程度为HIGH/CRITICAL的情况,无论代码库质量如何(遗漏发现结果代价高昂)。ONCE不是通用默认值——测试充分、结构清晰的代码库应使用NONE。
                LOOP:最多3次迭代,当报告为空时停止。用于高度模糊或生产关键工作,遗漏发现结果不可接受。
                迭代继承父阶段类型的所有强制规则(MEDIUM及以上的第二意见要求、分类边界的交叉Agent、DISCOVER/REVIEW → VERIFY管道等)。CONVERGE继承的交叉Agent是**额外**的Agent,而非替换——第一个DISCOVER阶段必须为ALWAYS/DEFAULT边界有自己的交叉Agent;
                CONVERGE迭代2添加具有不同专家的新交叉Agent。
                
                每次迭代有自己的VERIFY阶段。迭代1的VERIFY在迭代2生成**之前**运行——迭代1 VERIFY的合成网格决定是否生成迭代2(任何发现结果=生成)并为迭代2 Agent提供PRIOR CONTEXT。不要将两次迭代的验证合并到两次迭代完成后的单个阶段。计划结构必须是:
                  阶段N:   DISCOVER迭代1
                  阶段N+1: VERIFY迭代1
                  阶段N+2: DISCOVER迭代2(条件,来自N+1的PRIOR CONTEXT)
                  阶段N+3: VERIFY迭代2
                
                规划者必须列出每次迭代的所有Agent,且与前一次迭代的专家不同——主导者生成计划列出的所有Agent。在编写迭代2之前,规划者必须列出迭代1中使用的每个Agent `.md`文件,并将它们全部排除在迭代2之外——没有Agent可以在两次迭代中担任任何角色。
                在迭代之间交换主Agent和第二意见角色不算不同专家。使用相同的Agent `.md`文件对担任相反角色仍然是相同的分析框架。排除列表必须在计划中明确。

FIX             应用已验证的发现结果。始终为2-3个顺序阶段——包括修复后审查。
                当DOMAINS:每个领域1个修复Agent → 修复后REVIEW
                (与REVIEW阶段相同的变体/领域拆分),然后如果任何修复后审查报告包含至少一个MEDIUM及以上严重程度的发现结果,则进行VERIFY。发现结果是审查者报告中带有严重程度标签和代码引用(file:line、函数或块)的任何编号项。主导者不得重新分类、降级或排除发现结果——审查者记录的严重程度具有权威性。
                仅当所有修复后审查报告都包含零个MEDIUM+发现结果时,才跳过VERIFY。机械触发,无需判断。

                收敛:如果修复后VERIFY在合成网格中产生CONFIRMED MEDIUM+
                发现结果,则修复不完整。为已确认的发现结果生成新的修复过程(修复Agent → 修复后审查 → 条件验证)。重复此过程,直到修复后审查产生零个MEDIUM+发现结果并跳过VERIFY。FIX组件是收敛循环——当MEDIUM+发现结果通过验证时,一次修复永远不是最终的。标记为“待后续行动”的已记录发现结果仍然是未修复的MEDIUM+发现结果——现在修复,而非以后。
├── NONE        无已验证发现结果。
└── DOMAINS     每个领域1个修复Agent → 修复后REVIEW → 条件VERIFY。

TEST            运行构建+测试套件。始终为单个Agent,默认模型(机械操作)。
├── NONE        IMPLEMENT=NONE。或规划者有理由跳过(无测试基础设施)。
└── FULL        1个Agent。运行构建+测试,修复失败。
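The FIX brick calls its post-fix VERIFY trigger mechanical, so it can be sketched as a pure function. This is a minimal sketch, assuming each post-fix review report has already been parsed into a list of findings; the `Finding` type and the `needs_post_fix_verify` name are hypothetical and not part of the skill's tooling.

```python
from dataclasses import dataclass

# Severity labels from the Severity Classification, lowest to highest.
SEVERITY_ORDER = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass(frozen=True)
class Finding:
    """Hypothetical parsed finding: a numbered item with severity and code reference."""
    severity: str  # the reviewer's filed severity, taken as authoritative
    code_ref: str  # file:line, function, or block


def is_medium_plus(finding: Finding) -> bool:
    """True for MEDIUM, HIGH, and CRITICAL findings."""
    return SEVERITY_ORDER.index(finding.severity) >= SEVERITY_ORDER.index("MEDIUM")


def needs_post_fix_verify(reports: list[list[Finding]]) -> bool:
    """VERIFY runs if ANY post-fix review report holds at least one MEDIUM+ finding."""
    return any(is_medium_plus(f) for report in reports for f in report)
```

The function never re-classifies a finding, which mirrors the rule that the lead does not downgrade or exclude what the reviewer filed.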
Severity Classification
严重程度分类
The planner assesses severity from code context — NOT keyword matching:

| Level | Criteria |
| --- | --- |
| None | No functional impact. Comment, formatting, variable rename. |
| Low | Minor, immediately reversible. Dev tooling, internal logging, tests. |
| Medium | User-facing, visible but contained. UI component, new endpoint, feature. |
| High | Core product function, data mutation, could break key flows. Payment, auth, database writes, primary user flows, data model changes. |
| Critical | Product outage, data loss, severe production bugs, irreversible damage. Secret exposure, SQL injection, data deletion, auth bypass, production crash, corrupt state. |

The planner reads the actual code, traces what it touches, and assigns severity based on actual product impact. No keyword auto-detection. A function named `validatePassword` that handles UI password strength is LOW, not HIGH. A log statement in a payment module is LOW unless the logging itself can break payments.
规划者根据代码上下文评估严重程度,而非基于关键字匹配:

| 级别 | 标准 |
| --- | --- |
| None | 无功能影响。注释、格式、变量重命名。 |
| Low | 轻微,可立即逆转。开发工具、内部日志、测试。 |
| Medium | 用户可见,影响有限。UI组件、新端点、功能。 |
| High | 核心产品功能、数据变更,可能破坏关键流程。支付、认证、数据库写入、主要用户流程、数据模型变更。 |
| Critical | 产品中断、数据丢失、严重生产漏洞、不可逆损坏。密钥泄露、SQL注入、数据删除、认证绕过、生产崩溃、状态损坏。 |

规划者读取实际代码,追踪其影响范围,并根据实际产品影响分配严重程度。无关键字自动检测。名为 `validatePassword` 的处理UI密码强度的函数是LOW,而非HIGH。支付模块中的日志语句是LOW,除非日志本身会破坏支付。
Domain Splitting
领域拆分
When a task spans multiple domains, split in two steps. Domain breadth is measured by distinct specialist agents needed, not package count. A task touching 5 Swift packages that all use `swift-pro` is single-domain. A task touching Python + TypeScript files is few-domain.
  1. Split by specialist: map each file/concern to the best specialist agent from `<skill-folder>/agents/INDEX.md`
  2. Split by volume — keep each discovery agent to ~20 files and ~5K LOC. A narrow overage (up to 25 files / 6K LOC, in a single cohesive module — not unrelated files packed together) is acceptable; above that, split is mandatory. Discovery agents must read every file — a 20-line header costs the same context as a 200-line implementation file because the agent must understand the API and cross-reference every caller. After splitting, re-count each resulting sub-group to verify none still exceeds the limits.
    Post-split re-evaluation. After mandatory splits, verify the resulting agents are not fragmented. If any sub-agent has fewer than 15 files AND fewer than 3K LOC, the split produced an under-utilized agent: standalone agents this small add coordination overhead without proportional audit depth. Consider merging adjacent sub-agents: a single 40f/4K-LOC agent with "many thin boilerplate files" justification is better than two 20f/2K-LOC agents that have almost nothing to audit. When file count exceeds the 25f cap but total LOC is under 3K, the files are likely thin code-behind or utility stubs, so prefer accepting the domain as a close call over splitting it into fragments. The caps exist to prevent agent overload, not to create stand-alone agents that have too little to examine.
    The planner provides FILE SCOPES (module-level descriptions, e.g. "GPG core: core/GPGHandler.py, core/gpg_utils/*.py") with rough LOC estimates from Phase 1 research. The organizer resolves every scope to exact individual file paths (using glob + find + test -f), runs wc -l for exact counts, produces a systematic volume audit table comparing each domain against the ~20f / ~5K LOC baseline and the 25f / 6K LOC narrow cap, splits domains exceeding the cap, and writes the resolved KEY FILES + exact LOC counts into the plan file, preserving the planner's MUST ANSWER questions, domain descriptions, and agent assignments for each domain.
  3. Split implementation agents by edit density — different from discovery volume splitting. Sequential edits on the same file accumulate context pressure linearly (agent re-reads, re-edits, re-tests the same code) causing edit amnesia: the agent forgets it already applied a change and tries to re-apply it. Two mechanical caps, counted from the synthesis grid's confirmed MEDIUM+ findings:
    • Per-file cap: no single file may carry more than 8 confirmed MEDIUM+ findings to one implementation agent. If a file exceeds 8, split that file's fixes across 2 agents by finding index.
    • Per-agent cap: no implementation agent may receive more than 12 confirmed MEDIUM+ findings across all files. If a domain exceeds 12 total, split into 2 agents by file/module.
当任务跨越多个领域时,分两步拆分。领域广度按所需的不同专家Agent数量衡量,而非包数量。涉及5个都使用 `swift-pro` 的Swift包的任务是单领域。涉及Python + TypeScript文件的任务是少数领域。
  1. 按专家拆分:将每个文件/关注点映射到 `<skill-folder>/agents/INDEX.md` 中的最佳专家Agent
  2. 按容量拆分 —— 每个发现Agent处理约20个文件和约5K LOC。轻微超出(最多25个文件/6K LOC,在单个内聚模块中——而非无关文件打包在一起)是可接受的;超出此范围则必须拆分。发现Agent必须读取每个文件——20行的头文件与200行的实现文件消耗相同的上下文,因为Agent必须理解API并交叉引用每个调用者。拆分后,重新计算每个子组以验证没有子组仍超出限制。
    拆分后重新评估。 强制拆分后,验证生成的Agent 没有碎片化。如果任何子Agent处理少于15个文件且少于3K LOC, 拆分产生了未充分利用的Agent——如此小的独立Agent会增加协调开销,而没有相应的审计深度。考虑合并相邻的子Agent:单个处理40个文件/4K-LOC的Agent(理由是“许多薄样板文件”)比两个处理20个文件/2K-LOC的Agent更好,后者几乎没有可审计的内容。当文件数量超过25个上限但总LOC低于3K时,文件可能是薄代码后置或实用工具存根——优先接受临界情况而非拆分为碎片。上限是为了防止Agent过载,而非创建检查内容过少的独立Agent。
    规划者提供FILE SCOPES(模块级描述,例如“GPG核心: core/GPGHandler.py, core/gpg_utils/*.py”),并带有第一阶段研究的大致LOC估计。组织者将每个范围解析为确切的单个文件路径(使用glob + find + test -f),运行wc -l获取确切计数,生成系统容量审计表,将每个领域与~20个文件/~5K LOC基线和25个文件/6K LOC临界上限进行比较,拆分超出上限的领域,并将解析后的KEY FILES + 确切LOC计数写入计划文件, 保留规划者的MUST ANSWER问题、领域描述和每个领域的Agent分配。
  3. 按编辑密度拆分实现Agent —— 与发现容量拆分不同。同一文件上的顺序编辑线性累积上下文压力(Agent重新读取、重新编辑、重新测试相同代码)导致编辑遗忘:Agent忘记已应用更改并尝试重新应用。两个机械上限,根据合成网格中的已确认MEDIUM+发现结果计数:
    • 每个文件上限: 单个文件不得向一个实现Agent分配超过8个已确认MEDIUM+发现结果。如果文件超过8个,按发现结果索引将该文件的修复拆分给2个Agent。
    • 每个Agent上限: 一个实现Agent不得接收超过12个跨所有文件的已确认MEDIUM+发现结果。如果领域总数超过12个,按文件/模块拆分为2个Agent。
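The two edit-density caps from step 3 are mechanical counts, so a small check can flag when a domain's implementation must be split. This is a minimal sketch, assuming the confirmed MEDIUM+ findings have already been counted per file from the synthesis grid; the function name and input shape are hypothetical.

```python
PER_FILE_CAP = 8    # max confirmed MEDIUM+ findings on one file per agent
PER_AGENT_CAP = 12  # max confirmed MEDIUM+ findings per agent across all files


def edit_density_violations(findings_per_file: dict[str, int]) -> dict:
    """Report which caps a domain's confirmed MEDIUM+ findings exceed.

    findings_per_file maps a file path to its count of confirmed MEDIUM+
    findings (hypothetical input shape).
    """
    over_file_cap = sorted(
        path for path, count in findings_per_file.items() if count > PER_FILE_CAP
    )
    total = sum(findings_per_file.values())
    return {
        "files_over_cap": over_file_cap,           # split each by finding index
        "domain_over_cap": total > PER_AGENT_CAP,  # split by file/module
        "total": total,
    }
```

A file listed in `files_over_cap` has its fixes divided across 2 agents by finding index, while `domain_over_cap` calls for a split by file or module.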
Boundary Selection for Intersection Agents
交叉Agent的边界选择
The planner identifies domain adjacencies during Phase 1 research, assesses coupling density by tracing cross-boundary call sites, and classifies each boundary. This decision is documented in the plan manifest under "Boundary Analysis."

| Tier | Criteria | Action |
| --- | --- | --- |
| ALWAYS | Two persistence mechanisms at boundary; OR data format/encoding transformation at boundary; OR error contract differs from caller expectation; OR 5+ cross-boundary call sites across 3+ modules | Add intersection agent to DISCOVER and REVIEW |
| DEFAULT | Multiple cross-boundary call sites; moderate coupling; multi-module boundary | Add intersection agent to DISCOVER and REVIEW |
| SKIP | Boundary bridged by a single well-understood mediator class; OR <3 cross-boundary call sites; OR well-documented established pattern (e.g., standard library protocol layer) | Skip; domain primaries + second opinions sufficient |

SKIP boundaries require a one-line justification in the plan's Boundary Analysis (e.g., "SKIP: Crypto×Network — thin boundary bridged by MailCore2 TLS").
Rationale (from Run 4 empirical data):
Intersection agents at high-coupling boundaries produce unique MEDIUM+ findings at ~1.4 agents per unique finding. At thin boundaries bridged by a single mediator class, intersection agents add near-zero unique value (<20% precision, 0 unique findings in Run 4). Triaging prevents wasteful agent spend at boundaries where domain primaries and second opinions already provide sufficient coverage.
Academic support: Koru et al. (2007) established that highly coupled modules are more defect-prone. Zhou et al. (2020) confirmed package coupling metrics predict defect-proneness. An empirical study of interaction bugs in ROS-based software (2025) found failures "often manifest at the boundaries between components."
规划者在第一阶段研究中识别领域邻接关系,通过追踪跨边界调用点评估耦合密度,并对每个边界进行分类。此决策记录在计划清单的“Boundary Analysis”下。

| 层级 | 标准 | 行动 |
| --- | --- | --- |
| ALWAYS | 边界处有两种持久化机制;或边界处有数据格式/编码转换;或错误契约与调用者预期不同;或3个及以上模块中有5个及以上跨边界调用点 | 在DISCOVER和REVIEW中添加交叉Agent |
| DEFAULT | 多个跨边界调用点;中等耦合;多模块边界 | 在DISCOVER和REVIEW中添加交叉Agent |
| SKIP | 边界由单个易于理解的中介类桥接;或<3个跨边界调用点;或有完善文档的既定模式(例如标准库协议层) | 跳过:领域主Agent + 第二意见足够 |

SKIP边界需要在计划的Boundary Analysis中提供单行理由(例如“SKIP:Crypto×Network —— 由MailCore2 TLS桥接的薄边界”)。
理由(来自Run 4实证数据):
高耦合边界的交叉Agent每1.4个Agent产生一个独特的MEDIUM+发现结果。在由单个中介类桥接的薄边界处,交叉Agent几乎没有独特价值(精度<20%,Run 4中0个独特发现结果)。分类可防止在领域主Agent和第二意见已提供足够覆盖的边界处浪费Agent资源。
学术支持: Koru等人(2007)确定高度耦合的模块更容易出现缺陷。Zhou等人(2020)证实包耦合指标可预测缺陷倾向。2025年对基于ROS的软件中交互漏洞的实证研究发现,故障“通常出现在组件之间的边界处”。
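The tier criteria above can be read as a small decision function. This is a sketch under stated assumptions: the `Boundary` fields are hypothetical summaries of what the planner gathers in Phase 1, and the source does not say which tier wins when ALWAYS and SKIP criteria both match, so the ordering below (ALWAYS first) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """Hypothetical summary of one domain adjacency from Phase 1 research."""
    call_sites: int                      # cross-boundary call sites
    modules: int                         # modules those call sites span
    two_persistence_mechanisms: bool = False
    format_transformation: bool = False  # data format/encoding changes at boundary
    error_contract_mismatch: bool = False
    single_mediator: bool = False        # bridged by one well-understood class
    established_pattern: bool = False    # well-documented standard pattern


def classify_boundary(b: Boundary) -> str:
    """Return ALWAYS, DEFAULT, or SKIP for one boundary."""
    # Assumption: ALWAYS criteria take precedence over SKIP criteria.
    if (b.two_persistence_mechanisms or b.format_transformation
            or b.error_contract_mismatch
            or (b.call_sites >= 5 and b.modules >= 3)):
        return "ALWAYS"
    if b.single_mediator or b.call_sites < 3 or b.established_pattern:
        return "SKIP"
    return "DEFAULT"
```

A SKIP result still needs its one-line justification in the plan's Boundary Analysis; the function only covers the classification step.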
Size Classification
规模分类
The planner assesses scope along with severity. Size gates DISCOVER=NONE decisions.

| Size | Criteria |
| --- | --- |
| tiny | Single file, single change, under 10 lines. Trivial fix, no structural impact. |
| small | Single module, few files. Well-scoped change with clear boundaries. Under ~20 files and ~5K LOC. |
| medium | Multiple modules, cross-file changes. Moderate scope, may touch different concerns. Under ~20 files and ~5K LOC. |
| large | Exceeds ~20 files OR ~5K LOC in any domain, OR spans multiple specialist domains (different languages/frameworks). Requires volume splitting. |

DISCOVER=NONE requires `size=tiny` (nothing to discover) OR `size=small` with planner-identified root cause at file:line. For `medium` and `large`, DISCOVER is mandatory.
规划者评估范围和严重程度。规模决定DISCOVER=NONE的决策。

| 规模 | 标准 |
| --- | --- |
| tiny | 单个文件,单个变更,少于10行。琐碎修复,无结构影响。 |
| small | 单个模块,少数文件。范围明确的变更,边界清晰。少于约20个文件和约5K LOC。 |
| medium | 多个模块,跨文件变更。中等范围,可能涉及不同关注点。少于约20个文件和约5K LOC。 |
| large | 任何领域超过约20个文件或约5K LOC,或跨越多个专家领域(不同语言/框架)。需要容量拆分。 |

DISCOVER=NONE需要 `size=tiny`(无内容可发现)或 `size=small` 且规划者已识别file:line级的根本原因。对于 `medium` 和 `large`,DISCOVER是强制的。
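The size gate on DISCOVER=NONE reduces to a short check. This is a minimal sketch, assuming the planner records its root-cause reference as a file:line string; the function and parameter names are hypothetical.

```python
from typing import Optional


def discover_none_allowed(size: str, root_cause_ref: Optional[str] = None) -> bool:
    """Return True when the plan may set DISCOVER=NONE.

    root_cause_ref is the planner's file:line reference for the root cause,
    or None when the planner has not pinned one down (hypothetical parameter).
    """
    if size == "tiny":
        return True                  # nothing to discover
    if size == "small":
        return bool(root_cause_ref)  # needs a concrete file:line root cause
    if size in ("medium", "large"):
        return False                 # DISCOVER is mandatory
    raise ValueError(f"unknown size: {size!r}")
```

The check only tests that a reference exists; whether the planner's evidence is concrete enough remains a judgment made in the plan itself.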
Mid-Execution Amendment
执行中修正
After VERIFY produces confirmed findings at MEDIUM severity or above: if the manifest does not include IMPLEMENT, the lead auto-adds IMPLEMENT followed by FIX (which includes internal post-fix REVIEW + conditional VERIFY). This is unconditional — all confirmed MEDIUM+ findings are fixed regardless of task intent. LOW findings are reported but not auto-fixed.
When auto-adding IMPLEMENT or planning implementation stages from the synthesis grid, count confirmed MEDIUM+ findings per file. Apply the edit-density split (Domain Splitting step 3): if any single file carries more than 8 findings or any domain carries more than 12 total findings, split that domain's implementation into 2 agents.
After a FIX stage's post-fix VERIFY produces CONFIRMED MEDIUM+ findings in the synthesis grid: auto-add another FIX pass (fix agents → post-fix review → conditional verify). This repeats until post-fix review produces zero MEDIUM+ findings and VERIFY is skipped. This is mechanical — the FIX brick is a convergence loop, and surviving MEDIUM+ findings mean the fix was incomplete. IMPLEMENT already being in the manifest does not block this — FIX convergence re-entry is independent of the IMPLEMENT amendment.
Implementation stages use write → review structure:
  Stage N: Implementation — 1 agent per domain
    Agent writes code directly to original files.
  Stage N+1: Review — 1 agent per domain
    Reviews the implementation for bugs, quality, correctness.
  Stage N+2: Verification — severity-routed (extraction → adversarial [CRITICAL/HIGH 1:1, MEDIUM 1 per 5] → synthesis)
Fix agents (docs, configs, scripts): use default model agents for code. Split fixes by domain — one agent per domain. Every fix stage MUST be followed by a post-fix review:
  Stage N: Fixes — N agents split by domain
  Stage N+1: Post-fix review — N agents (1 per domain)
  Stage N+2: Verification — severity-routed (only if fix review found MEDIUM+ findings)
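The severity routing named in the verification stages (CRITICAL/HIGH 1:1, MEDIUM 1 per 5) fixes how many adversarial agents a stage spawns. This is a minimal sketch of that arithmetic; the function name is hypothetical, and the extraction and synthesis agents are not counted here.

```python
import math


def adversarial_agent_count(critical: int, high: int, medium: int) -> int:
    """Adversarial agents for one VERIFY stage.

    CRITICAL and HIGH findings get one agent each (1:1); MEDIUM findings
    share one agent per 5 findings; LOW findings are only noted.
    """
    return critical + high + math.ceil(medium / 5)
```

For example, 1 CRITICAL, 2 HIGH, and 7 MEDIUM findings give 1 + 2 + 2 = 5 adversarial agents.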
Delegation mapping (MANDATORY in every plan): During planning you MUST answer:
  1. What subtasks exist? (list each one)
  2. Which agent handles each subtask? (map agent name to subtask; consult `<skill-folder>/agents/INDEX.md`)
  3. Where is verification in this plan? Confirm verification runs after every DISCOVER and REVIEW stage that produces findings, or mark it explicitly as SKIPPED with justification.
Answer these explicitly in your plan. Every subtask must have an assigned agent — no subtask goes to the lead.
Stage decomposition rule (MANDATORY): If stage N+1 does NOT consume stage N's verified output (they're independent), MERGE them into a single stage with parallel agents. Sequential stages are only correct when the next stage actually needs the previous stage's verified findings as `PRIOR CONTEXT:`.
Write full plan to `tmp/glm-plan.md`. The `-m` flag on `spawn-glm.sh` is available to override the default model when a specific model is needed. Quick-fix agents (see Lead Role) are always single-model but run outside the plan's stage structure: they handle agent output issues within an existing workflow, never as a standalone workflow replacement. Checkpoint.
Dependency analysis (MANDATORY — lead's responsibility, before spawning): Before spawning any stage, the lead builds a dependency graph of agents within that stage:
  1. For each agent, list files it will READ and files it will WRITE/CREATE
  2. If Agent B reads or tests a file that Agent A writes → B depends on A → they CANNOT run in parallel
  3. Split into batches: independent agents run together, dependent agents run sequentially
  4. Document in `tmp/glm-plan.md` per stage:
  Stage N agents:
    Batch 1 (parallel): agent-a (writes X.swift), agent-b (writes Y.swift)
    Batch 2 (after batch 1): agent-c (tests X.swift, depends on agent-a)
Common dependency patterns to watch: test-writer depends on implementer, fix-agent depends on reviewer, integration-tester depends on all implementers, plan organizer depends on the planner's output. When in doubt, sequence — wasted time from a retry loop exceeds the cost of sequential execution.
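The dependency analysis steps above can be sketched as a batching function. This is a minimal sketch, assuming each agent's read and write file sets are already listed; the `plan_batches` name and the input shape are hypothetical, and the sketch does not cover two agents writing the same file.

```python
def plan_batches(agents: dict[str, dict[str, set[str]]]) -> list[list[str]]:
    """Group agents into sequential batches of parallel-safe agents.

    agents maps an agent name to {"reads": {...}, "writes": {...}} file sets
    (hypothetical input shape). Agent B depends on agent A when B reads a
    file that A writes.
    """
    deps = {
        name: {
            other for other, o in agents.items()
            if other != name and info["reads"] & o["writes"]
        }
        for name, info in agents.items()
    }
    batches, done = [], set()
    while len(done) < len(agents):
        # An agent is ready once every agent it depends on has finished.
        ready = sorted(n for n in agents if n not in done and deps[n] <= done)
        if not ready:
            raise ValueError("circular dependency between agents")
        batches.append(ready)
        done.update(ready)
    return batches
```

Applied to the example above, `agent-a` and `agent-b` land in the first batch and `agent-c`, which reads `X.swift`, lands in the second.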
Session start: Clean ALL stale workflow artifacts. Use two steps: explicit files first (shell-safe), then wildcard patterns via `find` (avoids zsh glob errors when no files match a pattern):
  1. rm -f tmp/glm-plan.md
  2. find tmp/ -maxdepth 1 \( -name 'stage-*-synthesis.md' -o -name 'stage-*-iter-*-synthesis.md' -o -name 's[0-9]*-task.txt' -o -name 's[0-9]*-prompt.txt' -o -name 's[0-9]*-status.txt' -o -name 's[0-9]*-report.md' -o -name 'plan-review-*' \) -delete
  3. Verify: `ls tmp/` to confirm no stale workflow artifacts remain. If any survived, remove them manually before proceeding.
Also clear stale session checkpoints:
echo "# Session Memory" > session.md
CAUTION: Never use broad patterns like `tmp/*-report.md` or `tmp/*-log.txt`: they will delete non-workflow files (e.g. `log-analysis-report.md`). Never delete `tmp/loop-runs/`: this directory contains permanent loop run logs and must be preserved across sessions. Agent names follow the `s{digit}...` prefix (e.g. `s1-researcher`, `s2i1-reviewer-r2`), so `tmp/s[0-9]*` safely matches only workflow artifacts.
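To see why the narrow patterns spare non-workflow files, the `-name` patterns from the cleanup command can be tried against sample file names. This is a minimal sketch using Python's `fnmatch`, which only approximates how `find -name` matches; the `is_workflow_artifact` helper is hypothetical, and the separately removed `tmp/glm-plan.md` is not covered.

```python
from fnmatch import fnmatchcase

# The -name patterns from the cleanup find command.
CLEANUP_PATTERNS = [
    "stage-*-synthesis.md",
    "stage-*-iter-*-synthesis.md",
    "s[0-9]*-task.txt",
    "s[0-9]*-prompt.txt",
    "s[0-9]*-status.txt",
    "s[0-9]*-report.md",
    "plan-review-*",
]


def is_workflow_artifact(filename: str) -> bool:
    """True if any cleanup pattern matches this file name (case-sensitive)."""
    return any(fnmatchcase(filename, p) for p in CLEANUP_PATTERNS)
```

A name such as `s1-researcher-report.md` matches because it starts with `s` followed by a digit, while `log-analysis-report.md` matches none of the patterns and survives the cleanup.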
Session boundaries: Each session is independent, so treat every task as a fresh start. Do not assume prior sessions' findings still hold. Every code change, even from previous sessions, requires fresh verification through the full workflow. Only reference prior sessions when the task explicitly asks you to. If a task will likely need >4 stages, plan explicit session splits using the continuation protocol. Long sessions degrade from compaction pressure.
VERIFY产生MEDIUM及以上严重程度的已确认发现结果后:如果清单不包含IMPLEMENT,主导者自动添加IMPLEMENT,随后是FIX(包括内部修复后REVIEW + 条件VERIFY)。这是无条件的——所有已确认的MEDIUM+发现结果无论任务意图如何都必须修复。LOW发现结果报告但不自动修复。
自动添加IMPLEMENT或从合成网格规划实现阶段时,按文件计数已确认的MEDIUM+发现结果。应用编辑密度拆分(领域拆分步骤3):如果任何单个文件超过8个发现结果或任何领域超过12个总发现结果,将该领域的实现拆分为2个Agent。
FIX阶段的修复后VERIFY在合成网格中产生CONFIRMED MEDIUM+发现结果后:自动添加另一个FIX过程(修复Agent → 修复后审查 → 条件验证)。重复此过程,直到修复后审查产生零个MEDIUM+发现结果并跳过VERIFY。这是机械操作——FIX组件是收敛循环,幸存的MEDIUM+发现结果意味着修复不完整。清单中已存在IMPLEMENT不阻止此操作——FIX收敛重新进入独立于IMPLEMENT修正。
实现阶段使用编写 → 审查结构:
  阶段N: 实现 —— 每个领域1个Agent
    Agent直接将代码写入原始文件。
  阶段N+1: 审查 —— 每个领域1个Agent
    审查实现的漏洞、质量、正确性。
  阶段N+2: 验证 —— 按严重程度路由(提取 → 对抗性验证 [CRITICAL/HIGH 1:1,MEDIUM每5个1个] → 合成)
修复Agent(文档、配置、脚本):使用默认模型Agent处理代码。按领域拆分修复——每个领域一个Agent。每个FIX阶段必须跟随修复后审查:
  阶段N: 修复 —— 按领域拆分的N个Agent
  阶段N+1: 修复后审查 —— N个Agent(每个领域1个)
  阶段N+2: 验证 —— 按严重程度路由(仅当修复审查发现MEDIUM+发现结果时)
委托映射(计划中强制执行): 规划期间必须回答:
  1. 存在哪些子任务?(列出每个子任务)
  2. 哪个Agent处理每个子任务?(将Agent名称映射到子任务,参考 `<skill-folder>/agents/INDEX.md`)
  3. 此计划中的验证在哪里?确认验证在每个产生发现结果的DISCOVER和REVIEW阶段后运行,或明确标记为SKIPPED并说明理由。
在计划中明确回答这些问题。每个子任务必须分配Agent——没有子任务由主导者处理。
阶段分解规则(强制执行): 如果阶段N+1不消耗阶段N的已验证输出(它们是独立的),则合并为单个并行Agent阶段。顺序阶段仅当下一阶段实际需要前一阶段的已验证发现结果作为 `PRIOR CONTEXT:` 时才正确。
将完整计划写入 `tmp/glm-plan.md`。`spawn-glm.sh` 的 `-m` 标志可在需要特定模型时覆盖默认模型。快速修复Agent(请参阅主导者角色)始终使用单个模型,但在计划的阶段结构之外运行:它们处理现有工作流中的Agent输出问题,永远不能作为独立工作流替代品。创建检查点。
依赖分析(强制执行——主导者的责任,生成Agent前): 在生成任何阶段的Agent之前,主导者构建该阶段内Agent的依赖图:
  1. 对于每个Agent,列出它将读取的文件和将写入/创建的文件
  2. 如果Agent B读取或测试Agent A写入的文件 → B依赖于A → 它们不能并行运行
  3. 拆分为批次:独立Agent一起运行,依赖Agent顺序运行
  4. 在 `tmp/glm-plan.md` 中按阶段记录:
  阶段N的Agent:
    批次1(并行): agent-a(写入X.swift), agent-b(写入Y.swift)
    批次2(批次1之后): agent-c(测试X.swift,依赖于agent-a)
常见依赖模式:测试编写者依赖于实现者,修复Agent依赖于审查者,集成测试者依赖于所有实现者,计划组织者依赖于规划者的输出。如有疑问,顺序执行——重试循环浪费的时间超过顺序执行的成本。
会话开始: 清理所有过期工作流工件。分两步:先清理明确的文件(shell安全),然后通过 `find` 使用通配符模式(避免zsh glob在无匹配文件时出错):
  1. rm -f tmp/glm-plan.md
  2. find tmp/ -maxdepth 1 \( -name 'stage-*-synthesis.md' -o -name 'stage-*-iter-*-synthesis.md' -o -name 's[0-9]*-task.txt' -o -name 's[0-9]*-prompt.txt' -o -name 's[0-9]*-status.txt' -o -name 's[0-9]*-report.md' -o -name 'plan-review-*' \) -delete
  3. 验证: `ls tmp/`,确认无过期工作流工件剩余。如果有任何幸存,手动删除后再继续。
还清除过期会话检查点:
echo "# Session Memory" > session.md
注意:切勿使用 `tmp/*-report.md` 或 `tmp/*-log.txt` 等宽泛模式,它们会删除非工作流文件(例如 `log-analysis-report.md`)。切勿删除 `tmp/loop-runs/`:此目录包含永久循环运行日志,必须跨会话保留。Agent名称遵循 `s{digit}...` 前缀(例如 `s1-researcher`、`s2i1-reviewer-r2`),因此 `tmp/s[0-9]*` 只会安全地匹配工作流工件。
会话边界: 每个会话独立——将每个任务视为全新开始。不要假设先前会话的发现结果仍然有效。每次代码更改,即使来自先前会话,都必须通过完整工作流进行新的验证。仅当任务明确要求时才参考先前会话。如果任务可能需要>4个阶段,计划使用续接协议进行明确的会话拆分。长会话会因压缩压力而降级。

Agent Preparation

Agent准备

Consult `<skill-folder>/agents/INDEX.md` for the full agent directory (110+ agents grouped by domain). Pick the MOST specialized agent (see Agent Selection above): a PostgreSQL task should use postgres-pro, not database-optimizer. The agent's domain checklists and anti-patterns are the primary value; they only work when the agent matches the domain.
For each agent in the current stage:
  1. Define task with KEY FILES, CONTEXT, SCOPE, `WRITABLE FILES` (code agents only; list source files agent may edit), and `MUST ANSWER:` questions (provided by the planner in the manifest; mandatory, prompts without these are invalid). The lead may add 1-2 supplementary workflow-level questions (e.g., "Was the linter run?") but does not write code-level technical questions.
  2. Write the TASK ASSIGNMENT block (PROJECT, ENVIRONMENT if code, PRIOR CONTEXT if stage 2+, YOUR TASK, WRITABLE FILES) to `tmp/{name}-task.txt`. NOTE: Do NOT include the report file path in WRITABLE FILES; the script auto-injects `tmp/{NAME}-report.md`.
  3. Assemble the full prompt: `<skill-folder>/tools/assemble-prompt.sh -a AGENT -t TYPE -n NAME --task tmp/{name}-task.txt`. Types: `review` (coordination-review + severity + quality-rules-review), `code` (coordination-code + quality-rules-code), `research` (coordination-review + quality-rules-review). The script reads the agent .md, selects templates, substitutes `{NAME}` in the task file content, and writes `tmp/{name}-prompt.txt`. Output: `ASSEMBLED|name|path|bytes`
  4. Validate prompt contains ALL: full agent .md, TASK ASSIGNMENT with MUST ANSWER questions, quality rules, severity guide (review only), environment (code only), coordination, report format. The script handles all boilerplate automatically — you only own the task file. Missing ANY = do not spawn
  5. Match agent type to task: REVIEW → code-reviewer, security-reviewer, backend-architect. CODE → language-pro, debugger. Git/history analysis (blame, log, diff, tracing fixes through commits) → `debugger` or `research-analyst`
  6. WRITABLE FILES: Code agents: task file MUST list the exact source files/directories the agent may modify. Review/audit/research agents: omit WRITABLE FILES entirely — the script auto-injects the correct report path and marks all source files as read-only.
    • Implementation agents: WRITABLE FILES must list the exact source files the agent may modify directly. The task must instruct them to produce their implementation and run any available lint/test commands to verify correctness. The task MUST also instruct them to write an Intent section in their report before coding: a description of their understanding of the task and their intended approach, in their own words, at whatever level of detail they think is useful for the reviewer. The agent decides what to communicate — architectural reasoning, assumptions about the codebase, trade-offs considered, alternatives rejected, or anything else that helps someone else understand why they built what they built. This is the first thing they write, before any code.
    • Review/audit/research agents: omit WRITABLE FILES entirely; the script auto-injects the correct report path and marks all source files as read-only. Describe problems and desired behavior. Do NOT paste exact fix code unless precision is critical (regex, API signatures, security logic). Name agents with stage prefix: `s1-researcher`, `s2-impl-auth`.
参考 `<skill-folder>/agents/INDEX.md` 获取完整Agent目录(110+个Agent按领域分组)。选择最专业的Agent(请参阅上文的Agent选择):PostgreSQL任务应使用postgres-pro,而非database-optimizer。Agent的领域检查清单和反模式是主要价值,仅当Agent匹配领域时才生效。
对于当前阶段的每个Agent:
  1. 定义任务,包含KEY FILES、CONTEXT、SCOPE、`WRITABLE FILES`(仅代码Agent,列出Agent可编辑的源文件)和 `MUST ANSWER:` 问题(规划者在清单中提供;强制执行,无这些问题的提示无效)。主导者可以添加1-2个补充工作流级问题(例如“是否运行了linter?”),但不编写代码级技术问题。
  2. 将TASK ASSIGNMENT块(PROJECT、ENVIRONMENT(如果是代码任务)、PRIOR CONTEXT(如果是阶段2+)、YOUR TASK、WRITABLE FILES)写入 `tmp/{name}-task.txt`。注意:不要在WRITABLE FILES中包含报告文件路径,脚本会自动注入 `tmp/{NAME}-report.md`。
  3. 组装完整提示: `<skill-folder>/tools/assemble-prompt.sh -a AGENT -t TYPE -n NAME --task tmp/{name}-task.txt`。类型:`review`(coordination-review + severity + quality-rules-review)、`code`(coordination-code + quality-rules-code)、`research`(coordination-review + quality-rules-review)。脚本读取Agent的.md文件、选择模板、替换任务文件内容中的 `{NAME}`,并写入 `tmp/{name}-prompt.txt`。输出:`ASSEMBLED|name|path|bytes`
  4. 验证提示包含所有内容: 完整的Agent.md文件、带MUST ANSWER问题的TASK ASSIGNMENT、质量规则、严重程度指南(仅审查)、环境(仅代码)、协调、报告格式。脚本自动处理所有样板内容——你仅负责任务文件。缺少任何内容 = 不生成Agent
  5. 匹配Agent类型与任务:REVIEW → code-reviewer、security-reviewer、backend-architect。CODE → language-pro、debugger。Git/历史分析(blame、log、diff、通过提交追踪修复) → `debugger` 或 `research-analyst`
  6. WRITABLE FILES: 代码Agent:任务文件必须列出Agent可修改的确切源文件/目录。审查/审计/研究Agent:完全省略WRITABLE FILES——脚本会自动注入正确的报告路径并将所有源文件标记为只读。
    • 实现Agent: WRITABLE FILES必须列出Agent可直接修改的确切源文件。任务必须指示他们生成实现并运行任何可用的lint/测试命令以验证正确性。任务还必须指示他们在编码前在报告中写入Intent部分:用自己的话描述他们对任务的理解和预期方法,详细程度由他们决定,只要有助于他人理解他们构建内容的原因。这是他们编写的第一件事,在任何代码之前。
    • 审查/审计/研究Agent: 完全省略WRITABLE FILES,脚本会自动注入正确的报告路径并将所有源文件标记为只读。描述问题和期望行为;除非精度至关重要(正则表达式、API签名、安全逻辑),否则不要粘贴确切的修复代码。用阶段前缀命名Agent:`s1-researcher`、`s2-impl-auth`。
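The `ASSEMBLED|name|path|bytes` status line from step 3 is easy to check before spawning. This is a minimal sketch, assuming the line has exactly four pipe-separated fields and that the last one is an integer byte count; the `parse_assembled` helper and the sample values in the usage note are hypothetical.

```python
from typing import NamedTuple


class Assembled(NamedTuple):
    """Fields of one assemble-prompt.sh status line."""
    name: str
    path: str
    size_bytes: int


def parse_assembled(line: str) -> Assembled:
    """Parse an `ASSEMBLED|name|path|bytes` status line."""
    parts = line.strip().split("|")
    if len(parts) != 4 or parts[0] != "ASSEMBLED":
        raise ValueError(f"unexpected assemble-prompt.sh output: {line!r}")
    _, name, path, size = parts
    return Assembled(name, path, int(size))
```

For instance, a line such as `ASSEMBLED|s1-reviewer|tmp/s1-reviewer-prompt.txt|20480` would parse into the agent name, the prompt path, and the size in bytes, and anything else raises an error so a failed assembly is not spawned by accident.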

Agent Spawning

Agent生成

The `-m` flag is available to override the default model when a specific model is needed, but it is never required.
How it works for review/research/audit stages:
  1. A single agent gets the agent `.md` and the task assignment, and works independently
  2. When a stage has independent subtasks (different files, modules, concerns), spawn one agent per subtask in parallel — as many as the task naturally decomposes into, maximum 10 agents
  3. Each agent's report feeds into the verification pipeline (see Verification section)
  4. Naming convention: `sN-name`, e.g. `s1-reviewer`, `s2i1-researcher` (stage 2, iteration 1)
How it works for implementation stages:
  1. Write step: A single agent writes the implementation directly to the original files. The agent reads the full task, understands the requirements, and produces a complete implementation.
  2. Review step: A single review agent reviews the implementation — same task description, independent assessment.
  3. Fix and iterate: The review report is processed by the verification pipeline to produce a verified checklist. ALL verified findings are fixed via fix-agents split by domain. The lead does NOT fix findings directly, regardless of how few or how trivial. Every fix MUST be followed by a post-fix review agent. Every review MUST be followed by verification — review findings are not deliverable until they've been verified. The review → fix → re-review loop iterates until the review agent produces no new findings (empty report) — this convergence is the final gate.
Spawn:
```bash
# Single agent (uses default model)
<skill-folder>/tools/spawn-glm.sh -n s1-reviewer -f tmp/s1-reviewer-prompt.txt

# Override model
<skill-folder>/tools/spawn-glm.sh -n s1-reviewer -f tmp/s1-reviewer-prompt.txt -m zai/glm-5.1
```

**Prompt assembly:** Assemble ONE prompt per agent via `assemble-prompt.sh`:
```bash
<skill-folder>/tools/assemble-prompt.sh -a AGENT -t TYPE -n NAME --task tmp/task.txt
```
Types: `review` (coordination-review + severity + quality-rules-review), `code` (coordination-code + quality-rules-code), `research` (coordination-review + quality-rules-review).

Implementation spawn pattern:
```bash
# Write step
<skill-folder>/tools/spawn-glm.sh -n sN-impl -f tmp/sN-impl-prompt.txt

# Review step (spawn AFTER write completes)
<skill-folder>/tools/spawn-glm.sh -n sN-review -f tmp/sN-review-prompt.txt
```

**Naming convention overview:**
- Plan: `s0-planner`, `s0-organize`
- Discovery: `sN-discover-{domain}`, `sN-discover-2-{domain}` (second opinion),
  `sN-discover-{domainA}-{domainB}` (intersection, e.g., `s1-discover-crypto-services`)
- Implementation: `sN-impl-{domain}`, `sN-review-{domain}`, `sN-review-2-{domain}` (second opinion),
  `sN-review-{domainA}-{domainB}` (intersection, e.g., `s6-review-crypto-services`)
- Verification: `sN-extract`, `sN-adv-{domain}` (adversarial — 1:1 for CRITICAL/HIGH, 1 per 5 for MEDIUM), `sN-adv-cross` (cross-domain adversarial), `sN-synth`
- Fix: `sN-fix-{domain}`
- Test: `sN-test`
- Iterations: `s{N}i{K}-name` (e.g., `s2i1-researcher`, `s2i2-researcher`)
- Respawns: add `-r2`, `-r3` suffix when re-spawning a failed agent with corrected configuration (e.g., `s2i1-reviewer-r2` = stage 2 iteration 1 reviewer, respawn attempt 2). Maximum 3 respawn attempts per agent.
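The iteration and respawn parts of the naming convention can be composed mechanically. A minimal sketch, assuming a hypothetical helper `agent_name`; it also assumes that the first spawn counts as attempt 1, so that `-r3` is the last allowed suffix.

```bash
# Hypothetical helper: build an agent name from its parts.
#   $1 = stage number
#   $2 = iteration (0 = stage is not iterated)
#   $3 = role, e.g. "reviewer" or "review-crypto-services"
#   $4 = attempt (optional, 1 = first spawn; 2 and 3 add -r2 / -r3)
agent_name() {
  stage=$1; iter=$2; role=$3; attempt=${4:-1}
  if [ "$attempt" -gt 3 ]; then
    # Assumption: attempts are capped at 3, so -r3 is the last suffix.
    echo "error: maximum 3 attempts per agent" >&2
    return 1
  fi
  name="s${stage}"
  if [ "$iter" -gt 0 ]; then
    name="${name}i${iter}"
  fi
  name="${name}-${role}"
  if [ "$attempt" -gt 1 ]; then
    name="${name}-r${attempt}"
  fi
  printf '%s\n' "$name"
}

agent_name 2 1 reviewer 2   # stage 2, iteration 1, respawn attempt 2
```

Generating names this way keeps prompt files, status files, and reports aligned on one identifier per agent.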

Second Opinion Guidelines

For DISCOVERY and REVIEW stages at MEDIUM+ severity, spawn a second opinion agent using a different agent `.md` from the INDEX. The two agents review the same code but through different analytical frameworks, producing complementary findings (proven: 87% complementarity across 5 language domains across 3 languages; 4-agent audit confirmed each additional agent type finds structurally distinct issues). PLAN always has an agent-organizer review (mandatory, all tasks); see Planning phase step 3b. Agent selection is task-driven: the tables below show recommended defaults; the planner selects the best agents for the specific task based on codebase context.
No domain exception: The documentation-domain exceptions (skipping adversarial verification, accepting challenged downgrades directly) apply ONLY to the verification pipeline — how findings are routed and verified. They do NOT excuse documentation-domain DISCOVERY or REVIEW stages from the second-opinion requirement. MEDIUM+ severity → second opinion is unconditional across all domains. If a task is MEDIUM+ and includes documentation as a domain, the discovery and review stages for that domain MUST include a second opinion agent.
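The two hard rules in this section (a second opinion is mandatory at MEDIUM+ severity, and it must use a different agent `.md`) can be checked before spawning. A minimal sketch, assuming a hypothetical helper `check_second_opinion` that receives agent names as plain strings; it does not read the INDEX or any plan file.

```bash
# Hypothetical pre-spawn check for a DISCOVERY or REVIEW stage.
#   $1 = overall task severity (LOW | MEDIUM | HIGH | CRITICAL)
#   $2 = primary agent name, e.g. "code-reviewer"
#   $3 = second opinion agent name (optional below MEDIUM)
check_second_opinion() {
  severity=$1; primary=$2; second=${3:-}
  case "$severity" in
    MEDIUM|HIGH|CRITICAL)
      if [ -z "$second" ]; then
        echo "error: $severity severity requires a second opinion agent" >&2
        return 1
      fi
      ;;
  esac
  # Same-agent prohibition: the second opinion must be a different agent .md.
  if [ -n "$second" ] && [ "$second" = "$primary" ]; then
    echo "error: second opinion must differ from primary ($primary)" >&2
    return 1
  fi
  return 0
}

check_second_opinion HIGH code-reviewer security-reviewer && echo "pairing ok"
```

The check takes no domain argument on purpose: the rule applies to every domain, documentation included.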

DISCOVER pairings (defaults — planner may override)

For DISCOVER, the primary agent is typically the domain specialist who audits existing code. The second opinion is typically a code-reviewer providing a general quality lens. The planner may select different agents when the task warrants it — the table shows recommended defaults, not hard assignments.

| Context | Primary | Second Opinion |
|---|---|---|
| General code | domain specialist (`python-pro`, `swift-pro`, etc.) | `code-reviewer` |
| Auth/crypto | `security-reviewer` | `code-reviewer` |
| Infrastructure/config | `devops-engineer` | `code-reviewer` |
| Trivial / single-domain-small | skip | none (only when overall task severity < MEDIUM; the MEDIUM+ severity rule, "second opinion mandatory in all DISCOVER stages", overrides this row) |

REVIEW pairings (defaults — planner may override)

For REVIEW, the primary agent is typically a code-reviewer assessing implementation quality. The second opinion varies by context to provide a complementary lens. The planner may select different agents when the task warrants it — the table shows recommended defaults, not hard assignments.

| Context | Primary | Second Opinion |
|---|---|---|
| General code | `code-reviewer` | language specialist (`python-pro`, `swift-pro`, etc.) |
| Auth/crypto | `code-reviewer` | `security-reviewer` |
| Infrastructure/config | `code-reviewer` | `devops-engineer` |
| System design / architecture | `code-reviewer` | `backend-architect` |
| Multi-language | `code-reviewer` | `backend-architect` (prefer splitting into per-language reviews with individual second opinions) |
| Trivial / single-domain-small | skip | none (only when overall task severity < MEDIUM; the MEDIUM+ severity rule, "second opinion mandatory in all REVIEW stages", overrides this row) |

Same-agent prohibition: The second opinion agent MUST use a different `.md` file from the primary. Using the same agent `.md` twice, even with "different task scoping", does not create a different analytical framework. Same checklists, same anti-patterns, same blind spots. The 87% complementarity effect depends on genuinely different agent expertise. If no different specialist can be found for a second opinion, split the review into smaller per-domain reviews where each can get a truly different second opinion.
Task-framing guideline: The task file for the second opinion agent uses the same KEY FILES as the primary but may add a domain-specific emphasis directive in the YOUR TASK section. Example: for `python-pro` reviewing OAuth code, add "Pay special attention to Python error handling patterns around I/O, binary data decoding, and data class validation." This costs zero tokens and amplifies the complementarity effect.
Both-found confidence signal: When a DISCOVERY or REVIEW stage used a second opinion, the subsequent extraction agent tags each finding as "both-found" (both agents reported independently) or "single-found" (only one agent reported). Both-found findings carry higher confidence — surface this in the synthesis grid.
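The both-found / single-found tagging amounts to a set comparison between the two agents' findings. A minimal sketch, assuming each agent's findings have already been reduced to one key per line (for example `src/auth.py:42 missing-check`); the key format, the file layout, and the helper name `tag_findings` are hypothetical, and in the workflow itself the extraction agent does this work.

```bash
# Hypothetical helper: tag each finding key as both-found or single-found.
#   $1 = file with the primary agent's finding keys, one per line
#   $2 = file with the second opinion agent's finding keys, one per line
tag_findings() {
  awk '
    $0 == "" { next }                                 # ignore blank lines
    FILENAME == ARGV[1] { primary[$0] = 1; next }     # keys from the primary
    { second[$0] = 1 }                                # keys from the second opinion
    END {
      for (k in primary) {
        tag = (k in second) ? "both-found" : "single-found"
        print tag "\t" k
      }
      for (k in second) {
        if (!(k in primary)) print "single-found\t" k
      }
    }
  ' "$1" "$2" | sort
}
```

Exact string matching is the simplest possible rule; two agents rarely phrase an issue identically, so real deduplication needs the "same file:line + same issue" judgment described in the Verification section.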

Execution

  1. Spawn the current batch of agents via `spawn-glm.sh`, respecting the per-batch limit from Tools and the dependency analysis above. If stdout is empty (Windows `.cmd` issue), read `tmp/{NAME}-status.txt` to get the PID. Checkpoint with PIDs and names. If the stage has multiple batches, wait for the current batch to finish before spawning the next
  2. `wait-glm.sh name1:$PID1 name2:$PID2 ...`: first progress at 30s, then every 60s, STALLED warnings, health check on finish
  3. Do verification prep (for VERIFY stages): read the extraction agent's output, create verification task files per batch, assemble prompts. Batch cross-check (MANDATORY): Before spawning, verify that every batch the extraction report prescribes has a corresponding task file, and each task file targets the exact finding IDs from the extraction's batch assignment table (e.g., ADV-1 → B1-B4). A task file for different findings than prescribed does not satisfy the batch assignment. The extraction report is authoritative — the lead does NOT substitute finding targets.
  4. Review output. Check operational status only — was the report produced? Is the log non-empty? Any STALLED markers? This is NOT quality review (do NOT evaluate findings, accuracy, or correctness). If ANY agent shows STALLED / EMPTY LOG / MISSING REPORT / EMPTY REPORT:
    • Diagnose root cause. Fix the issue (environment, prompt, task file, dependencies).
    • Re-spawn the agent with corrected configuration.
    • Do NOT proceed to the next stage with incomplete stage output.
    • Accept a gap and proceed ONLY for trivial gaps in discovery stages (e.g. a single agent in a 10-agent discovery stage failed after 3 respawn attempts with different approaches, AND its domain is partially covered by other agents). Every such decision must be explicitly justified in `tmp/glm-plan.md` with `STAGE GAP ACCEPTED: [domain] [reason] [coverage from other agents]`. Do NOT accept gaps in implementation or fix stages; those stages must produce complete, correct output. Do NOT silently skip failed agents.
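The operational check in step 4 looks only at whether output exists, never at its quality. A minimal sketch, assuming a hypothetical helper `agent_status`; the report and log paths are passed in because their real locations are not specified here, and the sketch assumes a stalled agent leaves the literal marker `STALLED` in its log.

```bash
# Hypothetical helper: classify one agent's output as an operational status.
#   $1 = path to the agent's report (assumed layout)
#   $2 = path to the agent's log (assumed layout)
agent_status() {
  report=$1; log=$2
  if [ ! -s "$log" ]; then
    echo "EMPTY LOG"            # log missing or zero bytes
  elif grep -q "STALLED" "$log"; then
    echo "STALLED"              # assumption: the marker appears in the log
  elif [ ! -e "$report" ]; then
    echo "MISSING REPORT"
  elif [ ! -s "$report" ]; then
    echo "EMPTY REPORT"
  else
    echo "OK"                   # operational only, says nothing about quality
  fi
}
```

Any status other than `OK` leads to diagnose, fix, and re-spawn; the helper deliberately never opens the report to judge findings.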

Verification

Verification uses the severity-routed verification pipeline. The lead does NOT manually verify findings — that's the agents' job. The pipeline runs in batches with sequential dependencies:
Batch 0: Extraction agent (single, default model; use `research-analyst` agent `.md`). Reads all reports from the stage, extracts every finding with file:line and severity, deduplicates (same file:line + same issue → merge, note both sources), classifies each finding by severity, and splits into batches grouped by domain. When the originating stage (DISCOVERY or REVIEW) used a second opinion agent, tag each finding as "both-found" (both agents reported independently) or "single-found" (one agent only). When intersection agents were present, also tag findings as "boundary-found" (reported by an intersection agent auditing a domain boundary, inherently invisible to within-domain specialists) or "domain-only" (reported only by domain primaries/second opinions). Both-found and boundary-found carry elevated confidence for different reasons: both-found signals cross-agent agreement within a domain; boundary-found signals issues spanning domains that no within-domain specialist could have detected. A finding that is both "both-found" AND "boundary-found" carries the highest confidence. Surface all tags in synthesis. Findings from documentation specialist agents (documentation-pro) are domain-verified: route them directly to synthesis at the agent's rated severity, skipping adversarial verification.
Mechanical trigger — MANDATORY: If extraction finds any finding at MEDIUM severity or above, the lead MUST spawn ALL verification batches the extraction report prescribes — every adversarial batch, at the exact finding IDs listed in the extraction's batch assignment table. Spawning an adversarial agent against different findings than prescribed does NOT satisfy this trigger. The synthesis agent runs after all routing agents complete — even if every routed finding was REJECTED or WEAKENED. The lead does NOT evaluate routing agent outputs to decide whether synthesis is needed. The synthesis grid — not the lead's judgment — determines which findings are fixed. Skipping verification for MEDIUM+ findings is a protocol violation.
Batch 1: Findings routed by severity. All findings extracted by Batch 0 are routed:
  • CRITICAL/HIGH findings → Adversarial agent (single agent per finding (1:1), default model). Tries to FALSIFY every finding: reads cited code with full surrounding context, exhaustively searches for counter-evidence (guards, validation, framework protections, type system invariants, test coverage), labels each CONFIRMED / REJECTED / WEAKENED with evidence. Adversarial methodology: assume the claimed issue is a misunderstanding and search exhaustively before confirming. Every CONFIRMED label must be hard-won with grep evidence.
  • CRITICAL/HIGH findings from intersection or cross-domain integration review (any finding spanning domain boundaries, from DISCOVER or REVIEW) → Adversarial cross-domain agent (single agent per finding (1:1), default model). Same exhaustive falsification but verifies from BOTH sides of the integration boundary (Domain A producer + Domain B consumer + bridge between them). Finding only survives if no counter-evidence on either side or in the bridge.
  • MEDIUM findings → Adversarial agent (single agent per batch of 5 findings, default model). Same exhaustive falsification methodology as CRITICAL/HIGH — reads cited code with full surrounding context, exhaustively searches for counter-evidence (guards, validation, framework protections, type system invariants, test coverage), labels each CONFIRMED / REJECTED / WEAKENED with evidence. Adversarial methodology: assume the claimed issue is a misunderstanding and search exhaustively before confirming. Every CONFIRMED label must be hard-won with grep evidence.
  • LOW findings → NOTED. Recorded in the report. No further agent spend.
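The severity routing above is a fixed mapping, and the number of adversarial agents follows from it. A minimal sketch with hypothetical helper names and route labels; the count ignores per-domain batch grouping and cross-domain agents, so treat it as a lower-bound estimate rather than the extraction report's batch table.

```bash
# Hypothetical helper: map a severity to its verification route.
route_finding() {
  case "$1" in
    CRITICAL|HIGH) echo "adversarial-1:1" ;;         # one agent per finding
    MEDIUM)        echo "adversarial-batch-of-5" ;;  # one agent per 5 findings
    LOW)           echo "NOTED" ;;                   # recorded, no agent spend
    *)             echo "error: unknown severity '$1'" >&2; return 1 ;;
  esac
}

# Hypothetical helper: estimate how many adversarial agents to spawn.
#   $1 = number of CRITICAL/HIGH findings, $2 = number of MEDIUM findings
adversarial_agent_count() {
  crit_high=$1; medium=$2
  # Integer ceiling of medium / 5, plus one agent per CRITICAL/HIGH finding.
  echo $(( crit_high + (medium + 4) / 5 ))
}

adversarial_agent_count 3 7   # 3 one-to-one agents + 2 MEDIUM batches
```

The extraction report's batch assignment table remains authoritative; an estimate like this is only useful as a sanity check on the number of task files.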
Batch 2: Synthesis agent (single, default model; use `research-analyst` agent `.md`). Reads all verdicts. Builds a cross-reference grid per finding using unified vocabulary:

| CONFIRMED | REJECTED | WEAKENED |
|---|---|---|
| → fix list | → dropped | severity downgraded → fix list at lower priority |

Surfaces "both-found" confidence signals from extraction — findings reported by both primary and second opinion agents carry higher initial confidence.
Also sanity-checks severity assignments against the severity classification criteria — if a finding's severity appears mismatched (e.g., "SQL injection" labeled MEDIUM), flag it as CHALLENGED. Challenged findings are re-routed through adversarial verification. Exception: documentation-domain challenged findings skip adversarial — documentation severity is inherently subjective (is "10 missing API docs" HIGH or MEDIUM?) and adversarial review of severity ratings adds no meaningful verification. Documentation-domain challenged findings stay at their challenged severity; the lead accepts the downgrade directly.
If the synthesis grid shows zero CONFIRMED findings at MEDIUM or above (all MEDIUM+ findings were REJECTED, all were DROPPED, or only LOW-severity survivors remain), FIX is SKIPPED because there is nothing significant to fix. LOW verified findings are acknowledged in the synthesis as non-blocking. The lead writes the synthesis with `FIX SKIPPED: Zero MEDIUM+ verified findings — nothing to fix.` This is mechanical: no lead judgment.
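Because the skip rule is mechanical, it can be expressed as a count over the synthesis grid. A minimal sketch, assuming the grid has been flattened to whitespace-separated lines of `ID SEVERITY VERDICT` on stdin; that line format, the helper name `fix_decision`, and the `FIX REQUIRED` label are hypothetical. Following the rule's literal wording, only CONFIRMED rows are counted and WEAKENED rows are ignored.

```bash
# Hypothetical helper: apply the FIX skip rule to flattened grid rows.
# Input on stdin, one finding per line: "<id> <severity> <verdict>"
fix_decision() {
  awk '
    $3 == "CONFIRMED" && ($2 == "MEDIUM" || $2 == "HIGH" || $2 == "CRITICAL") { n++ }
    END {
      if (n > 0) print "FIX REQUIRED " n   # n confirmed MEDIUM+ findings
      else       print "FIX SKIPPED"       # nothing significant to fix
    }
  '
}

printf 'B1 HIGH REJECTED\nB2 LOW CONFIRMED\n' | fix_decision
```

The lead acts on the printed result without adding judgment, which is the point of keeping the rule this small.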
Verification is MANDATORY after every discovery, review (including cross-domain integration review), and post-fix review stage that produces code-referencing findings with file:line references. Exception: stages producing findings without code-level references (web research, pure analysis, documentation reviews) — lead may mark verification as SKIPPED with explicit justification.
Verification completion checklist — MANDATORY before marking a stage as done:
  1. Extraction agent spawned and report produced
  2. If extraction found 0 findings → stage complete (early-exit)
  3. If extraction found MEDIUM+ findings:
    a. ALL adversarial batches from extraction's batch assignment table spawned; cross-check each ADV task file's finding IDs against the prescribed batch:finding mapping
    b. Synthesis agent spawned: compiles grid, sanity-checks severity
    c. Synthesis grid determines FIX=SKIPPED or FIX follows
Skipping any step when MEDIUM+ findings exist is a protocol violation.
Verification naming convention:
  • Extraction: `sN-extract`
  • Adversarial pairs: `sN-adv-{domain}` (single agent per finding for CRITICAL/HIGH, 1:1; single agent per batch of 5 for MEDIUM)
  • Adversarial cross: `sN-adv-cross` (single agent per finding, 1:1)
  • Synthesis: `sN-synth`

Between Stages

  1. Write `tmp/stage-N-synthesis.md`: verified results from the synthesis grid, decisions, context for next stage
  2. Mid-execution amendment (new findings): If VERIFY produces confirmed findings at MEDIUM severity or above and IMPLEMENT is NOT in the manifest, the lead auto-adds IMPLEMENT followed by FIX (always 2-3 sequential stages: fix + post-fix review + conditional VERIFY). This is unconditional — all confirmed MEDIUM+ findings are fixed regardless of task intent. LOW findings are reported but not auto-fixed. This is mechanical — verify the condition, add the stages. FIX convergence (incomplete fixes): After a FIX stage's post-fix VERIFY produces CONFIRMED MEDIUM+ findings in the synthesis grid, auto-add another FIX pass regardless of whether IMPLEMENT is already in the manifest. IMPLEMENT presence does not block FIX convergence — surviving MEDIUM+ findings mean the fix was incomplete. Repeat until post-fix review produces zero MEDIUM+ findings and VERIFY is skipped.
  3. If scope changed from original plan, update `tmp/glm-plan.md` with actual stages and revised goals
  4. Checkpoint. Clean up: `rm -f tmp/sN-*-prompt.txt tmp/sN-*-task.txt`
  5. Next stage prompts include synthesis as a `PRIOR CONTEXT:` section. PRIOR CONTEXT should contain only factual project context the next stage needs: what was discovered, what was decided, what constraints exist, what was already fixed. Do NOT include verification process details, rejected findings, or behavioral instructions; these compete with the agent `.md`. Target under 50 lines
  6. Never re-do verified work unless evidence shows it was wrong
  7. Never skip a planned stage without explicitly marking it in `tmp/glm-plan.md` as `SKIPPED` with a reason. A stage is only complete when its agents have been spawned and waited on, their reports processed by the verification pipeline, and findings verified; incomplete stages cannot be proceeded past, outside the narrow gap-acceptance rules in Execution step 4. PLAN stages cannot be SKIPPED for speed or token savings, only for genuine blockers (environment failure, missing files, corrupted state).
  8. After writing synthesis, read `tmp/glm-plan.md` to confirm the next stage. If the plan has remaining stages, execute them; do not deliver early unless remaining stages are explicitly marked SKIPPED.
Iterative stages: Between iterations, follow the Iterative Convergence protocol below — skip steps 1-5 until convergence is reached. On convergence, write final stage synthesis (step 1) and resume normal between-stages flow (steps 2-5).
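Steps 4 and 5 of the between-stages list lend themselves to small checks. A minimal sketch, assuming hypothetical helpers `cleanup_stage` and `prior_context_ok`, a concrete stage number in place of `N`, and a PRIOR CONTEXT draft saved to a file of your choosing.

```bash
# Hypothetical helper: remove one stage's prompt and task files.
#   $1 = stage number, substituted for N in the cleanup pattern
cleanup_stage() {
  rm -f tmp/s"$1"-*-prompt.txt tmp/s"$1"-*-task.txt
}

# Hypothetical helper: succeed only if the draft stays under 50 lines.
#   $1 = path to a file holding the PRIOR CONTEXT draft (assumed)
prior_context_ok() {
  lines=$(awk 'END { print NR }' "$1")
  if [ "$lines" -ge 50 ]; then
    echo "warning: PRIOR CONTEXT has $lines lines, target is under 50" >&2
    return 1
  fi
}
```

Note that the pattern `tmp/s1-*` does not match iteration-style names such as `s1i1-reviewer`; those would need their own pattern if they should be cleaned up as well.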

Iterative Convergence

Some stages benefit from repeated runs until agents stop producing new meaningful output. What counts as "new output" depends on the stage purpose — new problems (audit), new information (research), new improvements (analysis), new risks (security), etc.
Convergence is mechanical: when ALL agents in an iteration produce zero new findings (empty reports, no new issues found), the stage has converged. A single non-empty report means the iteration produced output — iterate again. The lead does not subjectively judge whether findings are "meaningful enough" — any finding is meaningful.
Planner-decided, not mandatory. The planner selects NONE / ONCE / LOOP per stage based on task characteristics:
  • NONE: One pass. For well-understood, narrow work. Also appropriate for codebases with comprehensive test coverage (>80%) and clean module boundaries — first pass is unlikely to miss meaningful issues.
  • ONCE: One extra iteration if first pass found anything ("found anything" means any iter 1 agent reported at least one finding — regardless of whether it survived adversarial verification; the point is different iter 2 specialists re-examine what iter 1 noticed). Use when the planner's Phase 1 research reveals interconnected modules, dense coupling, non-uniform code patterns, or >15K LOC per domain — characteristics suggesting a first pass may miss issues. Also used when severity is HIGH/CRITICAL regardless of codebase quality (missed findings are expensive). ONCE is NOT the universal default — well-tested, cleanly-structured codebases should use NONE.
  • LOOP: Up to 3 iterations, stop on empty report. For highly ambiguous or production-critical work where missed findings would be unacceptable.
Factors the planner considers: ambiguity, codebase complexity, finding volume from first pass, production impact of missed findings, change type (exploratory vs. mechanical), time sensitivity.
Not used for: Production stages (implementation and fixing) and verification stages. These produce or evaluate output rather than discovering issues.
Mandatory rules apply: CONVERGE iterations of DISCOVERY or REVIEW stages inherit ALL mandatory rules from the parent stage type — including second-opinion requirements at MEDIUM+ severity (see Second Opinion Guidelines). When the original DISCOVER/REVIEW required a second opinion agent, every CONVERGE iteration must also include a second opinion. The planner's decision table must list all agents to spawn per iteration — the lead spawns exactly what the plan lists.
Execution is mechanical — the lead does NOT re-evaluate the CONVERGE decision. If the plan says ONCE and verified findings exist, the lead spawns the iteration agents unconditionally. If the plan says NONE, the lead skips unconditionally. The planner's assessment of codebase characteristics (test coverage, coupling, module density, severity) was already baked into the plan during Phase 1 research. The lead does NOT substitute judgment based on what findings happened to be confirmed — whether findings appear "isolated" or "specific" is the planner's call at plan time, not the lead's call at execution time. The planner sees the full codebase structure during research; the lead only sees post-hoc finding counts.
Mechanics:
  1. Each iteration = full prepare → spawn → verify cycle
  2. After verification: check reports mechanically — any non-empty finding list in any agent report?
    • Yes (any finding produced) → write iteration synthesis to `tmp/stage-N-iter-K-synthesis.md`, prepare next iteration with cumulative context from all prior iterations
    • No (all reports empty, zero findings) → convergence reached; write final stage synthesis and move on
  3. Lead SHOULD vary approach between iterations — different agents, focus areas, or angles — to avoid blind spots. Running identical agents repeatedly is wasteful.
  4. Lead can adjust agent count and type between iterations based on what prior iterations revealed
  5. If iteration cap hit without convergence → synthesize what's known, note "convergence not reached" in delivery, proceed
  6. Naming: iteration agents follow `s{N}i{K}-name`, e.g. `s2i1-reviewer`, `s2i2-researcher` (stage 2, iteration 1/2). Respawn within iteration: `s2i1-reviewer-r2`.
VERIFY between iterations (MANDATORY): The plan must include a VERIFY stage between every pair of CONVERGE iterations. The structure is:
  • Stage N: DISCOVER iter 1
  • Stage N+1: VERIFY iter 1 (extraction → adversarial → synthesis)
  • Stage N+2: DISCOVER iter 2 (conditional on N+1 synthesis, PRIOR CONTEXT from N+1)
  • Stage N+3: VERIFY iter 2
Iter 1's VERIFY produces the synthesis grid that (a) determines whether iter 2 spawns (any finding = spawn) and (b) provides PRIOR CONTEXT for iter 2 agents. Merging both iterations' verification into one stage after both complete is a protocol violation: there is no way to know whether iter 2 should spawn, and no PRIOR CONTEXT for iter 2, without iter 1's synthesis first.
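The NONE / ONCE / LOOP decision reduces to a small predicate the lead can apply after each iteration's VERIFY. A minimal sketch, assuming a hypothetical helper `should_iterate`; it also assumes that LOOP's "up to 3 iterations" means three passes in total, and that the finding count is taken before adversarial verification, per the "any finding = spawn" rule.

```bash
# Hypothetical helper: decide whether to spawn another iteration.
#   $1 = plan mode (NONE | ONCE | LOOP)
#   $2 = number of the iteration that just finished (1, 2, ...)
#   $3 = findings reported by that iteration, before verification
# Returns 0 to iterate again, 1 to stop, 2 on an unknown mode.
should_iterate() {
  mode=$1; iter=$2; findings=$3
  case "$mode" in
    NONE) max_iter=1 ;;    # single pass, never iterate
    ONCE) max_iter=2 ;;    # at most one extra iteration
    LOOP) max_iter=3 ;;    # assumption: 3 passes in total
    *)    echo "error: unknown mode '$mode'" >&2; return 2 ;;
  esac
  [ "$findings" -gt 0 ] || return 1   # all reports empty: converged
  [ "$iter" -lt "$max_iter" ]         # stop once the cap is reached
}

should_iterate ONCE 1 3 && echo "spawn iteration 2"
```

The predicate looks only at the plan mode and the raw finding count, which mirrors the rule that the lead does not re-evaluate the planner's CONVERGE decision.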
某些阶段重复运行直到Agent停止产生新的有意义输出会受益。“新输出”的定义取决于阶段目的——新问题(审计)、新信息(研究)、新改进(分析)、新风险(安全)等。
收敛是机械的:当迭代中的所有Agent都产生零个新发现结果(空报告,未发现新问题)时,阶段收敛。单个非空报告意味着迭代产生了输出——再次迭代。主导者不得主观判断发现结果是否“足够有意义”——任何发现结果都是有意义的。
由规划者决定,非强制执行。 规划者根据任务特征为每个阶段选择NONE / ONCE / LOOP:
  • NONE:一次检查。用于理解充分、范围狭窄的工作。也适用于测试覆盖率全面(>80%)且模块边界清晰的代码库——首次检查不太可能遗漏有意义的问题。
  • ONCE:如果首次检查发现任何问题,则进行一次额外迭代(“发现任何问题”指任何迭代1的Agent报告了至少一个发现结果——无论是否通过对抗性验证;重点是不同的迭代2专家重新检查迭代1注意到的内容)。当规划者第一阶段研究发现互连模块、紧密耦合、非统一代码模式或每个领域>15K LOC时使用——这些特征表明首次检查可能遗漏问题。也用于严重程度为HIGH/CRITICAL的情况,无论代码库质量如何(遗漏发现结果代价高昂)。ONCE不是通用默认值——测试充分、结构清晰的代码库应使用NONE。
  • LOOP:最多3次迭代,当报告为空时停止。用于高度模糊或生产关键工作,遗漏发现结果不可接受。
规划者考虑的因素:模糊性、代码库复杂性、首次检查的发现结果数量、遗漏发现结果的生产影响、变更类型(探索性 vs. 机械性)、时间敏感性。
不适用于: 生产阶段(实现和修复)和验证阶段。这些阶段产生或评估输出,而非发现问题。
强制规则适用: DISCOVERY或REVIEW阶段的CONVERGE迭代继承父阶段类型的所有强制规则——包括MEDIUM及以上严重程度的第二意见要求(请参阅Second Opinion Guidelines)。当原始DISCOVER/REVIEW需要第二意见Agent时,每个CONVERGE迭代也必须包含第二意见。规划者的决策表必须列出每次迭代要生成的所有Agent——主导者生成计划列出的所有Agent。
执行是机械的——主导者不得重新评估CONVERGE决策。 如果计划说ONCE且存在已验证发现结果,主导者无条件生成迭代Agent。如果计划说NONE,主导者无条件跳过。规划者在第一阶段研究中已将代码库特征(测试覆盖率、耦合、模块密度、严重程度)纳入计划。主导者不得根据已确认的发现结果替代判断——发现结果是否“孤立”或“特定”是规划者在计划时的决定,而非主导者在执行时的决定。规划者在研究时看到完整的代码库结构;主导者仅看到事后的发现结果计数。
机制:
  1. 每次迭代 = 完整的准备 → 生成 → 验证周期
  2. 验证后:机械检查报告——任何Agent报告中有非空发现结果列表吗?
    • (产生任何发现结果) → 将迭代合成结果写入
      tmp/stage-N-iter-K-synthesis.md
      ,使用所有先前迭代的累积上下文准备下一迭代
    • (所有报告为空,零发现结果) → 达到收敛;编写最终阶段合成结果并继续
  3. 主导者应在迭代之间改变方法——不同的Agent、关注点或角度——以避免盲点。重复运行相同的Agent是浪费的。
  4. 主导者可以根据先前迭代揭示的内容调整迭代之间的Agent数量和类型
  5. 如果达到迭代上限仍未收敛 → 合成已知内容,在交付中注明“未达到收敛”,继续
  6. 命名: 迭代Agent遵循
    s{N}i{K}-name
    格式——例如
    s2i1-reviewer
    ,
    s2i2-researcher
    (阶段2,迭代1/2)。迭代内重新生成:
    s2i1-reviewer-r2
迭代间VERIFY(强制执行): 计划必须在每对CONVERGE迭代之间包含VERIFY阶段。结构为:
  • 阶段N: DISCOVER迭代1
  • 阶段N+1: VERIFY迭代1 (提取 → 对抗性验证 → 合成)
  • 阶段N+2: DISCOVER迭代2(取决于N+1的合成结果,来自N+1的PRIOR CONTEXT)
  • 阶段N+3: VERIFY迭代2
迭代1的VERIFY生成合成网格,(a) 决定是否生成迭代2(任何发现结果=生成),(b) 为迭代2 Agent提供PRIOR CONTEXT。将两次迭代的验证合并到两次迭代完成后的单个阶段是违反协议的行为:无法知道是否应生成迭代2,且没有迭代1的合成结果就没有迭代2的PRIOR CONTEXT。

Delivery

交付

Before delivery: Read
tmp/glm-plan.md
. Confirm every planned stage is complete or explicitly marked SKIPPED with justification. A silently skipped stage means the work is not yet deliverable: execute it or update the plan. If fix-agents changed any code during the fix stage, confirm that the post-fix review ran, and that verification also ran if that review produced new findings. Code changes without downstream verification are not deliverable. The user's task instructions (commit, push, report) are the final step after all stages complete; they do not override the mandatory stages that must run first.
Before delivery, mechanically verify all mid-execution decisions:
  • If any conditional VERIFY was skipped: read the stage's review reports. If any report contains a MEDIUM+ finding with a code reference, the VERIFY stage must be run now.
  • If any finding was marked as dropped or noted by the lead without routing through the verification pipeline: the finding must be routed through the verification pipeline now.
After final stage:
  • Reviews/audits: write report to
    tmp/
    with verified findings, rejected items, gaps
  • Code changes: spawn a single agent (default model) to run build + tests, fix all failures, and deliver production-ready result. Lead chooses the exact agent for the job (e.g. debugger, build-error-resolver, cpp-pro). This is the final production gate.
  • Research/analysis: synthesize into clear summary
  • Write
    tmp/session-summary.md
    : task goal, stages executed, total agents, agent aborts/failures, iterations per iterative stage, verification stats, key decisions, phase durations (planning, preparation, execution/wait, verification, synthesis)
  • Cleanup:
    rm -f tmp/s[0-9]*-prompt.txt tmp/s[0-9]*-task.txt
    . Keep logs, reports, summary
  • Save workflow lessons to knowledge if applicable
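To make the scope of the cleanup glob concrete, here is a small sketch that wraps the command in a function taking the directory as a parameter. The function name `cleanup_tmp` and the example file names in the note below are illustrative assumptions. Only the two glob patterns come from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the cleanup command shown above.
cleanup_tmp() {
  local dir="${1:-tmp}"
  # A file matches only if its name starts with "s" followed by a digit and
  # ends in -prompt.txt or -task.txt. Names such as stage-N-synthesis.md or
  # session-summary.md do not match and are kept.
  rm -f "$dir"/s[0-9]*-prompt.txt "$dir"/s[0-9]*-task.txt
}
```

Assuming prompt and task files are named after their agents (for instance `s2i1-reviewer-prompt.txt`), calling `cleanup_tmp tmp` removes them and leaves logs, reports, and the summary in place.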
交付前: 读取
tmp/glm-plan.md
。确认每个计划阶段都已完成或明确标记为SKIPPED并说明理由。静默跳过的阶段 = 尚未交付。执行该阶段或更新计划。如果修复阶段中代码被更改(由修复Agent),确认修复后审查和验证都已运行(仅当审查发现新结果时才运行验证)。未经过下游验证的代码更改不可交付。用户的任务指令(提交、推送、报告)是所有阶段完成后的最后一步——它们不覆盖必须先运行的强制阶段。
交付前,机械验证所有执行中决策:
  • 如果任何条件VERIFY被跳过:读取阶段的审查报告。 如果任何报告包含带代码引用的MEDIUM+发现结果,必须立即运行VERIFY阶段。
  • 如果任何发现结果被主导者标记为丢弃或记录,但未通过验证管道路由:必须立即通过验证管道路由该发现结果。
最终阶段后:
  • 审查/审计: 将报告写入
    tmp/
    ,包含已验证发现结果、被拒绝项、间隙
  • 代码更改: 生成单个Agent(默认模型)运行构建+测试,修复所有失败,并交付生产就绪结果。主导者选择适合该工作的精确Agent(例如debugger、build-error-resolver、cpp-pro)。这是最终生产门限。
  • 研究/分析: 合成为清晰的摘要
  • 编写
    tmp/session-summary.md
    :任务目标、执行的阶段、总Agent数、Agent中止/失败、迭代阶段的迭代次数、验证统计、关键决策、阶段持续时间(规划、准备、执行/等待、验证、合成)
  • 清理:
    rm -f tmp/s[0-9]*-prompt.txt tmp/s[0-9]*-task.txt
    。保留日志、报告、摘要
  • 如果适用,将工作流经验保存到knowledge

Agent Prompt Template

Agent提示模板

Prompts are assembled with cache-aware ordering: stable shared content first (cached across calls), volatile per-instance content last. The assembly order:
You are a single agent working solo. Do all the work yourself — do not spawn sub-agents, do not delegate to other agents, do not run agentic workflows. Agentic workflows are not allowed in this session.

Before claiming something is missing or broken — grep for existing guards, handlers, or implementations first.

{cat <skill-folder>/templates/coordination-review.txt OR coordination-code.txt — replace {NAME}}

{cat <skill-folder>/templates/severity-guide.txt — REVIEW/audit tasks only}

{cat <skill-folder>/templates/quality-rules-review.txt OR quality-rules-code.txt}

{Full <skill-folder>/agents/{agent}.md — see Rules → Prompts}

You are an AI agent named {NAME}.

--- TASK ASSIGNMENT ---

PROJECT: {working directory and project description}

ENVIRONMENT (code tasks only):
{Runtime, test command, lint command}

PRIOR CONTEXT (stage 2+ or iteration 2+):
{Contents of tmp/stage-N-synthesis.md OR cumulative tmp/stage-N-iter-*-synthesis.md for iterations}

YOUR TASK: {KEY FILES, CONTEXT, SCOPE, MUST ANSWER questions}

WRITABLE FILES: {code agents only — list source files agent may edit. Review/research/audit agents: omit this section}
| Task Type | Coordination | Severity Guide | Quality Rules |
| --- | --- | --- | --- |
| Review/audit | coordination-review.txt | severity-guide.txt | quality-rules-review.txt |
| Code/refactor | coordination-code.txt | (not used) | quality-rules-code.txt |
| Research | coordination-review.txt | (not used) | quality-rules-review.txt |
Boilerplate templates live in
<skill-folder>/templates/
. Lead only writes the unique parts (agent .md selection + TASK ASSIGNMENT). Templates are
cat
-ed into the prompt file verbatim.
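The cache-aware ordering can be sketched as a bash function that emits the stable templates first and the per-instance task last. This is an illustration under assumptions, not the real `assemble-prompt.sh`: the function name `assemble_prompt`, its argument order, and the `kind` values are hypothetical, and the fixed solo-agent preamble is left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the assembly order; NOT the real assemble-prompt.sh.
# Usage: assemble_prompt <skill-folder> <review|code|research> <agent> <NAME> <task-file>
assemble_prompt() {
  local skill="$1" kind="$2" agent="$3" name="$4" task_file="$5"
  local coord="coordination-review.txt" quality="quality-rules-review.txt"
  if [ "$kind" = "code" ]; then
    coord="coordination-code.txt"
    quality="quality-rules-code.txt"
  fi

  # Stable shared content first (cacheable across calls).
  # {NAME} is substituted in the coordination template; this assumes the
  # name contains no "/" or "&", which holds for names like s2i1-reviewer.
  sed "s/{NAME}/$name/g" "$skill/templates/$coord"
  if [ "$kind" = "review" ]; then
    cat "$skill/templates/severity-guide.txt"   # review/audit tasks only
  fi
  cat "$skill/templates/$quality"
  cat "$skill/agents/$agent.md"                 # full agent file, untrimmed

  # Volatile per-instance content last.
  printf 'You are an AI agent named %s.\n\n--- TASK ASSIGNMENT ---\n\n' "$name"
  cat "$task_file"
}
```

Keeping everything above the agent name byte-identical across calls is what lets a prompt cache reuse the shared prefix. Only the last two pieces differ between agents.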
提示按缓存感知顺序组装:稳定的共享内容在前(跨调用缓存),易变的每个实例内容在后。组装顺序:
你是一个独立工作的单个Agent。自己完成所有工作——不要生成子Agent,不要委托给其他Agent,不要运行Agentic工作流。本次会话中不允许Agentic工作流。

在声称某物缺失或损坏之前——先grep查找现有的保护、处理程序或实现。

{cat <skill-folder>/templates/coordination-review.txt 或 coordination-code.txt —— 替换 {NAME}}

{cat <skill-folder>/templates/severity-guide.txt —— 仅REVIEW/审计任务}

{cat <skill-folder>/templates/quality-rules-review.txt 或 quality-rules-code.txt}

{完整的 <skill-folder>/agents/{agent}.md —— 请参阅Rules → Prompts}

你是名为{NAME}的AI Agent。

--- 任务分配 ---

项目: {工作目录和项目描述}

环境(仅代码任务):
{运行时、测试命令、lint命令}

先前上下文(阶段2+或迭代2+):
{tmp/stage-N-synthesis.md 的内容 或 迭代的累积 tmp/stage-N-iter-*-synthesis.md 内容}

你的任务: {KEY FILES、上下文、范围、MUST ANSWER问题}

可写文件: {仅代码Agent——列出Agent可编辑的源文件。审查/研究/审计Agent:省略此部分}
| 任务类型 | 协调 | 严重程度指南 | 质量规则 |
| --- | --- | --- | --- |
| 审查/审计 | coordination-review.txt | severity-guide.txt | quality-rules-review.txt |
| 代码/重构 | coordination-code.txt | (不使用) | quality-rules-code.txt |
| 研究 | coordination-review.txt | (不使用) | quality-rules-review.txt |
样板模板位于
<skill-folder>/templates/
。主导者仅编写独特部分(Agent.md选择 + 任务分配)。模板被
cat
到提示文件中,原文不变。

Checkpoints & Recovery

检查点与恢复

Save after every step — no exceptions. One active checkpoint (delete previous first). Under 500 chars.
bash
./<skill-folder>/tools/memory.sh session add context "CHECKPOINT: [task] | DONE: [steps] | NEXT: [remaining] | SKIP: [do not redo — completed agents, failed approaches, skipped stages, pending approvals] | FILES: [key files] | BUILD/TEST: [commands]"
The
SKIP:
field prevents rework after compaction/crash recovery. Record:
  • Already-completed agents whose reports exist (e.g.
    s2-reviewer done
    )
  • Failed approaches tried 3× (do not retry same thing)
  • Stages explicitly skipped with reason (e.g.
    verify skipped — 0 findings
    )
  • Pending approval decisions (
    awaiting user approval for push
    )
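A minimal sketch of building the checkpoint string while enforcing the "under 500 chars" limit. The helper name `build_checkpoint` and its argument order are assumptions made for illustration. The field layout follows the `memory.sh` command shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the checkpoint line and rejects it if it is
# 500 characters or longer.
build_checkpoint() {
  local task="$1" done_steps="$2" next="$3" skip="$4" files="$5" build="$6"
  local msg="CHECKPOINT: $task | DONE: $done_steps | NEXT: $next | SKIP: $skip | FILES: $files | BUILD/TEST: $build"
  if [ "${#msg}" -ge 500 ]; then
    echo "checkpoint is ${#msg} chars; must be under 500" >&2
    return 1
  fi
  printf '%s\n' "$msg"
}

# Possible usage with the command from the text:
#   ./<skill-folder>/tools/memory.sh session add context \
#     "$(build_checkpoint "audit" "plan" "spawn" "s2-reviewer done" "src/a.c" "make test")"
```

Failing loudly on an over-long checkpoint is a design choice of this sketch. Silently truncating could cut off the SKIP field, which is the part that prevents rework.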
Compaction recovery — MANDATORY sequence (do ALL steps, no skipping):
  1. Run
    <skill-folder>/tools/glm-recover.sh
    , which prints the memory session, plan, continuation (if any), newest synthesis (iter or stage, by mtime), and latest checklist in one stream. This single command replaces steps 1 to 3 of the manual fallback sequence below
  2. Re-read AGENTS.md in full and STRICTLY follow its instructions — ALWAYS, no exceptions, no partial reads.
    glm-recover.sh
    does NOT do this for you
  3. Only then resume work
If
glm-recover.sh
is unavailable, fall back to the manual sequence:
  1. ./<skill-folder>/tools/memory.sh session show
    — restore session state
  2. Read
    tmp/glm-plan.md
    — restore current plan
  3. Read the latest
    tmp/sN-synth-report.md
    ,
    tmp/stage-N-iter-K-synthesis.md
    , or
    tmp/stage-N-synthesis.md
    — restore verification/iteration/stage state
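Step 3 of the manual sequence asks for the latest of three file families. One way to pick it is by modification time, as in the sketch below. The function name `latest_synthesis` is hypothetical, and how `glm-recover.sh` itself selects the file is not shown here, so treat this as an assumption about equivalent behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified synthesis artifact.
latest_synthesis() {
  local dir="${1:-tmp}"
  # ls -t sorts newest first. stage-*-synthesis.md covers both
  # stage-N-synthesis.md and stage-N-iter-K-synthesis.md.
  # Errors from patterns that match nothing are discarded.
  # Assumes file names contain no newlines.
  ls -t "$dir"/s[0-9]*-synth-report.md "$dir"/stage-*-synthesis.md 2>/dev/null | head -n 1
}
```

If no synthesis file exists yet, the function prints nothing, which corresponds to the case where only the plan needs to be restored.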
Do not rely on continuation summary alone. Do not skip the AGENTS.md re-read — this is the #1 cause of workflow deviation after compaction.
| Checkpoint | Recovery |
| --- | --- |
| Plan done | Read tmp/glm-plan.md → prepare agents |
| Agents prepared | List prompts → spawn |
| Agents spawned | Check PIDs/reports → verify or re-wait |
| Verifying stage N | Read tmp/stage-N-synthesis.md (the lead's synthesis from the synthesis agent's grid) |
| Iterating stage N, iter K | Read tmp/stage-N-iter-K-synthesis.md + cumulative context → prepare next iteration |
| Stage N done | Read synthesis + plan → next stage |
Compaction handoff format — for long-running stages, include this block in stage synthesis to preserve active process state:
每步后保存——无例外。 一个活动检查点(先删除前一个)。少于500字符。
bash
./<skill-folder>/tools/memory.sh session add context "CHECKPOINT: [任务] | DONE: [步骤] | NEXT: [剩余步骤] | SKIP: [不要重做:已完成的Agent、失败的方法、跳过的阶段、待批准事项] | FILES: [关键文件] | BUILD/TEST: [命令]"
SKIP:
字段防止压缩/崩溃恢复后重做工作。记录:
  • 报告已存在的已完成Agent(例如
    s2-reviewer done
    )
  • 尝试3次失败的方法(不要重试相同方法)
  • 明确跳过的阶段及理由(例如
    verify skipped — 0 findings
    )
  • 待批准的决策(
    awaiting user approval for push
    )
压缩恢复——强制序列(完成所有步骤,不要跳过):
  1. 运行
    <skill-folder>/tools/glm-recover.sh
    ——一次性打印内存会话、计划、续接内容(如果有)、最新合成结果(迭代或阶段,按修改时间)和最新清单。用单个命令替换以下步骤1、2、3
  2. 完整重新阅读AGENTS.md并严格遵循其说明——始终如此,无例外,无部分阅读。
    glm-recover.sh
    不会为你完成此操作
  3. 然后恢复工作
如果
glm-recover.sh
不可用,回退到手动序列:
  1. ./<skill-folder>/tools/memory.sh session show
    ——恢复会话状态
  2. 读取
    tmp/glm-plan.md
    ——恢复当前计划
  3. 读取最新的
    tmp/sN-synth-report.md
    tmp/stage-N-iter-K-synthesis.md
    tmp/stage-N-synthesis.md
    ——恢复验证/迭代/阶段状态
不要仅依赖续接摘要。不要跳过重新阅读AGENTS.md——这是压缩后工作流偏离的首要原因。
| 检查点 | 恢复 |
| --- | --- |
| 计划完成 | 读取 tmp/glm-plan.md → 准备Agent |
| Agent准备完成 | 列出提示 → 生成Agent |
| Agent已生成 | 检查PID/报告 → 验证或重新等待 |
| 验证阶段N | 读取 tmp/stage-N-synthesis.md(主导者从合成Agent网格生成的合成结果) |
| 迭代阶段N,迭代K | 读取 tmp/stage-N-iter-K-synthesis.md + 累积上下文 → 准备下一迭代 |
| 阶段N完成 | 读取合成结果 + 计划 → 下一阶段 |
压缩交接格式—— 对于长时间运行的阶段,在阶段合成结果中包含此块以保留活动进程状态:

Compaction Handoff

压缩交接

  • Current objective: [what this stage is doing]
  • User constraints: [explicit instructions that must survive compaction]
  • Active plan / workflow: [reference to plan artifact or current step]
  • Approval state: [what's approved, what's pending, what was denied]
  • Key facts and decisions: [exact values, resolved ambiguities, why choices were made]
  • Actions already taken: [agents spawned, commands run, files changed]
  • Errors, blockers, attempted fixes: [what failed and what was tried — do not retry same approach]
  • Pending tasks: [remaining subtasks in this stage]
  • Next recommended step: [single concrete action to resume with]
  • Do not redo: [completed agents, failed approaches, skipped steps]
  • 当前目标: [此阶段正在执行的操作]
  • 用户约束: [必须在压缩后保留的明确指令]
  • 活动计划/工作流: [计划工件或当前步骤的引用]
  • 批准状态: [已批准、待批准、已拒绝的内容]
  • 关键事实和决策: [确切值、已解决的模糊性、选择的理由]
  • 已采取的行动: [已生成的Agent、已运行的命令、已更改的文件]
  • 错误、阻塞点、尝试的修复: [失败的内容和尝试的方法——不要重试相同方法]
  • 待处理任务: [此阶段的剩余子任务]
  • 建议下一步: [恢复工作的单个具体操作]
  • 不要重做: [已完成的Agent、失败的方法、跳过的步骤]

Session Continuation

会话续接

For tasks exceeding a single session:
  1. Complete current stage fully
  2. Write
    tmp/glm-continuation.md
    : original task, plan, completed stages, next stage, decisions, modified files, blockers
  3. ./<skill-folder>/tools/memory.sh add context "GLM-CONTINUATION: [summary]" --tags glm-opencode,continuation
  4. Tell user what's done and what continues
Pickup:
./<skill-folder>/tools/memory.sh search "GLM-CONTINUATION"
→ read continuation file → read prior synthesis → continue next stage. On final stage, clean up continuation file and memory entry. Never re-do verified prior work.
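As a rough sketch, the continuation file from step 2 could be seeded from a fixed skeleton so that no field is forgotten. The function name `write_continuation`, the title line, and the bracketed placeholders are assumptions. The field list itself is taken from step 2 above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a continuation skeleton with every field
# listed in step 2. Placeholders in [brackets] are filled in by the lead.
write_continuation() {
  local out="${1:-tmp/glm-continuation.md}"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
GLM continuation

Original task: [verbatim user request]
Plan: [reference to tmp/glm-plan.md]
Completed stages: [stage numbers and outcomes]
Next stage: [stage to resume with]
Decisions: [key decisions and why]
Modified files: [paths]
Blockers: [open problems]
EOF
}
```

After filling in the skeleton, the `memory.sh add context` command from step 3 records the pointer that the pickup search later finds.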
对于超过单个会话的任务:
  1. 完整完成当前阶段
  2. 编写
    tmp/glm-continuation.md
    :原始任务、计划、已完成的阶段、下一阶段、决策、修改的文件、阻塞点
  3. ./<skill-folder>/tools/memory.sh add context "GLM-CONTINUATION: [摘要]" --tags glm-opencode,continuation
  4. 告知用户已完成的工作和将继续的内容
恢复:
./<skill-folder>/tools/memory.sh search "GLM-CONTINUATION"
→ 读取续接文件 → 读取先前的合成结果 → 继续下一阶段。最终阶段后,清理续接文件和内存条目。永远不要重做已验证的先前工作。

Error Handling

错误处理

| Scenario | Action |
| --- | --- |
| No report after exit | Read log to diagnose failure. Fix root cause (bad prompt? missing dependency? environment?). Re-spawn the agent. Do NOT fill gaps yourself; filling gaps is agent work. |
| STALLED (flagged by wait-glm.sh) | Kill process, read log to diagnose. Fix root cause. Re-spawn. Do NOT note gap and proceed. |
| Agent claims success but output wrong | Diagnose why output is wrong (bad prompt? misunderstood task?). Fix the prompt/task. Re-spawn the agent. Do NOT verify or fix the output yourself. |
| Incorrect edits | Diagnose why the agent produced wrong output (bad prompt? misunderstood task?). Fix the prompt/task. Spawn a quick-fix agent to revert and rewrite. Do NOT revert changes yourself. If the quick-fix agent is still wrong, escalate to full IMPLEMENT → REVIEW → VERIFY. |
| 2+ agents fail same env error | STOP respawning. Diagnose environment first (do NOT fix environment issues directly; spawn an agent if changes needed). |
| Agent aborted (same error 3×) | Read log to diagnose root cause, fix environment/config (spawn an agent if code/config changes needed), then respawn. |
| Stage partially failed (1+ agents produced no useful output or wrong output) | Diagnose root causes across all failed agents. Fix issues (environment, prompts, tasks). Re-spawn ALL failed agents. The stage is incomplete until all agents succeed. Do NOT proceed to the next stage with gaps. |
| Iteration cap hit without convergence | Synthesize all iterations, note "convergence not reached" in delivery, proceed. |
| Adversarial verification produces suspicious results (CONFIRMED on obviously-wrong findings or REJECTED with weak evidence) | Diagnose prompt/task quality; the adversarial agent may have misunderstood its role. Adjust MUST ANSWER questions or adversarial instructions and respawn. |
| 场景 | 行动 |
| --- | --- |
| 退出后无报告 | 读取日志诊断失败原因。修复根本原因(提示错误?依赖项缺失?环境问题?)。重新生成Agent。不要自己填补空白;填补空白是Agent的工作。 |
| STALLED(由wait-glm.sh标记) | 终止进程,读取日志诊断。修复根本原因。重新生成Agent。不要记录间隙并继续。 |
| Agent声称成功但输出错误 | 诊断输出错误的原因(提示错误?误解任务?)。修复提示/任务。重新生成Agent。不要自己验证或修复输出。 |
| 不正确的编辑 | 诊断Agent产生错误输出的原因(提示错误?误解任务?)。修复提示/任务。生成快速修复Agent回滚并重写。不要自己回滚更改。如果快速修复Agent仍然错误,升级为完整IMPLEMENT → REVIEW → VERIFY。 |
| 2+个Agent因相同环境错误失败 | 停止重新生成。先诊断环境(不要直接修复环境问题;如果需要更改,生成Agent)。 |
| Agent中止(相同错误3次) | 读取日志诊断根本原因,修复环境/配置(如果需要代码/配置更改,生成Agent),然后重新生成。 |
| 阶段部分失败(1+个Agent未产生有用输出或输出错误) | 诊断所有失败Agent的根本原因。修复问题(环境、提示、任务)。重新生成所有失败Agent。所有Agent成功前阶段不完整。不要在有间隙的情况下进入下一阶段。 |
| 达到迭代上限仍未收敛 | 合成所有迭代,在交付中注明“未达到收敛”,继续。 |
| 对抗性验证产生可疑结果(明显错误的发现结果被CONFIRMED或弱证据被REJECTED) | 诊断提示/任务质量;对抗性Agent可能误解了其角色。调整MUST ANSWER问题或对抗性指令并重新生成。 |

Rules

规则

Quality over speed — ALWAYS. Never rush, never cut corners, never try to finish faster. Slow, thorough, methodical work produces quality. Speed produces bugs. Prefer more stages, more agents, more verification over shorter timelines. There is no deadline. The only measure of success is production-ready, bug-free code.
Limits: Per-batch limit and agent parallelism rules are defined in Tools and Agent Spawning — don't restate. Need more coverage than the 10-agent per-batch cap allows? Add stages, not more agents per batch. Agents run until done (no turn limit). One task per agent. Respawn naming:
-r2
,
-r3
. No two agents edit same file within a stage (read overlap OK). Balance workload — each agent should cover roughly equal scope.
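Two of the limits above lend themselves to a mechanical pre-spawn check: the 10-agent batch cap and the rule that no two agents edit the same file within a stage. The sketch below assumes the plan can be flattened into `<agent> <file>` pairs. That input format and the function names `conflicting_files` and `within_batch_cap` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-spawn checks; the input format is an assumption.

# Reads "<agent> <file>" pairs on stdin (paths without spaces) and prints
# every file that more than one distinct agent is allowed to edit.
conflicting_files() {
  awk '!seen[$1 SUBSEP $2]++ { owners[$2]++ }
       END { for (f in owners) if (owners[f] > 1) print f }' | sort
}

# Succeeds when the agent names passed as arguments fit in one batch of 10.
within_batch_cap() {
  local cap=10
  [ "$#" -le "$cap" ]
}
```

An empty result from `conflicting_files` means the writable-file assignment is safe for a single stage. A non-empty result suggests moving one of the agents to a later stage rather than letting both edit the file.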
Task tool prohibition (MANDATORY — single most important rule): Agent delegation in this project happens ONLY via
spawn-glm.sh
. The
Task
tool with its
subagent_type
parameter is FORBIDDEN — never call it, regardless of the use case (exploration, code review, implementation, research, anything).
The Task tool's built-in
subagent_type
list happens to share names with our agent
.md
files in
<skill-folder>/agents/
(
code-reviewer
,
ios-pro
,
swift-pro
, etc.) — these are TWO DIFFERENT THINGS. The Task tool ships a separate sub-agent runtime that bypasses our agent delegation system, the
spawn-glm.sh
pipeline, verification, report formats, and quality rules. Our agent
.md
files are reached ONLY by passing
-a AGENT_NAME
to
assemble-prompt.sh
and then spawning via
spawn-glm.sh
.
If you catch yourself about to call
Task(subagent_type=...)
— stop, use
spawn-glm.sh
instead.
Agent count per stage (MANDATORY — fill capacity by task decomposition): Decompose the task into as many independent subtasks as it naturally splits into, spawn one agent per subtask, maximum 10 agents per batch. Default to what the task genuinely requires — scale to scope. Over-engineering with unnecessary agents adds coordination overhead that degrades quality (proven across 260+ configurations). Fill the maximum only when the task truly spans that many distinct domains. Verification stages scale with findings count and impact surface, not discovery agent count — minimum 1 extraction agent for every stage; adversarial agents run only if extraction finds at least one finding to falsify. When in doubt, decompose into more parallel agents — broader coverage finds more issues. Never run sequential single-agent stages when those stages could be a single stage with parallel agents (see Workflow → Planning → Stage decomposition rule).
Prompts: Include the FULL agent
.md
file — agents are optimized and every section earns its place. Do NOT trim or skip sections. Boilerplate (quality rules, severity guide, coordination, report format) comes from
<skill-folder>/templates/
and is prepended before the agent .md for prompt-cache stability (stable shared content cached first, volatile content last). Agents don't load AGENTS.md — all context must be in prompt.
Verification: Every finding labeled. Every label backed by Read. 100% complete before proceeding. ALL verified actionable findings fixed via fix-agent — the lead does not fix findings directly.
Lead code prohibition (MANDATORY): The lead never writes, edits, or modifies project source code. Every code change — implementation, bug fixes, config adjustments, script changes, one-liners — goes through a spawned agent. The lead's tools (Edit, Write) are for tmp/ artifacts only: task files, prompts, synthesis reports. The only exception is editing AGENTS.md itself (meta-configuration).
Platform:
opencode
or
pi
on all platforms (spawn-glm.sh handles invocation, use
--pi
flag if running in pi). Always redirect output to log files.


质量优先——始终如此。 不要匆忙,不要偷工减料,不要试图更快完成。缓慢、彻底、有条理的工作产生质量。速度产生漏洞。优先选择更多阶段、更多Agent、更多验证,而非更短的时间线。没有截止日期。成功的唯一衡量标准是生产就绪、无漏洞的代码。
限制: 每批次限制和Agent并行规则在Tools和Agent Spawning中定义——不再重述。需要超过每批次10个Agent上限的覆盖范围?添加阶段,而非每批次更多Agent。Agent运行直到完成(无回合限制)。每个Agent一个任务。重新生成命名:
-r2
,
-r3
。同一阶段内没有两个Agent编辑相同文件(读取重叠可以)。平衡工作负载——每个Agent应覆盖大致相等的范围。
任务工具禁止(强制执行——最重要的单一规则): 本项目中的Agent委托仅通过
spawn-glm.sh
进行。禁止使用带有
subagent_type
参数的
Task
工具——无论用例如何(探索、代码审查、实现、研究等),永远不要调用它。
Task工具内置的
subagent_type
列表恰好与我们
<skill-folder>/agents/
中的Agent.md文件同名(
code-reviewer
,
ios-pro
,
swift-pro
等)——这是两个不同的东西。Task工具提供单独的子Agent运行时,绕过我们的Agent委托系统、
spawn-glm.sh
管道、验证、报告格式和质量规则。我们的Agent.md文件仅通过向
assemble-prompt.sh
传递
-a AGENT_NAME
并通过
spawn-glm.sh
生成才能访问。
如果你发现自己要调用
Task(subagent_type=...)
——停止,改用
spawn-glm.sh
每个阶段的Agent数量(强制执行——通过任务分解填满容量): 将任务分解为尽可能多的独立子任务,每个子任务生成一个Agent,每批次最多10个Agent。默认生成任务真正需要的Agent——根据范围调整。不必要的Agent过度设计会增加协调开销,降低质量(在260+配置中已验证)。仅当任务真正跨越这么多不同领域时才填满上限。验证阶段根据发现结果数量和影响范围调整,而非发现Agent数量——每个阶段至少1个提取Agent;仅当提取发现至少一个要证伪的发现结果时才运行对抗性Agent。如有疑问,分解为更多并行Agent——更广泛的覆盖范围发现更多问题。永远不要在可以用单个并行Agent阶段完成时运行顺序单Agent阶段(请参阅Workflow → Planning → Stage decomposition rule)。
提示: 包含完整的Agent.md文件——Agent经过优化,每个部分都有其价值。不要修剪或跳过部分。样板内容(质量规则、严重程度指南、协调、报告格式)来自
<skill-folder>/templates/
,并在Agent.md之前添加,以实现提示缓存稳定性(稳定的共享内容先缓存,易变内容在后)。Agent不加载AGENTS.md——所有上下文必须在提示中。
验证: 每个发现结果都有标签。每个标签都有读取支持。继续前100%完成。所有已验证的可操作发现结果通过修复Agent修复——主导者不得直接修复发现结果。
主导者代码禁止(强制执行): 主导者永远不会编写、编辑或修改项目源代码。每次代码更改——实现、漏洞修复、配置调整、脚本更改、单行代码——都通过生成的Agent进行。主导者的工具(Edit、Write)仅用于tmp/工件:任务文件、提示、合成报告。唯一例外是编辑AGENTS.md本身(元配置)。
平台: 所有平台使用
opencode
pi
(spawn-glm.sh处理调用,如果在pi中运行,使用
--pi
标志)。始终将输出重定向到日志文件。


Skills (Workflows)

Skills(工作流)

Workflows are available as skills in
<skill-folder>/skills/
directory. Use
/skill-name
to invoke. Skills are orthogonal to the agentic workflow — they are utility operations invoked directly by the lead as needed. Skill output is not routed through the verification pipeline.
工作流作为技能在
<skill-folder>/skills/
目录中可用。使用
/skill-name
调用。技能与Agentic工作流正交——它们是主导者根据需要直接调用的实用操作。技能输出不通过验证管道路由。