/retro

This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses Big Head pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.

When to use

After a Big Head pipeline completes (or gets cancelled). This is the process quality check — `/review` checks code quality, `/retro` checks pipeline process quality. Can also be used standalone on any project that has pipeline logs.

MCP Tools (use if available)

  • `session_search(query)` — find past pipeline runs and known issues
  • `codegraph_explain(project)` — understand project architecture context
  • `codegraph_query(query)` — query the code graph for project metadata

If MCP tools are not available, fall back to Glob + Grep + Read.

Phase 1: Locate Artifacts

  1. Detect project from `$ARGUMENTS` or CWD:
    • If argument provided: use it as project name
    • Otherwise: extract from the CWD basename (e.g., `~/startups/active/life2film` → `life2film`)
  2. Find pipeline state file: `~/.solo/pipelines/solo-pipeline-{project}.local.md`
    • If it exists: pipeline is still running or wasn't cleaned up — read the YAML frontmatter for `project_root:`
    • If not: pipeline completed — use `~/startups/active/{project}` as project root
  3. Verify artifacts exist (parallel reads):
    • Pipeline log: `{project_root}/.solo/pipelines/pipeline.log` (REQUIRED — abort if missing)
    • Iter logs: `{project_root}/.solo/pipelines/iter-*.log`
    • Progress file: `{project_root}/.solo/pipelines/progress.md`
    • Plan-done directory: `{project_root}/docs/plan-done/`
    • Active plan: `{project_root}/docs/plan/`
  4. Count iter logs: `ls {project_root}/.solo/pipelines/iter-*.log | wc -l`
    • Report: "Found {N} iteration logs"
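The count in step 4 can also be sketched with a glob-safe variant; `ls` errors when no logs match, so a `find`-based count avoids that edge case. The directory and filenames below are hypothetical stand-ins for `{project_root}/.solo/pipelines/`:

```shell
# Sketch of step 4's iter-log count, using find so an empty glob
# yields 0 instead of an ls error. Paths are hypothetical.
DIR=$(mktemp -d)
touch "$DIR/iter-001-build.log" "$DIR/iter-002-review.log"

n=$(find "$DIR" -maxdepth 1 -name 'iter-*.log' | wc -l)
echo "Found $n iteration logs"

rm -rf "$DIR"
```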

Phase 2: Parse Pipeline Log (quantitative)

Read `pipeline.log` in full. Parse line-by-line, extracting structured data from log tags.

Log format: `[HH:MM:SS] TAG | message`

Extract by tag:

| Tag | What to extract |
| --- | --- |
| START | Pipeline run boundary — count restarts (multiple START lines = restarts) |
| STAGE | `iter N/M \| stage S/T: {stage_id}` — iteration count per stage |
| SIGNAL | `<solo:done/>` or `<solo:redo/>` — which stages got completion signals |
| INVOKE | Skill invoked — extract skill name, check for wrong names |
| ITER | `commit: {sha} \| result: {stage complete\|continuing}` — per-iteration outcome |
| CHECK | `{stage} \| {path} -> FOUND\|NOT FOUND` — marker file checks |
| FINISH | `Duration: {N}m` — total duration per run |
| MAXITER | `Reached max iterations ({N})` — hit iteration ceiling |
| QUEUE | Plan cycling events (activating, archiving) |
| CIRCUIT | Circuit breaker triggered (if present) |
| CWD | Working directory changes |
| CTRL | Control signals (pause/stop/skip) |
Compute metrics:
total_runs = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters = total_iterations - productive_iters
waste_pct = wasted_iters / total_iterations * 100
maxiter_hits = count of MAXITER lines
plan_cycles = count of QUEUE lines with "Cycling"

per_stage = {
  stage_id: {
    attempts: count of STAGE lines for this stage,
    successes: count of ITER lines with "stage complete" for this stage,
    waste_ratio: (attempts - successes) / attempts * 100,
  }
}
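As a minimal sketch, assuming the `[HH:MM:SS] TAG | message` format above, the headline metrics reduce to grep counts. The sample pipeline.log content below is hypothetical:

```shell
# Sketch: headline Phase 2 metrics via grep counts.
# The sample pipeline.log lines are hypothetical.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[10:00:01] START | pipeline starting
[10:05:12] ITER | commit: abc1234 | result: continuing
[10:12:44] ITER | commit: def5678 | result: stage complete
[10:20:03] MAXITER | Reached max iterations (8)
EOF

total_runs=$(grep -c '] START |' "$LOG")
total_iterations=$(grep -c '] ITER |' "$LOG")
productive_iters=$(grep '] ITER |' "$LOG" | grep -c 'stage complete')
wasted_iters=$(( total_iterations - productive_iters ))
waste_pct=$(( wasted_iters * 100 / total_iterations ))
maxiter_hits=$(grep -c '] MAXITER |' "$LOG")

echo "runs=$total_runs iters=$total_iterations wasted=$wasted_iters (${waste_pct}%)"
rm -f "$LOG"
```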

Phase 3: Parse Progress.md (qualitative)

Read `progress.md` and scan for error patterns:
  1. Unknown skill errors: grep for `Unknown skill:` — extract which skill name was wrong
  2. Empty iterations: iterations where "Last 5 lines" show only errors or a session header (no actual work done)
  3. Repeated errors: the same error appearing in consecutive iterations → spin-loop indicator
  4. Doubled signals: `<solo:done/><solo:done/>` in the same iteration → minor noise (note but don't penalize)
  5. Redo loops: count how many times build→review→redo→build cycles occurred
For each error pattern found, record:
  • Pattern name
  • First occurrence (iteration number)
  • Total occurrences
  • Consecutive streak (max)
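The occurrence and consecutive-streak counts above can be sketched with grep plus awk. The sample progress.md excerpt is hypothetical:

```shell
# Sketch: count "Unknown skill:" occurrences and the longest
# consecutive streak. The sample progress.md is hypothetical.
PROG=$(mktemp)
cat > "$PROG" <<'EOF'
iter 3: Unknown skill: /reveiw
iter 4: Unknown skill: /reveiw
iter 5: build ok
iter 6: Unknown skill: /reveiw
EOF

occurrences=$(grep -c 'Unknown skill:' "$PROG")
max_streak=$(awk '
  /Unknown skill:/ { s++; if (s > max) max = s; next }
  { s = 0 }
  END { print max + 0 }' "$PROG")

echo "occurrences=$occurrences max_streak=$max_streak"
rm -f "$PROG"
```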

Phase 4: Analyze Iter Logs (sample-based)

Do NOT read all iter logs — could be 60+. Use smart sampling:
  1. First failed iter per pattern: for each failure pattern found in Phase 3, read the first iter log that shows it
    • Strip ANSI codes when reading: `sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100`
  2. First successful iter per stage: for each stage that eventually succeeded, read the first successful iter log
    • Look for `<solo:done/>` in the output
  3. Final review iter: read the last `iter-*-review.log` (the verdict)
  4. Extract from each sampled log:
    • Tools called (count of tool_use blocks)
    • Errors encountered (grep for `Error`, `error`, `Unknown`, `failed`)
    • Signal output (`<solo:done/>` or `<solo:redo/>` present?)
    • First 5 and last 10 meaningful lines (skip blank lines)
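The ANSI-strip plus signal check in the sampling steps can be sketched as follows; the sample iter log content is fabricated for illustration:

```shell
# Sketch: strip ANSI color codes from a sampled iter log, then
# detect which completion signal it emitted. Sample log is fabricated.
IT=$(mktemp)
printf '\033[32mBuild succeeded\033[0m\n<solo:done/>\n' > "$IT"

clean=$(sed 's/\x1b\[[0-9;]*m//g' "$IT")
case "$clean" in
  *'<solo:done/>'*) signal=done ;;
  *'<solo:redo/>'*) signal=redo ;;
  *)                signal=none ;;
esac

echo "signal=$signal"
rm -f "$IT"
```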

Phase 5: Plan Fidelity Check

For each track directory in `docs/plan-done/` and `docs/plan/`:
  1. Read spec.md (if exists):
    • Count acceptance criteria: total `- [ ]` and `- [x]` checkboxes
    • Calculate: criteria_met = checked / total * 100
  2. Read plan.md (if exists):
    • Count tasks: total `- [ ]` and `- [x]` checkboxes
    • Count phases (## headers)
    • Check for SHA annotations (`<!-- sha:... -->`)
    • Calculate: tasks_done = checked / total * 100
  3. Compile per-track summary:
    • Track ID, criteria met %, tasks done %, has SHAs
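The checkbox arithmetic in steps 1-2 can be sketched against a hypothetical plan.md:

```shell
# Sketch: count "- [ ]" / "- [x]" checkboxes, compute tasks_done,
# and count phases. The sample plan.md content is hypothetical.
PLAN=$(mktemp)
cat > "$PLAN" <<'EOF'
## Phase 1
- [x] scaffold routes <!-- sha:abc1234 -->
- [x] add auth
- [ ] write tests
EOF

total=$(grep -Ec '^- \[( |x)\]' "$PLAN")
checked=$(grep -Ec '^- \[x\]' "$PLAN")
tasks_done=$(( checked * 100 / total ))
phases=$(grep -c '^## ' "$PLAN")

echo "tasks_done=${tasks_done}% (${checked}/${total}), phases=$phases"
rm -f "$PLAN"
```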

Phase 6: Git & Code Quality (lightweight)

Quick checks only — NOT a full /review:
  1. Commit count and format:
    ```bash
    git -C {project_root} log --oneline | wc -l
    git -C {project_root} log --oneline | head -30
    ```
    • Count commits with conventional format (`feat:`, `fix:`, `chore:`, `test:`, `docs:`, `refactor:`, `build:`, `ci:`, `perf:`)
    • Calculate: conventional_pct = conventional / total * 100
  2. Committer breakdown:
    ```bash
    git -C {project_root} shortlog -sn --no-merges | head -10
    ```
  3. Test status (if a test command exists in CLAUDE.md or package.json):
    • Run the test suite, capture pass/fail count
    • If no test command found, skip and note "no tests configured"
  4. Build status (if a build command exists):
    • Run the build, capture success/fail
    • If no build command found, skip and note "no build configured"
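The conventional_pct calculation from step 1 can be sketched against a hypothetical list of commit subjects (a real run would pipe them from `git log --format=%s`):

```shell
# Sketch: conventional-commit percentage. The subjects below are
# hypothetical; a real run reads them from git log.
subjects='feat: add upload flow
fix: handle empty pipeline log
wip: messy checkpoint
docs: update readme'

total=$(printf '%s\n' "$subjects" | wc -l)
conventional=$(printf '%s\n' "$subjects" \
  | grep -Ec '^(feat|fix|chore|test|docs|refactor|build|ci|perf)(\([^)]*\))?:')
conventional_pct=$(( conventional * 100 / total ))

echo "${conventional_pct}% conventional (${conventional}/${total})"
```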

Phase 7: Score & Report

Load the scoring rubric from `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md`. If the plugin root is not available, use the embedded weights:
Scoring weights:
  • Efficiency (waste %): 25%
  • Stability (restarts): 20%
  • Fidelity (criteria met): 20%
  • Quality (test pass rate): 15%
  • Commits (conventional %): 5%
  • Docs (plan staleness): 5%
  • Signals (clean signals): 5%
  • Speed (total duration): 5%
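As a worked example of the weighted sum, each 0-10 axis score is multiplied by its weight and the products are summed; the eight axis scores below are made up:

```shell
# Sketch: weighted overall score from the embedded weights.
# The eight axis scores are hypothetical examples.
overall=$(awk 'BEGIN {
  s = 7*0.25   # Efficiency
  s += 9*0.20  # Stability
  s += 8*0.20  # Fidelity
  s += 6*0.15  # Quality
  s += 10*0.05 # Commits
  s += 8*0.05  # Docs
  s += 9*0.05  # Signals
  s += 8*0.05  # Speed
  printf "%.1f", s
}')
echo "Overall Score: ${overall}/10"
```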
Generate the report at `{project_root}/docs/retro/{date}-retro.md`:

Pipeline Retro: {project} ({date})

Overall Score: {N}/10

Pipeline Efficiency

| Metric | Value | Rating |
| --- | --- | --- |
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |

Per-Stage Breakdown

| Stage | Attempts | Successes | Waste % | Notes |
| --- | --- | --- | --- | --- |
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |

Failure Patterns

Pattern 1: {name}

  • Occurrences: {N} iterations
  • Root cause: {analysis}
  • Wasted: {N} iterations
  • Fix: {concrete suggestion with file reference}

Pattern 2: ...

Plan Fidelity

| Track | Criteria Met | Tasks Done | SHAs | Rating |
| --- | --- | --- | --- | --- |
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |

Code Quality (Quick)

  • Tests: {N} pass, {N} fail (or "not configured")
  • Build: PASS / FAIL (or "not configured")
  • Commits: {N} total, {pct}% conventional format

Three-Axis Growth

| Axis | Score | Evidence |
| --- | --- | --- |
| Technical (code, tools, architecture) | {0-10} | {what changed} |
| Cognitive (understanding, strategy, decisions) | {0-10} | {what improved} |
| Process (harness, skills, pipeline, docs) | {0-10} | {what evolved} |
If only one axis is served — note what's missing.

Recommendations

  1. [CRITICAL] {patch suggestion with file:line reference}
  2. [HIGH] {improvement}
  3. [MEDIUM] {optimization}
  4. [LOW] {nice-to-have}

Suggested Patches

Patch 1: {file} — {description}

What: {one-line description}
Why: {root cause reference from Failure Patterns}

```diff
- old line
+ new line
```

**Rating guide (use these emojis):**
- GREEN = excellent
- YELLOW = acceptable
- RED = needs attention

Phase 8: Interactive Patching

After generating the report:
  1. Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
  2. For each suggested patch (if any), use AskUserQuestion:
    • Question: "Apply patch to {file}? {one-line description}"
    • Options: "Apply" / "Skip" / "Show diff first"
  3. If "Show diff first": display the full diff, then ask again (Apply / Skip)
  4. If "Apply": use the Edit tool to apply the change directly
  5. After all patches processed:
    • If any patches were applied: suggest committing with `fix(retro): {description}`
    • Do NOT auto-commit — just suggest the command

Phase 9: CLAUDE.md Revision

阶段9:修订CLAUDE.md

After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.

Steps:

  1. Read CLAUDE.md and check its size: `wc -c CLAUDE.md`
  2. Add learnings from this retro:
    • Pipeline failure patterns worth remembering (avoid next time)
    • New workflow rules or process improvements
    • Updated commands or tooling changes
    • Architecture decisions that emerged during the pipeline run
  3. If over 40,000 characters — trim ruthlessly:
    • Collapse completed phase/milestone histories into one line each
    • Remove verbose explanations — keep terse, actionable notes
    • Remove duplicate info (same thing explained in multiple sections)
    • Remove historical migration notes, old debugging context
    • Remove examples that are obvious from code or covered by skill/doc files
    • Remove outdated troubleshooting for resolved issues
  4. Verify result ≤ 40,000 characters — if still over, cut least actionable content
  5. Write updated CLAUDE.md, update "Last updated" date
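The 40,000-character gate from steps 1 and 3-4 reduces to a simple size check; the file content below is a tiny hypothetical stand-in for a real CLAUDE.md:

```shell
# Sketch: the CLAUDE.md size gate. File content is a hypothetical
# stand-in; a real run would check the project's CLAUDE.md.
CM=$(mktemp)
printf 'Do: run tests before commit.\n' > "$CM"

size=$(wc -c < "$CM")
if [ "$size" -gt 40000 ]; then
  verdict="over budget: trim"
else
  verdict="within budget"
fi

echo "$size chars: $verdict"
rm -f "$CM"
```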

Priority (keep → cut):

  1. ALWAYS KEEP: Tech stack, directory structure, Do/Don't rules, common commands, architecture decisions
  2. KEEP: Workflow instructions, troubleshooting for active issues, key file references
  3. CONDENSE: Phase histories (one line each), detailed examples, tool/MCP listings
  4. CUT FIRST: Historical notes, verbose explanations, duplicated content, resolved issues

Rules:

  • Never remove Do/Don't sections — critical guardrails
  • Preserve overall section structure and ordering
  • Every line must earn its place: "would a future agent need this to do their job?"
  • Commit the update:
    git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"

Phase 10: Factory Critic

After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.

What to evaluate:

  1. Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
    • For each skill: `${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md`
    • Did the skill have the right instructions for this project's needs?
    • Did it miss context it should have had?
  2. Read solo-dev.sh signal handling and stage logic:
    • `${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh`
    • Were there structural issues (wrong stage order, missing re-exec, broken redo)?
  3. Cross-reference with failure patterns from Phase 3:
    • For each failure: was the root cause in the skill, the script, or the project?
    • Skills that caused waste = factory defects

Score the factory (not the project):

Factory Score: {N}/10

Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}

Pipeline reliability: {N}/10 — {why}

Missing capabilities:
- {what the factory couldn't do that it should have}

Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}

Harness Evolution — think about the bigger picture

After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:
  1. Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
    • Missing docs → add to `docs/` or CLAUDE.md
    • Stale docs → flag for doc-gardening
    • Knowledge only in your head → encode it
  2. Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
    • Repeated boundary violations → need a linter or structural test
    • Inconsistent patterns → need golden principle in CLAUDE.md
    • Data shape errors → need parse-at-boundary enforcement
  3. Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
    • Good patterns → capture as precedent in KB or CLAUDE.md
    • Bad patterns → encode as anti-pattern or lint rule
    • Think: "if another agent hits this same problem tomorrow, what should it find?"
  4. Skill gaps: Which skills need better instructions? Which new skills should exist?
    • Skill that caused waste → concrete SKILL.md patch
    • Missing capability → new skill idea for evolution.md
Append harness findings to the evolution log alongside factory defects.

Write to evolution log:

Append findings to `~/.solo/evolution.md` (create if not exists):

{YYYY-MM-DD} | {project} | Factory Score: {N}/10

Pipeline: {stages run} | Iters: {total} | Waste: {pct}%

Defects

  • {severity} | {skill/script}: {description}
    • Fix: {concrete file:change}

Harness Gaps

  • Context: {what knowledge was missing or stale for the agent}
  • Constraints: {what boundary violations or inconsistencies occurred}
  • Precedents: {patterns worth capturing for future agents — good or bad}

Missing

  • {capability the factory lacked}

What worked well

  • {skill/pattern that performed efficiently}

**Rules:**
- Be brutally honest — if a skill is broken, say so
- Every defect must have a concrete fix (file + what to change)
- Track what works well too — don't regress good patterns
- Keep entries compact — this file accumulates over time

Signal Output

Output signal: `<solo:done/>`

Important: `/retro` always outputs `<solo:done/>` — it never needs redo. Even if the pipeline was terrible, the retro itself always completes.

Edge Cases

  • No pipeline.log: abort with clear message — "No pipeline log found at {path}. Run a pipeline first."
  • Empty pipeline.log: report "Pipeline log is empty — was the pipeline cancelled before any iteration?"
  • No iter logs: skip Phase 4 sampling, note in report
  • No plan-done: skip Phase 5, note "No completed plans found"
  • No test/build commands: skip those checks in Phase 6, note in report
  • Pipeline still running: warn user — "State file exists, pipeline may still be running. Retro on partial data."

Reference Files

  • `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md` — scoring rubric (8 axes, weights)
  • `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md` — known failure patterns and fixes