/retro

This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses Big Head pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.

When to use

After a Big Head pipeline completes (or gets cancelled). This is the process quality check — `/review` checks code quality, `/retro` checks pipeline process quality. Can also be used standalone on any project that has pipeline logs.

MCP Tools (use if available)

  • `session_search(query)` — find past pipeline runs and known issues
  • `codegraph_explain(project)` — understand project architecture context
  • `codegraph_query(query)` — query the code graph for project metadata

If MCP tools are not available, fall back to Glob + Grep + Read.

Phase 1: Locate Artifacts

  1. Detect project from `$ARGUMENTS` or CWD:
    • If argument provided: use it as project name
    • Otherwise: extract from the CWD basename (e.g., `~/startups/active/life2film` → `life2film`)
  2. Find pipeline state file: `~/.solo/pipelines/solo-pipeline-{project}.local.md`
    • If it exists: pipeline is still running or wasn't cleaned up — read the YAML frontmatter for `project_root:`
    • If not: pipeline completed — use `~/startups/active/{project}` as project root
  3. Verify artifacts exist (parallel reads):
    • Pipeline log: `{project_root}/.solo/pipelines/pipeline.log` (REQUIRED — abort if missing)
    • Iter logs: `{project_root}/.solo/pipelines/iter-*.log`
    • Progress file: `{project_root}/.solo/pipelines/progress.md`
    • Plan-done directory: `{project_root}/docs/plan-done/`
    • Active plan: `{project_root}/docs/plan/`
  4. Count iter logs: `ls {project_root}/.solo/pipelines/iter-*.log | wc -l`
    • Report: "Found {N} iteration logs"
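The count in step 4 can also be sketched with a glob-safe variant; `ls` errors when no logs match, so a `find`-based count avoids that edge case. The directory and filenames below are hypothetical stand-ins for `{project_root}/.solo/pipelines/`:

```shell
# Sketch of step 4's iter-log count, using find so an empty glob
# yields 0 instead of an ls error. Paths are hypothetical.
DIR=$(mktemp -d)
touch "$DIR/iter-001-build.log" "$DIR/iter-002-review.log"

n=$(find "$DIR" -maxdepth 1 -name 'iter-*.log' | wc -l)
echo "Found $n iteration logs"

rm -rf "$DIR"
```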

Phase 2: Parse Pipeline Log (quantitative)

Read `pipeline.log` in full. Parse line-by-line, extracting structured data from log tags.

Log format: `[HH:MM:SS] TAG | message`

Extract by tag:

| Tag | What to extract |
| --- | --- |
| START | Pipeline run boundary — count restarts (multiple START lines = restarts) |
| STAGE | `iter N/M \| stage S/T: {stage_id}` — iteration count per stage |
| SIGNAL | `<solo:done/>` or `<solo:redo/>` — which stages got completion signals |
| INVOKE | Skill invoked — extract skill name, check for wrong names |
| ITER | `commit: {sha} \| result: {stage complete\|continuing}` — per-iteration outcome |
| CHECK | `{stage} \| {path} -> FOUND\|NOT FOUND` — marker file checks |
| FINISH | `Duration: {N}m` — total duration per run |
| MAXITER | `Reached max iterations ({N})` — hit iteration ceiling |
| QUEUE | Plan cycling events (activating, archiving) |
| CIRCUIT | Circuit breaker triggered (if present) |
| CWD | Working directory changes |
| CTRL | Control signals (pause/stop/skip) |
Compute metrics:
total_runs = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters = total_iterations - productive_iters
waste_pct = wasted_iters / total_iterations * 100
maxiter_hits = count of MAXITER lines
plan_cycles = count of QUEUE lines with "Cycling"

per_stage = {
  stage_id: {
    attempts: count of STAGE lines for this stage,
    successes: count of ITER lines with "stage complete" for this stage,
    waste_ratio: (attempts - successes) / attempts * 100,
  }
}
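As a minimal sketch, assuming the `[HH:MM:SS] TAG | message` format above, the headline metrics reduce to grep counts. The sample pipeline.log content below is hypothetical:

```shell
# Sketch: headline Phase 2 metrics via grep counts.
# The sample pipeline.log lines are hypothetical.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[10:00:01] START | pipeline starting
[10:05:12] ITER | commit: abc1234 | result: continuing
[10:12:44] ITER | commit: def5678 | result: stage complete
[10:20:03] MAXITER | Reached max iterations (8)
EOF

total_runs=$(grep -c '] START |' "$LOG")
total_iterations=$(grep -c '] ITER |' "$LOG")
productive_iters=$(grep '] ITER |' "$LOG" | grep -c 'stage complete')
wasted_iters=$(( total_iterations - productive_iters ))
waste_pct=$(( wasted_iters * 100 / total_iterations ))
maxiter_hits=$(grep -c '] MAXITER |' "$LOG")

echo "runs=$total_runs iters=$total_iterations wasted=$wasted_iters (${waste_pct}%)"
rm -f "$LOG"
```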

Phase 3: Parse Progress.md (qualitative)

Read `progress.md` and scan for error patterns:
  1. Unknown skill errors: grep for `Unknown skill:` — extract which skill name was wrong
  2. Empty iterations: iterations where "Last 5 lines" show only errors or a session header (no actual work done)
  3. Repeated errors: the same error appearing in consecutive iterations → spin-loop indicator
  4. Doubled signals: `<solo:done/><solo:done/>` in the same iteration → minor noise (note but don't penalize)
  5. Redo loops: count how many times build→review→redo→build cycles occurred
For each error pattern found, record:
  • Pattern name
  • First occurrence (iteration number)
  • Total occurrences
  • Consecutive streak (max)
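The occurrence and consecutive-streak counts above can be sketched with grep plus awk. The sample progress.md excerpt is hypothetical:

```shell
# Sketch: count "Unknown skill:" occurrences and the longest
# consecutive streak. The sample progress.md is hypothetical.
PROG=$(mktemp)
cat > "$PROG" <<'EOF'
iter 3: Unknown skill: /reveiw
iter 4: Unknown skill: /reveiw
iter 5: build ok
iter 6: Unknown skill: /reveiw
EOF

occurrences=$(grep -c 'Unknown skill:' "$PROG")
max_streak=$(awk '
  /Unknown skill:/ { s++; if (s > max) max = s; next }
  { s = 0 }
  END { print max + 0 }' "$PROG")

echo "occurrences=$occurrences max_streak=$max_streak"
rm -f "$PROG"
```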

Phase 4: Analyze Iter Logs (sample-based)

Do NOT read all iter logs — could be 60+. Use smart sampling:
  1. First failed iter per pattern: for each failure pattern found in Phase 3, read the first iter log that shows it
    • Strip ANSI codes when reading: `sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100`
  2. First successful iter per stage: for each stage that eventually succeeded, read the first successful iter log
    • Look for `<solo:done/>` in the output
  3. Final review iter: read the last `iter-*-review.log` (the verdict)
  4. Extract from each sampled log:
    • Tools called (count of tool_use blocks)
    • Errors encountered (grep for `Error`, `error`, `Unknown`, `failed`)
    • Signal output (`<solo:done/>` or `<solo:redo/>` present?)
    • First 5 and last 10 meaningful lines (skip blank lines)
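The ANSI-strip plus signal check in the sampling steps can be sketched as follows; the sample iter log content is fabricated for illustration:

```shell
# Sketch: strip ANSI color codes from a sampled iter log, then
# detect which completion signal it emitted. Sample log is fabricated.
IT=$(mktemp)
printf '\033[32mBuild succeeded\033[0m\n<solo:done/>\n' > "$IT"

clean=$(sed 's/\x1b\[[0-9;]*m//g' "$IT")
case "$clean" in
  *'<solo:done/>'*) signal=done ;;
  *'<solo:redo/>'*) signal=redo ;;
  *)                signal=none ;;
esac

echo "signal=$signal"
rm -f "$IT"
```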

Phase 5: Plan Fidelity Check

For each track directory in `docs/plan-done/` and `docs/plan/`:
  1. Read spec.md (if exists):
    • Count acceptance criteria: total `- [ ]` and `- [x]` checkboxes
    • Calculate: criteria_met = checked / total * 100
  2. Read plan.md (if exists):
    • Count tasks: total `- [ ]` and `- [x]` checkboxes
    • Count phases (## headers)
    • Check for SHA annotations (`<!-- sha:... -->`)
    • Calculate: tasks_done = checked / total * 100
  3. Compile per-track summary:
    • Track ID, criteria met %, tasks done %, has SHAs
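The checkbox arithmetic in steps 1-2 can be sketched against a hypothetical plan.md:

```shell
# Sketch: count "- [ ]" / "- [x]" checkboxes, compute tasks_done,
# and count phases. The sample plan.md content is hypothetical.
PLAN=$(mktemp)
cat > "$PLAN" <<'EOF'
## Phase 1
- [x] scaffold routes <!-- sha:abc1234 -->
- [x] add auth
- [ ] write tests
EOF

total=$(grep -Ec '^- \[( |x)\]' "$PLAN")
checked=$(grep -Ec '^- \[x\]' "$PLAN")
tasks_done=$(( checked * 100 / total ))
phases=$(grep -c '^## ' "$PLAN")

echo "tasks_done=${tasks_done}% (${checked}/${total}), phases=$phases"
rm -f "$PLAN"
```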

Phase 6: Git & Code Quality (lightweight)

Quick checks only — NOT a full /review:
  1. Commit count and format:
    ```bash
    git -C {project_root} log --oneline | wc -l
    git -C {project_root} log --oneline | head -30
    ```
    • Count commits with conventional format (`feat:`, `fix:`, `chore:`, `test:`, `docs:`, `refactor:`, `build:`, `ci:`, `perf:`)
    • Calculate: conventional_pct = conventional / total * 100
  2. Committer breakdown:
    ```bash
    git -C {project_root} shortlog -sn --no-merges | head -10
    ```
  3. Test status (if a test command exists in CLAUDE.md or package.json):
    • Run the test suite, capture pass/fail count
    • If no test command found, skip and note "no tests configured"
  4. Build status (if a build command exists):
    • Run the build, capture success/fail
    • If no build command found, skip and note "no build configured"
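The conventional_pct calculation from step 1 can be sketched against a hypothetical list of commit subjects (a real run would pipe them from `git log --format=%s`):

```shell
# Sketch: conventional-commit percentage. The subjects below are
# hypothetical; a real run reads them from git log.
subjects='feat: add upload flow
fix: handle empty pipeline log
wip: messy checkpoint
docs: update readme'

total=$(printf '%s\n' "$subjects" | wc -l)
conventional=$(printf '%s\n' "$subjects" \
  | grep -Ec '^(feat|fix|chore|test|docs|refactor|build|ci|perf)(\([^)]*\))?:')
conventional_pct=$(( conventional * 100 / total ))

echo "${conventional_pct}% conventional (${conventional}/${total})"
```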

Phase 7: Score & Report

Load the scoring rubric from `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md`. If the plugin root is not available, use the embedded weights:
Scoring weights:
  • Efficiency (waste %): 25%
  • Stability (restarts): 20%
  • Fidelity (criteria met): 20%
  • Quality (test pass rate): 15%
  • Commits (conventional %): 5%
  • Docs (plan staleness): 5%
  • Signals (clean signals): 5%
  • Speed (total duration): 5%
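As a worked example of the weighted sum, each 0-10 axis score is multiplied by its weight and the products are summed; the eight axis scores below are made up:

```shell
# Sketch: weighted overall score from the embedded weights.
# The eight axis scores are hypothetical examples.
overall=$(awk 'BEGIN {
  s = 7*0.25   # Efficiency
  s += 9*0.20  # Stability
  s += 8*0.20  # Fidelity
  s += 6*0.15  # Quality
  s += 10*0.05 # Commits
  s += 8*0.05  # Docs
  s += 9*0.05  # Signals
  s += 8*0.05  # Speed
  printf "%.1f", s
}')
echo "Overall Score: ${overall}/10"
```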
Generate the report at `{project_root}/docs/retro/{date}-retro.md`:

Pipeline Retro: {project} ({date})

Overall Score: {N}/10

Pipeline Efficiency

| Metric | Value | Rating |
| --- | --- | --- |
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |

Per-Stage Breakdown

| Stage | Attempts | Successes | Waste % | Notes |
| --- | --- | --- | --- | --- |
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |

Failure Patterns

Pattern 1: {name}

  • Occurrences: {N} iterations
  • Root cause: {analysis}
  • Wasted: {N} iterations
  • Fix: {concrete suggestion with file reference}

Pattern 2: ...

Plan Fidelity

| Track | Criteria Met | Tasks Done | SHAs | Rating |
| --- | --- | --- | --- | --- |
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |

Code Quality (Quick)

  • Tests: {N} pass, {N} fail (or "not configured")
  • Build: PASS / FAIL (or "not configured")
  • Commits: {N} total, {pct}% conventional format

Three-Axis Growth

| Axis | Score | Evidence |
| --- | --- | --- |
| Technical (code, tools, architecture) | {0-10} | {what changed} |
| Cognitive (understanding, strategy, decisions) | {0-10} | {what improved} |
| Process (harness, skills, pipeline, docs) | {0-10} | {what evolved} |
If only one axis is served — note what's missing.

Recommendations

  1. [CRITICAL] {patch suggestion with file:line reference}
  2. [HIGH] {improvement}
  3. [MEDIUM] {optimization}
  4. [LOW] {nice-to-have}

Suggested Patches

Patch 1: {file} — {description}

What: {one-line description}
Why: {root cause reference from Failure Patterns}

```diff
- old line
+ new line
```

**Rating guide (use these emojis):**
- GREEN = excellent
- YELLOW = acceptable
- RED = needs attention

Phase 8: Interactive Patching

After generating the report:
  1. Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
  2. For each suggested patch (if any), use AskUserQuestion:
    • Question: "Apply patch to {file}? {one-line description}"
    • Options: "Apply" / "Skip" / "Show diff first"
  3. If "Show diff first": display the full diff, then ask again (Apply / Skip)
  4. If "Apply": use the Edit tool to apply the change directly
  5. After all patches processed:
    • If any patches were applied: suggest committing with `fix(retro): {description}`
    • Do NOT auto-commit — just suggest the command

Phase 9: CLAUDE.md Revision

阶段9:修订CLAUDE.md

After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.

Steps:

  1. Read CLAUDE.md and check its size: `wc -c CLAUDE.md`
  2. Add learnings from this retro:
    • Pipeline failure patterns worth remembering (avoid next time)
    • New workflow rules or process improvements
    • Updated commands or tooling changes
    • Architecture decisions that emerged during the pipeline run
  3. If over 40,000 characters — trim ruthlessly:
    • Collapse completed phase/milestone histories into one line each
    • Remove verbose explanations — keep terse, actionable notes
    • Remove duplicate info (same thing explained in multiple sections)
    • Remove historical migration notes, old debugging context
    • Remove examples that are obvious from code or covered by skill/doc files
    • Remove outdated troubleshooting for resolved issues
  4. Verify result ≤ 40,000 characters — if still over, cut least actionable content
  5. Write updated CLAUDE.md, update "Last updated" date
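The 40,000-character gate from steps 1 and 3-4 reduces to a simple size check; the file content below is a tiny hypothetical stand-in for a real CLAUDE.md:

```shell
# Sketch: the CLAUDE.md size gate. File content is a hypothetical
# stand-in; a real run would check the project's CLAUDE.md.
CM=$(mktemp)
printf 'Do: run tests before commit.\n' > "$CM"

size=$(wc -c < "$CM")
if [ "$size" -gt 40000 ]; then
  verdict="over budget: trim"
else
  verdict="within budget"
fi

echo "$size chars: $verdict"
rm -f "$CM"
```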

Priority (keep → cut):

  1. ALWAYS KEEP: Tech stack, directory structure, Do/Don't rules, common commands, architecture decisions
  2. KEEP: Workflow instructions, troubleshooting for active issues, key file references
  3. CONDENSE: Phase histories (one line each), detailed examples, tool/MCP listings
  4. CUT FIRST: Historical notes, verbose explanations, duplicated content, resolved issues

Rules:

  • Never remove Do/Don't sections — critical guardrails
  • Preserve overall section structure and ordering
  • Every line must earn its place: "would a future agent need this to do their job?"
  • Commit the update:
    git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"

Phase 10: Factory Critic

After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.

What to evaluate:

  1. Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
    • For each skill: `${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md`
    • Did the skill have the right instructions for this project's needs?
    • Did it miss context it should have had?
  2. Read solo-dev.sh signal handling and stage logic:
    • `${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh`
    • Were there structural issues (wrong stage order, missing re-exec, broken redo)?
  3. Cross-reference with failure patterns from Phase 3:
    • For each failure: was the root cause in the skill, the script, or the project?
    • Skills that caused waste = factory defects

Score the factory (not the project):

Factory Score: {N}/10

Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}

Pipeline reliability: {N}/10 — {why}

Missing capabilities:
- {what the factory couldn't do that it should have}

Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}

Harness Evolution — think about the bigger picture

After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:
  1. Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
    • Missing docs → add to `docs/` or CLAUDE.md
    • Stale docs → flag for doc-gardening
    • Knowledge only in your head → encode it
  2. Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
    • Repeated boundary violations → need a linter or structural test
    • Inconsistent patterns → need golden principle in CLAUDE.md
    • Data shape errors → need parse-at-boundary enforcement
  3. Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
    • Good patterns → capture as precedent in KB or CLAUDE.md
    • Bad patterns → encode as anti-pattern or lint rule
    • Think: "if another agent hits this same problem tomorrow, what should it find?"
  4. Skill gaps: Which skills need better instructions? Which new skills should exist?
    • Skill that caused waste → concrete SKILL.md patch
    • Missing capability → new skill idea for evolution.md
Append harness findings to the evolution log alongside factory defects.

Write to evolution log:

Append findings to `~/.solo/evolution.md` (create if not exists):

{YYYY-MM-DD} | {project} | Factory Score: {N}/10

Pipeline: {stages run} | Iters: {total} | Waste: {pct}%

Defects

  • {severity} | {skill/script}: {description}
    • Fix: {concrete file:change}

Harness Gaps

  • Context: {what knowledge was missing or stale for the agent}
  • Constraints: {what boundary violations or inconsistencies occurred}
  • Precedents: {patterns worth capturing for future agents — good or bad}

Missing

  • {capability the factory lacked}

What worked well

  • {skill/pattern that performed efficiently}

**Rules:**
- Be brutally honest — if a skill is broken, say so
- Every defect must have a concrete fix (file + what to change)
- Track what works well too — don't regress good patterns
- Keep entries compact — this file accumulates over time

Signal Output

Output signal: `<solo:done/>`

Important: `/retro` always outputs `<solo:done/>` — it never needs redo. Even if the pipeline was terrible, the retro itself always completes.

Edge Cases

  • No pipeline.log: abort with clear message — "No pipeline log found at {path}. Run a pipeline first."
  • Empty pipeline.log: report "Pipeline log is empty — was the pipeline cancelled before any iteration?"
  • No iter logs: skip Phase 4 sampling, note in report
  • No plan-done: skip Phase 5, note "No completed plans found"
  • No test/build commands: skip those checks in Phase 6, note in report
  • Pipeline still running: warn user — "State file exists, pipeline may still be running. Retro on partial data."

Reference Files

  • `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md` — scoring rubric (8 axes, weights)
  • `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md` — known failure patterns and fixes