solo-retro
/retro
This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses Big Head pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.
When to use
After a Big Head pipeline completes (or gets cancelled). This is the process quality check — /review checks code quality, /retro checks pipeline process quality. Can also be used standalone on any project that has pipeline logs.
MCP Tools (use if available)
- session_search(query) — find past pipeline runs and known issues
- codegraph_explain(project) — understand project architecture context
- codegraph_query(query) — query code graph for project metadata

If MCP tools are not available, fall back to Glob + Grep + Read.
Phase 1: Locate Artifacts
1. Detect project from $ARGUMENTS or CWD:
   - If argument provided: use it as project name
   - Otherwise: extract from CWD basename (e.g., ~/startups/active/life2film → life2film)
2. Find pipeline state file: ~/.solo/pipelines/solo-pipeline-{project}.local.md
   - If it exists: pipeline is still running or wasn't cleaned up — read YAML frontmatter for project_root:
   - If not: pipeline completed — use ~/startups/active/{project} as project root
3. Verify artifacts exist (parallel reads):
   - Pipeline log: {project_root}/.solo/pipelines/pipeline.log (REQUIRED — abort if missing)
   - Iter logs: {project_root}/.solo/pipelines/iter-*.log
   - Progress file: {project_root}/.solo/pipelines/progress.md
   - Plan-done directory: {project_root}/docs/plan-done/
   - Active plan: {project_root}/docs/plan/
4. Count iter logs: ls {project_root}/.solo/pipelines/iter-*.log | wc -l
   - Report: "Found {N} iteration logs"
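The detection in step 1 can be sketched in Python. This is purely illustrative (the skill itself reasons over $ARGUMENTS and the CWD directly); the function name is hypothetical:

```python
import os

def detect_project(args: list[str], cwd: str) -> str:
    """Resolve the project name: an explicit argument wins, else the CWD basename."""
    if args:
        return args[0]  # argument provided: use it as project name
    # otherwise extract from CWD basename, e.g. ~/startups/active/life2film -> life2film
    return os.path.basename(cwd.rstrip("/"))

# Usage
print(detect_project([], "/home/me/startups/active/life2film"))  # life2film
print(detect_project(["myapp"], "/anywhere"))                    # myapp
```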
Phase 2: Parse Pipeline Log (quantitative)
Read pipeline.log in full. Parse line-by-line, extracting structured data from log tags.

Log format: [HH:MM:SS] TAG | message

Extract by tag:

| Tag | What to extract |
|---|---|
| START | Pipeline run boundary — count restarts (multiple START lines = restarts) |
| ITER | `iter N/M` iteration marker |
| INVOKE | Skill invoked — extract skill name, check for wrong names |
| | `commit: {sha}` |
| STAGE | `{stage}` stage entry |
| MAXITER | Per-stage iteration limit hit |
| QUEUE | Plan cycling events (activating, archiving) |
| | Circuit breaker triggered (if present) |
| | Working directory changes |
| | Control signals (pause/stop/skip) |

Compute metrics:

```
total_runs = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters = total_iterations - productive_iters
waste_pct = wasted_iters / total_iterations * 100
maxiter_hits = count of MAXITER lines
plan_cycles = count of QUEUE lines with "Cycling"
per_stage = {
  stage_id: {
    attempts: count of STAGE lines for this stage,
    successes: count of ITER lines with "stage complete" for this stage,
    waste_ratio: (attempts - successes) / attempts * 100,
  }
}
```
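The tag counting can be sketched in Python. The `[HH:MM:SS] TAG | message` format and the "stage complete" / "Cycling" substrings come from the definitions above; the sample log and per-call details are illustrative assumptions:

```python
import re
from collections import defaultdict

# Matches the documented log format: [HH:MM:SS] TAG | message
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (\w+) \| (.*)$")

def parse_metrics(log_text: str) -> dict:
    """Compute the Phase 2 scalar metrics from pipeline.log text."""
    counts: dict = defaultdict(int)
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip non-tagged lines
        _, tag, msg = m.groups()
        counts[tag] += 1
        if tag == "ITER" and "stage complete" in msg:
            counts["productive"] += 1
        if tag == "QUEUE" and "Cycling" in msg:
            counts["plan_cycles"] += 1
    total = counts["ITER"]
    wasted = total - counts["productive"]
    return {
        "total_runs": counts["START"],
        "total_iterations": total,
        "productive_iters": counts["productive"],
        "wasted_iters": wasted,
        "waste_pct": wasted / total * 100 if total else 0.0,
        "maxiter_hits": counts["MAXITER"],
        "plan_cycles": counts["plan_cycles"],
    }
```

The per_stage breakdown follows the same pattern, keyed by the stage name carried in STAGE and ITER messages.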
Phase 3: Parse Progress.md (qualitative)
Read progress.md and scan for error patterns:

- Unknown skill errors: grep for Unknown skill: — extract which skill name was wrong
- Empty iterations: iterations where "Last 5 lines" show only errors or a session header (no actual work done)
- Repeated errors: same error appearing in consecutive iterations → spin-loop indicator
- Doubled signals: <solo:done/><solo:done/> in the same iteration → minor noise (note but don't penalize)
- Redo loops: count how many times build→review→redo→build cycles occurred

For each error pattern found, record:
- Pattern name
- First occurrence (iteration number)
- Total occurrences
- Consecutive streak (max)
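The per-pattern bookkeeping above (first occurrence, total, max streak) can be sketched as follows. This is a minimal illustration assuming one error string (possibly empty) per iteration; the function name is hypothetical:

```python
def pattern_stats(iter_errors: list[str], pattern: str) -> dict:
    """Given one error string per iteration (empty string = no error),
    record first occurrence, total occurrences, and the longest consecutive streak."""
    first, total, streak, best = None, 0, 0, 0
    for i, err in enumerate(iter_errors, start=1):  # iterations are 1-indexed
        if pattern in err:
            first = first or i
            total += 1
            streak += 1
            best = max(best, streak)
        else:
            streak = 0  # streak broken
    return {"first": first, "total": total, "max_streak": best}

# Usage: iterations 2-3 and 5 hit the same unknown-skill error
print(pattern_stats(["", "Unknown skill: buld", "Unknown skill: buld", "", "Unknown skill: buld"],
                    "Unknown skill"))
```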
Phase 4: Analyze Iter Logs (sample-based)
Do NOT read all iter logs — there could be 60+. Use smart sampling:

1. First failed iter per pattern: For each failure pattern found in Phase 3, read the first iter log that shows it
   - Strip ANSI codes when reading: sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100
2. First successful iter per stage: For each stage that eventually succeeded, read the first successful iter log
   - Look for <solo:done/> in the output
3. Final review iter: Read the last iter-*-review.log (the verdict)
4. Extract from each sampled log:
   - Tools called (count of tool_use blocks)
   - Errors encountered (grep for Error, error, Unknown, failed)
   - Signal output (<solo:done/> or <solo:redo/> present?)
   - First 5 and last 10 meaningful lines (skip blank lines)
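If reading logs in Python rather than through sed, the same ANSI-stripping plus head-100 step looks like this (a sketch; the function name is illustrative, and the regex mirrors the sed expression above):

```python
import re

# Same color-code pattern as the sed command: ESC [ ... m
ANSI = re.compile(r"\x1b\[[0-9;]*m")

def head_clean(log_text: str, n: int = 100) -> str:
    """Strip ANSI color codes and keep only the first n lines."""
    return "\n".join(ANSI.sub("", line) for line in log_text.splitlines()[:n])

print(head_clean("\x1b[31mError:\x1b[0m something failed"))  # Error: something failed
```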
Phase 5: Plan Fidelity Check
For each track directory in docs/plan-done/ and docs/plan/:

1. Read spec.md (if it exists):
   - Count acceptance criteria: total - [ ] and - [x] checkboxes
   - Calculate: criteria_met = checked / total * 100
2. Read plan.md (if it exists):
   - Count tasks: total - [ ] and - [x] checkboxes
   - Count phases (## headers)
   - Check for SHA annotations (<!-- sha:... -->)
   - Calculate: tasks_done = checked / total * 100
3. Compile per-track summary:
   - Track ID, criteria met %, tasks done %, has SHAs
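The checkbox math is the same for spec.md and plan.md, so one helper covers both. A minimal sketch (the function name is illustrative; it assumes standard `- [ ]` / `- [x]` markdown checkboxes):

```python
import re

def checkbox_pct(md: str) -> tuple[int, int, float]:
    """Count - [ ] and - [x] checkboxes; return (checked, total, percent done)."""
    unchecked = len(re.findall(r"^\s*- \[ \]", md, re.MULTILINE))
    checked = len(re.findall(r"^\s*- \[[xX]\]", md, re.MULTILINE))
    total = checked + unchecked
    return checked, total, (checked / total * 100 if total else 0.0)

# Usage
checked, total, pct = checkbox_pct("- [x] auth works\n- [ ] deploy green\n- [x] tests pass\n")
print(checked, total, pct)  # 2 of 3 done
```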
Phase 6: Git & Code Quality (lightweight)
Quick checks only — NOT a full /review:

1. Commit count and format:

   ```bash
   git -C {project_root} log --oneline | wc -l
   git -C {project_root} log --oneline | head -30
   ```

   - Count commits with conventional format (feat:, fix:, chore:, test:, docs:, refactor:, build:, ci:, perf:)
   - Calculate: conventional_pct = conventional / total * 100
2. Committer breakdown:

   ```bash
   git -C {project_root} shortlog -sn --no-merges | head -10
   ```

3. Test status (if a test command exists in CLAUDE.md or package.json):
   - Run the test suite, capture pass/fail count
   - If no test command found, skip and note "no tests configured"
4. Build status (if a build command exists):
   - Run the build, capture success/fail
   - If no build command found, skip and note "no build configured"
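The conventional-format check on commit subjects can be sketched like this. The prefix list comes from the step above; the optional `(scope)` and `!` parts follow the common Conventional Commits shape and are an assumption:

```python
import re

# feat: / fix(scope): / refactor!: ... — prefixes from the list above
CONVENTIONAL = re.compile(
    r"^(feat|fix|chore|test|docs|refactor|build|ci|perf)(\([^)]*\))?!?: "
)

def conventional_pct(subjects: list[str]) -> float:
    """Percentage of commit subject lines matching a conventional prefix."""
    if not subjects:
        return 0.0
    hits = sum(1 for s in subjects if CONVENTIONAL.match(s))
    return hits / len(subjects) * 100

# Usage: feed it the output of `git log --oneline --format=%s`
print(conventional_pct(["feat: add retro", "wip", "fix(build): redo loop"]))
```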
Phase 7: Score & Report
Load scoring rubric from ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md.
If plugin root not available, use the embedded weights:

Scoring weights:
- Efficiency (waste %): 25%
- Stability (restarts): 20%
- Fidelity (criteria met): 20%
- Quality (test pass rate): 15%
- Commits (conventional %): 5%
- Docs (plan staleness): 5%
- Signals (clean signals): 5%
- Speed (total duration): 5%
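Combining the eight axes into the overall score is a weighted sum. A minimal sketch, assuming each axis has already been scored 0-10 (the dict keys are illustrative shorthand for the axes above):

```python
# Weights from the embedded rubric above (sum to 1.0)
WEIGHTS = {
    "efficiency": 0.25, "stability": 0.20, "fidelity": 0.20, "quality": 0.15,
    "commits": 0.05, "docs": 0.05, "signals": 0.05, "speed": 0.05,
}

def overall_score(axis_scores: dict[str, float]) -> float:
    """Weighted sum of per-axis scores (each 0-10), rounded to one decimal."""
    return round(sum(WEIGHTS[k] * axis_scores.get(k, 0.0) for k in WEIGHTS), 1)

# Usage: perfect run except efficiency at 6/10
print(overall_score({"efficiency": 6, "stability": 10, "fidelity": 10, "quality": 10,
                     "commits": 10, "docs": 10, "signals": 10, "speed": 10}))
```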
Generate report at {project_root}/docs/retro/{date}-retro.md:

````markdown
# Pipeline Retro: {project} ({date})

## Overall Score: {N}/10

## Pipeline Efficiency

| Metric | Value | Rating |
|---|---|---|
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |

## Per-Stage Breakdown

| Stage | Attempts | Successes | Waste % | Notes |
|---|---|---|---|---|
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |

## Failure Patterns

### Pattern 1: {name}

- Occurrences: {N} iterations
- Root cause: {analysis}
- Wasted: {N} iterations
- Fix: {concrete suggestion with file reference}

### Pattern 2: ...

## Plan Fidelity

| Track | Criteria Met | Tasks Done | SHAs | Rating |
|---|---|---|---|---|
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |

## Code Quality (Quick)

- Tests: {N} pass, {N} fail (or "not configured")
- Build: PASS / FAIL (or "not configured")
- Commits: {N} total, {pct}% conventional format

## Three-Axis Growth

| Axis | Score | Evidence |
|---|---|---|
| Technical (code, tools, architecture) | {0-10} | {what changed} |
| Cognitive (understanding, strategy, decisions) | {0-10} | {what improved} |
| Process (harness, skills, pipeline, docs) | {0-10} | {what evolved} |

If only one axis is served — note what's missing.

## Recommendations

- [CRITICAL] {patch suggestion with file:line reference}
- [HIGH] {improvement}
- [MEDIUM] {optimization}
- [LOW] {nice-to-have}

## Suggested Patches

### Patch 1: {file} — {description}

What: {one-line description}
Why: {root cause reference from Failure Patterns}

```diff
- old line
+ new line
```
````

**Rating guide (use these emojis):**
- GREEN = excellent
- YELLOW = acceptable
- RED = needs attention

Phase 8: Interactive Patching
After generating the report:

1. Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
2. For each suggested patch (if any), use AskUserQuestion:
   - Question: "Apply patch to {file}? {one-line description}"
   - Options: "Apply" / "Skip" / "Show diff first"
3. If "Show diff first": display the full diff, then ask again (Apply / Skip)
4. If "Apply": use the Edit tool to apply the change directly
5. After all patches processed:
   - If any patches were applied: suggest committing with fix(retro): {description}
   - Do NOT auto-commit — just suggest the command
Phase 9: CLAUDE.md Revision
After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.

Steps:

- Read CLAUDE.md and check size: wc -c CLAUDE.md
- Add learnings from this retro:
  - Pipeline failure patterns worth remembering (avoid next time)
  - New workflow rules or process improvements
  - Updated commands or tooling changes
  - Architecture decisions that emerged during the pipeline run
- If over 40,000 characters — trim ruthlessly:
  - Collapse completed phase/milestone histories into one line each
  - Remove verbose explanations — keep terse, actionable notes
  - Remove duplicate info (same thing explained in multiple sections)
  - Remove historical migration notes, old debugging context
  - Remove examples that are obvious from code or covered by skill/doc files
  - Remove outdated troubleshooting for resolved issues
- Verify result ≤ 40,000 characters — if still over, cut least actionable content
- Write updated CLAUDE.md, update "Last updated" date

Priority (keep → cut):

- ALWAYS KEEP: Tech stack, directory structure, Do/Don't rules, common commands, architecture decisions
- KEEP: Workflow instructions, troubleshooting for active issues, key file references
- CONDENSE: Phase histories (one line each), detailed examples, tool/MCP listings
- CUT FIRST: Historical notes, verbose explanations, duplicated content, resolved issues

Rules:

- Never remove Do/Don't sections — critical guardrails
- Preserve overall section structure and ordering
- Every line must earn its place: "would a future agent need this to do their job?"
- Commit the update: git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"
Phase 10: Factory Critic
After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.

What to evaluate:

1. Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
   - For each skill: ${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md
   - Did the skill have the right instructions for this project's needs?
   - Did it miss context it should have had?
2. Read solo-dev.sh signal handling and stage logic: ${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh
   - Were there structural issues (wrong stage order, missing re-exec, broken redo)?
3. Cross-reference with failure patterns from Phase 3:
   - For each failure: was the root cause in the skill, the script, or the project?
   - Skills that caused waste = factory defects

Score the factory (not the project):

```
Factory Score: {N}/10

Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}

Pipeline reliability: {N}/10 — {why}

Missing capabilities:
- {what the factory couldn't do that it should have}

Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}
```

Harness Evolution — think about the bigger picture
After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:

1. Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
   - Missing docs → add to docs/ or CLAUDE.md
   - Stale docs → flag for doc-gardening
   - Knowledge only in your head → encode it
2. Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
   - Repeated boundary violations → need a linter or structural test
   - Inconsistent patterns → need a golden principle in CLAUDE.md
   - Data shape errors → need parse-at-boundary enforcement
3. Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
   - Good patterns → capture as precedent in KB or CLAUDE.md
   - Bad patterns → encode as anti-pattern or lint rule
   - Think: "if another agent hits this same problem tomorrow, what should it find?"
4. Skill gaps: Which skills need better instructions? Which new skills should exist?
   - Skill that caused waste → concrete SKILL.md patch
   - Missing capability → new skill idea for evolution.md

Append harness findings to the evolution log alongside factory defects.
Write to evolution log:

Append findings to ~/.solo/evolution.md (create if not exists):

```markdown
## {YYYY-MM-DD} | {project} | Factory Score: {N}/10

Pipeline: {stages run} | Iters: {total} | Waste: {pct}%

### Defects
- {severity} | {skill/script}: {description}
  - Fix: {concrete file:change}

### Harness Gaps
- Context: {what knowledge was missing or stale for the agent}
- Constraints: {what boundary violations or inconsistencies occurred}
- Precedents: {patterns worth capturing for future agents — good or bad}

### Missing
- {capability the factory lacked}

### What worked well
- {skill/pattern that performed efficiently}
```

**Rules:**
- Be brutally honest — if a skill is broken, say so
- Every defect must have a concrete fix (file + what to change)
- Track what works well too — don't regress good patterns
- Keep entries compact — this file accumulates over time

Signal Output
Output signal: <solo:done/>

Important: /retro always outputs <solo:done/> — it never needs redo. Even if the pipeline was terrible, the retro itself always completes.

Edge Cases
- No pipeline.log: abort with clear message — "No pipeline log found at {path}. Run a pipeline first."
- Empty pipeline.log: report "Pipeline log is empty — was the pipeline cancelled before any iteration?"
- No iter logs: skip Phase 4 sampling, note in report
- No plan-done: skip Phase 5, note "No completed plans found"
- No test/build commands: skip those checks in Phase 6, note in report
- Pipeline still running: warn user — "State file exists, pipeline may still be running. Retro on partial data."
Reference Files
- ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md — scoring rubric (8 axes, weights)
- ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md — known failure patterns and fixes