rlm-debugging
RLM Systematic Debugging (Phase 1.5)
Overview
When a requirement involves fixing a bug or investigating unexpected behavior, ad-hoc fixes waste time and create new bugs. Systematic debugging finds the root cause before any fix is attempted.
Core Principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
The Iron Law for RLM Debugging:
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
When to Use
Phase 1.5 is mandatory when:
- Requirement is a bug fix
- Test failures need investigation
- Unexpected behavior reported
- Performance problems
- Integration issues
Use ESPECIALLY when:
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- Previous fix attempts failed
- You don't fully understand the issue
Don't skip when:
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Manager wants it fixed NOW (systematic is faster than thrashing)
Phase 1.5 Insertion
Phase 1.5 is inserted between Phase 1 (AS-IS) and Phase 2 (TO-BE Plan) when debugging is required:
Phase 0: 00-requirements.md
↓
Phase 1: 01-as-is.md (captures current behavior)
↓
Phase 1.5: 01.5-root-cause.md ← NEW (this skill)
↓
Phase 2: 02-to-be-plan.md (includes fix plan based on root cause)
The Four Phases of Systematic Debugging
```dot
digraph debugging_phases {
    rankdir=TB;
    phase1 [label="Step 1:\nRoot Cause Investigation", shape=box, style=filled, fillcolor="#ffcccc"];
    phase2 [label="Step 2:\nPattern Analysis", shape=box, style=filled, fillcolor="#ffffcc"];
    phase3 [label="Step 3:\nHypothesis & Testing", shape=box, style=filled, fillcolor="#ccffcc"];
    phase4 [label="Step 4:\nFix Implementation", shape=box, style=filled, fillcolor="#ccccff"];
    phase1 -> phase2 -> phase3 -> phase4;
    // Feedback loops
    phase3 -> phase1 [label="hypothesis\nfailed", style=dashed];
    phase4 -> phase1 [label="fix\nfailed", style=dashed];
}
```
Step 1: Root Cause Investigation
BEFORE attempting ANY fix:
1.1 Read Error Messages Carefully
- Don't skip past errors or warnings
- They often contain the exact solution
- Read stack traces completely
- Note line numbers, file paths, error codes
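As an illustration of reading a stack trace completely, the sketch below pulls the message, every frame, the raising location, and any error code out of a caught exception using Python's standard `traceback` module. The names `summarize_exception` and `read_port` are hypothetical and exist only for this example.

```python
import traceback


def summarize_exception(exc: BaseException) -> dict:
    """Collect the details worth recording from a caught exception."""
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "error_message": f"{type(exc).__name__}: {exc}",
        # Every frame, outermost call first, so nothing in the trace is skipped.
        "stack_trace": [f"{f.filename}:{f.lineno} in {f.name}" for f in frames],
        # The last frame is where the exception was actually raised.
        "file_line": f"{frames[-1].filename}:{frames[-1].lineno}" if frames else None,
        # Only some exceptions (for example OSError) carry an error code.
        "error_code": getattr(exc, "errno", None),
    }


def read_port(settings: dict) -> int:
    # Hypothetical failing function, used only to produce a traceback.
    return int(settings["port"])


try:
    read_port({"port": "eighty"})
except ValueError as exc:
    report = summarize_exception(exc)
    print(report["error_message"])
    for frame in report["stack_trace"]:
        print("  ", frame)
```

The returned keys are one possible way to capture verbatim error details; rename them to match whatever record format you keep.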
Record in Phase 1.5 artifact:
```markdown
Error Analysis
Error Message: [verbatim]
Stack Trace: [key frames]
File:Line: [locations]
Error Code: [if applicable]
Key Insight: [what the error is telling you]
```
1.2 Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- Does it happen every time?
- If not reproducible → gather more data, don't guess
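One way to answer "does it happen every time?" with data rather than a guess is to run the reproduction steps repeatedly and count failures. This is a minimal sketch: `measure_reproduction` and `flaky_step` are hypothetical names, and the "fails on every third call" behaviour is simulated.

```python
from typing import Callable


def measure_reproduction(attempt: Callable[[], None], attempts: int = 20) -> dict:
    """Run the reproduction steps repeatedly and count how often they fail."""
    failures = 0
    for _ in range(attempts):
        try:
            attempt()
        except Exception:
            failures += 1
    if failures == attempts:
        verdict = "Yes"
    elif failures == 0:
        verdict = "No"
    else:
        verdict = "Intermittent"
    return {
        "reproducible": verdict,
        "frequency": f"{failures} out of {attempts} attempts",
        # All-or-nothing outcomes suggest deterministic behaviour.
        "deterministic": failures in (0, attempts),
    }


calls = {"n": 0}


def flaky_step() -> None:
    # Hypothetical stand-in for the real steps: fails on every third call.
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        raise RuntimeError("simulated failure")


result = measure_reproduction(flaky_step, attempts=9)
print(result)
```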
Record in Phase 1.5 artifact:
```markdown
Reproduction Verification
Steps:
- [exact step]
- [exact step]
- [exact step]
Reproducible: Yes / No / Intermittent
Frequency: [X out of Y attempts]
Deterministic: Yes / No
```
1.3 Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
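When many changes landed since the last known-good state, a binary search over them narrows down the likely culprit quickly; this is the same idea that `git bisect` automates. The sketch below is an assumption-laden illustration, not part of the original process: the commit ids are made up, and `is_bad` stands for whatever check reproduces the bug at a given change.

```python
from typing import Callable, Sequence


def first_bad_change(changes: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered change list for the first change where the bug appears.

    Assumes the oldest entry is known good, the newest is known bad, and the
    bug stays present once it has been introduced.
    """
    good, bad = 0, len(changes) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(changes[mid]):
            bad = mid
        else:
            good = mid
    return changes[bad]


# Hypothetical history, oldest first; the bug is simulated as starting at "d4c".
history = ["a1f", "b2e", "c3d", "d4c", "e5b", "f6a"]
broken_from = history.index("d4c")
culprit = first_bad_change(history, lambda c: history.index(c) >= broken_from)
print(culprit)
```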
Record in Phase 1.5 artifact:
```markdown
Recent Changes Analysis
Git History: [relevant commits]
Dependency Changes: [package.json, requirements.txt, etc.]
Config Changes: [relevant files]
Environment: [OS, runtime versions]
Likely Culprit: [most suspicious change]
```
1.4 Gather Evidence in Multi-Component Systems
WHEN system has multiple components (CI → build → signing, API → service → database):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary:
- Log what data enters component
- Log what data exits component
- Verify environment/config propagation
- Check state at each layer
Run once to gather evidence showing WHERE it breaks, THEN analyze evidence.
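The boundary logging described above can be sketched with a small decorator that records what enters and exits each component. Everything here is hypothetical: the `boundary` decorator, the three toy layers standing in for API → service → database, and the deliberate bug (the caller sends `ident` instead of `id`).

```python
import functools
from typing import Any, Callable

evidence: list = []


def boundary(layer: str) -> Callable:
    """Decorator that records what enters and exits one component boundary."""
    def decorate(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record = {"layer": layer, "input": (args, kwargs), "output": None, "ok": False}
            evidence.append(record)
            result = func(*args, **kwargs)  # an exception leaves ok=False on the record
            record["output"] = result
            record["ok"] = True
            return result
        return wrapper
    return decorate


@boundary("api")
def api(request: dict) -> dict:
    return {"user_id": request.get("id")}


@boundary("service")
def service(payload: dict) -> dict:
    return {"key": payload["user_id"]}


@boundary("database")
def database(query: dict) -> str:
    if query["key"] is None:
        raise KeyError("no key supplied")
    return f"row-{query['key']}"


try:
    database(service(api({"ident": 7})))  # caller sends "ident" instead of "id"
except KeyError:
    pass

for record in evidence:
    status = "ok" if record["ok"] else "FAILED"
    print(record["layer"], record["input"], "->", record["output"], status)
```

In this toy run the database layer is the one that fails, but the recorded output of the first layer already shows the missing value, which is the kind of evidence that points at where the problem starts rather than where it surfaces.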
Record in Phase 1.5 artifact:
```markdown
Multi-Layer Evidence
Layer 1: [Component Name]
- Input: [data]
- Output: [data]
- Status: ✅ Working / ❌ Broken
Layer 2: [Component Name]
- Input: [data from Layer 1]
- Output: [data]
- Status: ✅ Working / ❌ Broken
Failure Boundary: Layer X → Layer Y
Root Cause Location: [specific component]
```
1.5 Trace Data Flow
WHEN error is deep in call stack:
Trace backward:
- Where does bad value originate?
- What called this with bad value?
- Keep tracing up until you find the source
- Fix at source, not at symptom
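A backward trace can be partly mechanized by walking the traceback and listing every frame whose local variables hold the bad value. This is only a sketch under stated assumptions: the function names are hypothetical, and "bad" is simplified here to "is `None`".

```python
from types import TracebackType
from typing import Any, Callable, Optional


def frames_holding_bad_value(tb: Optional[TracebackType], is_bad: Callable[[Any], bool]) -> list:
    """List frames, deepest first, whose local variables contain a bad value."""
    hits = []
    while tb is not None:
        frame = tb.tb_frame
        bad_names = sorted(name for name, value in frame.f_locals.items() if is_bad(value))
        if bad_names:
            hits.append(f"{frame.f_code.co_name}() line {tb.tb_lineno}: {', '.join(bad_names)}")
        tb = tb.tb_next
    hits.reverse()  # tracebacks run outermost to deepest; the trace wants deepest first
    return hits


def load_settings() -> dict:
    return {"timeout": None}  # hypothetical ORIGIN of the bad value


def send(timeout) -> float:
    return timeout * 2  # symptom: TypeError is raised here


def build_request() -> float:
    timeout = load_settings()["timeout"]
    return send(timeout)


def demo() -> list:
    try:
        build_request()
    except TypeError as exc:
        return frames_holding_bad_value(exc.__traceback__, lambda v: v is None)
    return []


for line in demo():
    print(line)
```

The last entry printed is the outermost frame that still held the bad value. The origin itself (`load_settings` in this toy case) has already returned and does not appear in the traceback, so the final step is to read how that outermost frame obtained the value.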
Record in Phase 1.5 artifact:
```markdown
Data Flow Trace
Error Location: [file:line - function]
Bad Value: [what was wrong]
Call Stack Trace:
- [deepest] functionA() at fileA:line - received [value]
- functionB() at fileB:line - passed [value]
- functionC() at fileC:line - passed [value]
- [source] functionD() at fileD:line - ORIGIN of bad value
Root Cause: [source location] - [explanation]
```
Step 2: Pattern Analysis
Find the pattern before fixing:
2.1 Find Working Examples
- Locate similar working code in same codebase
- What works that's similar to what's broken?
2.2 Compare Against References
- If implementing pattern, read reference implementation COMPLETELY
- Don't skim - read every line
- Understand the pattern fully before applying
2.3 Identify Differences
- What's different between working and broken?
- List every difference, however small
- Don't assume "that can't matter"
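Listing every difference is easier to do exhaustively when it is computed rather than eyeballed. The sketch below compares two settings dictionaries and emits one table row per differing key; `list_differences` and the sample values are hypothetical.

```python
def list_differences(working: dict, broken: dict) -> list:
    """Return (aspect, working_value, broken_value) for every key that differs."""
    missing = object()  # marks a key that is absent on one side
    rows = []
    for key in sorted(working.keys() | broken.keys()):
        w = working.get(key, missing)
        b = broken.get(key, missing)
        if w != b:
            rows.append((
                key,
                "<absent>" if w is missing else w,
                "<absent>" if b is missing else b,
            ))
    return rows


working = {"timeout": 30, "retries": 3, "tls": True}
broken = {"timeout": 30, "retries": 0, "region": "eu"}

print("| Aspect | Working | Broken |")
print("|---|---|---|")
for aspect, w, b in list_differences(working, broken):
    print(f"| {aspect} | {w} | {b} |")
```

Keys present on only one side are reported too, since a missing setting is a difference that is easy to dismiss as something that "can't matter".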
2.4 Understand Dependencies
- What other components does this need?
- What settings, config, environment?
- What assumptions does it make?
Record in Phase 1.5 artifact:
```markdown
Pattern Analysis
Working Example: [file:location]
Broken Code: [file:location]
Key Differences:
| Aspect | Working | Broken |
|---|---|---|
| [X] | [value] | [value] |
| [Y] | [value] | [value] |
Likely Cause: [difference that explains the bug]
Dependencies: [what the code needs to work]
```
Step 3: Hypothesis and Testing
Scientific method:
3.1 Form Single Hypothesis
- State clearly: "I think X is the root cause because Y"
- Write it down
- Be specific, not vague
3.2 Test Minimally
- Make the SMALLEST possible change to test hypothesis
- One variable at a time
- Don't fix multiple things at once
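Changing one variable at a time can be sketched as a loop that always starts from the broken state, applies a single working value, and re-runs the check. The helper name `isolate_difference`, the sample configs, and the `passes` check are all hypothetical.

```python
from typing import Callable


def isolate_difference(working: dict, broken: dict, passes: Callable[[dict], bool]) -> list:
    """Apply each working value to the broken config one key at a time.

    Returns the keys whose single change makes the check pass.
    """
    fixes = []
    for key in sorted(working.keys() | broken.keys()):
        if working.get(key) == broken.get(key):
            continue
        candidate = dict(broken)  # start from the broken state every time
        if key in working:
            candidate[key] = working[key]
        else:
            candidate.pop(key)
        if passes(candidate):
            fixes.append(key)
    return fixes


working = {"timeout": 30, "retries": 3}
broken = {"timeout": 5, "retries": 0}

# Hypothetical check standing in for "run the reproduction and see if it still fails".
keys = isolate_difference(working, broken, lambda cfg: cfg["retries"] > 0)
print(keys)
```

Because every candidate differs from the broken state by exactly one key, a passing check can be attributed to that key alone.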
3.3 Verify Before Continuing
- Did it work? Yes → Step 4
- Didn't work? Form NEW hypothesis
- DON'T add more fixes on top
3.4 When You Don't Know
- Say "I don't understand X"
- Don't pretend to know
- Ask for help
- Research more
Record in Phase 1.5 artifact:
```markdown
Hypothesis Testing
Hypothesis 1
Statement: [clear hypothesis]
Rationale: [why you think this]
Test: [minimal change to verify]
Result: [confirmed/rejected]
Evidence: [output/observation]
Hypothesis 2 (if needed)
[...]
Confirmed Root Cause: [final hypothesis]
```
Step 4: Fix Summary (Handoff to Phase 2 Planning)
Fix the root cause, not the symptom:
4.1 Create Failing Test Case
- Simplest possible reproduction
- Automated test if possible
- One-off test script if no framework
- MUST have before fixing
- This becomes part of Phase 3's test plan
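A failing test written before the fix might look like the sketch below: the check fails against the current code and passes once the root cause is addressed. The port-parsing functions are invented purely for illustration, with surrounding whitespace playing the role of the root cause.

```python
def parse_port_buggy(value: str) -> int:
    # Hypothetical code under investigation: it rejects input with surrounding whitespace.
    if not value.isdigit():
        raise ValueError(f"invalid port: {value!r}")
    return int(value)


def parse_port_fixed(value: str) -> int:
    # Hypothetical fix at the source: normalize the input before validating it.
    value = value.strip()
    if not value.isdigit():
        raise ValueError(f"invalid port: {value!r}")
    return int(value)


def check_port_with_whitespace(parse_port) -> None:
    """Simplest reproduction as an automated check; raises while the bug is present."""
    assert parse_port(" 8080\n") == 8080


def fails(check, implementation) -> bool:
    """Return True if the check raises for the given implementation."""
    try:
        check(implementation)
    except Exception:
        return True
    return False


print("fails before fix:", fails(check_port_with_whitespace, parse_port_buggy))
print("fails after fix:", fails(check_port_with_whitespace, parse_port_fixed))
```

Seeing the check fail first is what shows it actually exercises the bug; a check that has only ever passed proves much less.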
4.2 Root Cause Summary for Phase 2
- Summarize root cause found
- Document the fix approach
- Reference evidence from Phase 1.5
Record in Phase 1.5 artifact:
```markdown
Root Cause Summary
Root Cause: [one sentence]
Location: [file:line]
Explanation: [paragraph explaining why]
Fix Approach: [high-level]
Test Strategy: [how to verify fix]
Phase 1.5 Gate
Coverage: [Did we find root cause?]
Approval: [Ready to proceed to Phase 3 with fix plan?]
```
Red Flags - STOP and Follow Process
If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- Proposing solutions before tracing data flow
- "One more fix attempt" (when already tried 2+)
ALL of these mean: STOP. Return to Step 1 (Root Cause Investigation).
Common Rationalizations (STOP)
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
If 3+ Fix Attempts Failed
Pattern indicating architectural problem:
- Each fix reveals new shared state/coupling/problem in different place
- Fixes require "massive refactoring" to implement
- Each fix creates new symptoms elsewhere
STOP and question fundamentals:
- Is this pattern fundamentally sound?
- Are we "sticking with it through sheer inertia"?
- Should we refactor architecture vs. continue fixing symptoms?
Document in Phase 1.5:
```markdown
Architectural Concern
Fix Attempts: [number]
Pattern: [what happens with each fix]
Recommendation: [architectural change vs. symptom fix]
Next Steps: [escalate, refactor, or accept risk]
```
Phase 1.5 Artifact Template
File: `/.codex/rlm/<run-id>/01.5-root-cause.md`
```markdown
Run: `/.codex/rlm/<run-id>/`
Phase: `01.5 Root Cause Analysis`
Status: `DRAFT` | `LOCKED`
Inputs:
- `/.codex/rlm/<run-id>/01-as-is.md`
- [relevant addenda]
Outputs:
- `/.codex/rlm/<run-id>/01.5-root-cause.md`
Scope note: This document records systematic debugging process and identified root cause.

Error Analysis
[Step 1.1 - verbatim errors, stack traces]

Reproduction Verification
[Step 1.2 - exact steps, reproducibility]

Recent Changes Analysis
[Step 1.3 - git history, dependencies]

Evidence Gathering
[Step 1.4 - multi-layer diagnostics if applicable]

Data Flow Trace
[Step 1.5 - backward trace to source]

Pattern Analysis
[Step 2 - working vs broken comparison]

Hypothesis Testing
[Step 3 - scientific method log]

Root Cause Summary
Root Cause: [one sentence]
Location: [file:line]
Detailed Explanation: [paragraph]
Fix Strategy: [approach for Phase 3]
Test Plan: [how to verify]

Traceability
- R1 (Bug fix requirement) -> Root cause identified at [location] | Evidence: [section]

Coverage Gate
- Error messages analyzed
- Reproduction verified
- Recent changes reviewed
- Data flow traced to source
- Pattern analysis completed
- Hypothesis tested and confirmed
- Root cause documented
- Fix strategy defined
Coverage: PASS / FAIL

Approval Gate
- Root cause identified (not just symptom)
- Fix approach clear
- Test strategy defined
- No "quick fixes" attempted
- Ready to proceed to Phase 3
Approval: PASS / FAIL
LockedAt: [when locked]
LockHash: [sha256]
```
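The template does not say how `LockedAt` and `LockHash` are produced. As one possible approach, and purely as an assumption, the sketch below hashes the artifact text with SHA-256 while excluding the two lock lines themselves, so that writing the lock fields does not change the hash. The function name `lock_fields` is hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def lock_fields(artifact_text: str) -> dict:
    """Compute candidate LockedAt/LockHash values for an artifact body.

    Assumption: the hash covers every line except the LockedAt and LockHash
    lines, which is not confirmed by the template itself.
    """
    body = "\n".join(
        line for line in artifact_text.splitlines()
        if not line.startswith(("LockedAt:", "LockHash:"))
    )
    return {
        "LockedAt": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "LockHash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


draft = "Status: `LOCKED`\nApproval: PASS\nLockedAt: [when locked]\nLockHash: [sha256]\n"
fields = lock_fields(draft)
print(fields["LockedAt"], fields["LockHash"])
```

If the surrounding tooling defines its own hashing rule, that rule takes precedence over this sketch.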
Integration with RLM
Phase 1 → 1.5 Transition
When Phase 1 (AS-IS) identifies a bug/issue that needs fixing:
- Lock Phase 1 (`01-as-is.md`)
- Create Phase 1.5 (`01.5-root-cause.md`) with Status: DRAFT
- Execute systematic debugging
- Lock Phase 1.5 when root cause found
- Proceed to Phase 2 with root cause knowledge
Phase 1.5 → 2 Transition
Phase 2 (`02-to-be-plan.md`) builds ON Phase 1.5:
```markdown
Root Cause Reference
Root cause identified in `01.5-root-cause.md`:
- Location: [file:line]
- Cause: [summary]
- Full analysis: [reference]

Fix Plan
Based on root cause analysis:
- [specific fix steps]
- [test strategy from Phase 1.5]
```
References
- REQUIRED: Use this skill for all bug-fix requirements
- TRIGGERS: When requirement involves debugging/fixing
- OUTPUT: `01.5-root-cause.md` artifact
- NEXT: Phase 2 (TO-BE Plan) incorporates findings