systematic-debugging
Systematic Debugging
Core Principle
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
Never apply symptom-focused patches that mask underlying problems. Understand WHY something fails before attempting to fix it.
The Four-Phase Framework
Phase 1: Root Cause Investigation
Before touching any code:
- Read error messages thoroughly - Every word matters
- Reproduce the issue consistently - If you can't reproduce it, you can't verify a fix
- Examine recent changes - What changed before this started failing?
- Gather diagnostic evidence - Logs, stack traces, state dumps
- Trace data flow - Follow the call chain to find where bad values originate
Root Cause Tracing Technique:
1. Observe the symptom - Where does the error manifest?
2. Find immediate cause - Which code directly produces the error?
3. Ask "What called this?" - Map the call chain upward
4. Keep tracing up - Follow invalid data backward through the stack
5. Find original trigger - Where did the problem actually start?
Key principle: Never fix problems solely where errors appear; always trace to the original trigger.
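To make the tracing steps concrete, here is a minimal sketch in Python. The functions `parse_order`, `apply_discount`, and `checkout` are hypothetical and exist only for illustration. The error manifests in `apply_discount`, but the bad value was introduced earlier, in a function that has already returned and so does not appear in the stack at all.

```python
import traceback


def parse_order(raw: dict) -> dict:
    # Original trigger: a missing "price" key silently becomes None here.
    return {"item": raw.get("item"), "price": raw.get("price")}


def apply_discount(order: dict, rate: float) -> float:
    # Immediate cause: the arithmetic fails here, far from where None came from.
    return order["price"] * (1 - rate)


def checkout(raw: dict) -> float:
    return apply_discount(parse_order(raw), rate=0.1)


def call_chain(exc: BaseException) -> list[str]:
    """Return function names from the outermost caller to the frame that raised."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


chain: list[str] = []
try:
    checkout({"item": "book"})  # no "price" supplied
except TypeError as exc:
    chain = call_chain(exc)
    print(" -> ".join(chain))
```

The printed chain ends at `apply_discount`, which is only where the symptom appears. Following the invalid `None` backward leads to `parse_order`, so in this sketch that is where a fix belongs (for example, rejecting a payload without a price), not a `None` check inside `apply_discount`.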
Phase 2: Pattern Analysis
- Locate working examples - Find similar code that works correctly
- Compare implementations completely - Don't just skim
- Identify differences - What's different between working and broken?
- Understand dependencies - What does this code depend on?
Phase 3: Hypothesis and Testing
Apply the scientific method:
- Formulate ONE clear hypothesis - "The error occurs because X"
- Design minimal test - Change ONE variable at a time
- Predict the outcome - What should happen if hypothesis is correct?
- Run the test - Execute and observe
- Verify results - Did it behave as predicted?
- Iterate or proceed - Refine hypothesis if wrong, implement if right
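The loop above can be sketched as a tiny experiment in Python. The `mean` function and the `run_experiment` helper are hypothetical, chosen only to show one hypothesis, one changed variable, and a prediction written down before the test runs.

```python
def mean(values):
    return sum(values) / len(values)


def run_experiment(func, arg):
    """Call func(arg) and return 'ok' or the name of the exception raised."""
    try:
        func(arg)
        return "ok"
    except Exception as exc:
        return type(exc).__name__


# Hypothesis: "mean() fails because the input list is empty."
# Prediction: a one-element list succeeds; an empty list raises ZeroDivisionError.
# Only one variable changes between the two runs: the length of the input.
control = run_experiment(mean, [1.0])
variant = run_experiment(mean, [])

hypothesis_confirmed = control == "ok" and variant == "ZeroDivisionError"
print(hypothesis_confirmed)
```

If the prediction had not held, the right move under this framework would be to refine the hypothesis, not to start editing `mean`.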
Phase 4: Implementation
- Create failing test case - Captures the bug behavior
- Implement single fix - Address root cause, not symptoms
- Verify test passes - Confirms fix works
- Run full test suite - Ensure no regressions
- If fix fails, STOP - Re-evaluate hypothesis
Critical rule: If THREE or more fixes fail consecutively, STOP. This signals architectural problems requiring discussion, not more patches.
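As an illustration of the first three steps, the sketch below uses Python's standard `unittest` module. The `slugify` function and its bug are invented for this example: an earlier version is assumed to have used `title.lower().replace(" ", "-")`, which turns repeated spaces into repeated hyphens.

```python
import unittest


def slugify(title: str) -> str:
    # Single fix at the root cause: split() collapses runs of whitespace,
    # so repeated spaces can no longer produce repeated hyphens.
    return "-".join(title.lower().split())


class SlugifyRegressionTest(unittest.TestCase):
    def test_collapses_repeated_spaces(self):
        # Written first, against the buggy version, where it failed with
        # "hello---world". It now documents the bug and guards against regressions.
        self.assertEqual(slugify("Hello   World"), "hello-world")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the test would live in the test suite and run alongside everything else, which also covers the "run full test suite" step.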
Red Flags - Process Violations
Stop immediately if you catch yourself thinking:
- "Quick fix for now, investigate later"
- "One more fix attempt" (after multiple failures)
- "This should work" (without understanding why)
- "Let me just try..." (without hypothesis)
- "It works on my machine" (without investigating difference)
Warning Signs of Deeper Problems
When consecutive fixes keep revealing new problems in different areas, that indicates architectural issues:
- Stop patching
- Document what you've found
- Discuss with team before proceeding
- Consider if the design needs rethinking
Common Debugging Scenarios
Test Failures
1. Read the FULL error message and stack trace
2. Identify which assertion failed and why
3. Check test setup - is the test environment correct?
4. Check test data - are mocks/fixtures correct?
5. Trace to the source of the unexpected value
Runtime Errors
1. Capture the full stack trace
2. Identify the line that throws
3. Check what values are undefined/null
4. Trace backward to find where bad value originated
5. Add validation at the source
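"Validation at the source" can look like the following Python sketch. The `User` type, `user_from_payload`, and `email_domain` are hypothetical names. The idea is that the boundary function rejects missing values, so code further down the call chain never receives them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    email: str


def user_from_payload(payload: dict) -> User:
    """Build a User, rejecting missing or empty fields at the boundary."""
    missing = [key for key in ("name", "email") if not payload.get(key)]
    if missing:
        # Fail where the bad value enters, with a message naming the fields.
        raise ValueError(f"payload missing required fields: {missing}")
    return User(name=payload["name"], email=payload["email"])


def email_domain(user: User) -> str:
    # No None check needed here: a User cannot be built without an email.
    return user.email.split("@")[-1]


user = user_from_payload({"name": "Ada", "email": "ada@example.com"})
print(email_domain(user))
```

Compared with adding a `None` check inside `email_domain`, this keeps the error close to its origin, so the stack trace points at the bad input and not at some later use of it.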
"It worked before"
1. Use git bisect to find the breaking commit
2. Compare the change with previous working version
3. Identify what assumption changed
4. Fix at the source of the assumption violation
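`git bisect` finds the breaking commit by binary search over history. The Python sketch below illustrates that idea only and is not how git implements it. The commit names and the `is_bad` predicate are made up, and the sketch assumes the first commit is good, the last is bad, and every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit under the assumptions stated above."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the break happened at mid or earlier
        else:
            lo = mid  # the break happened after mid
    return commits[hi]


history = ["a1", "b2", "c3", "d4", "e5", "f6"]  # oldest to newest (hypothetical)
broken_since = history.index("d4")
culprit = first_bad(history, lambda c: history.index(c) >= broken_since)
print(culprit)
```

With git itself, the predicate is typically a script passed to `git bisect run`, where exit code 0 marks a commit as good and a non-zero code (other than 125, which means skip) marks it as bad.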
Intermittent Failures
1. Look for race conditions
2. Check for shared mutable state
3. Examine async operation ordering
4. Look for timing dependencies
5. Add deterministic waits or proper synchronization
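A minimal Python sketch of the last two points, using the standard `threading` module. The `Counter` class and `run_workers` helper are hypothetical. The shared mutable state is guarded by a lock, and the main thread waits with `join()` instead of sleeping for an arbitrary time.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # "value += 1" is a read-modify-write; the lock keeps two threads
        # from interleaving inside it and losing an update.
        with self._lock:
            self.value += 1


def run_workers(counter: Counter, workers: int, increments: int) -> int:
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # deterministic wait: block until each worker has finished
    return counter.value


total = run_workers(Counter(), workers=4, increments=10_000)
print(total)
```

Removing the lock may or may not produce a wrong total on a given run, which is exactly what makes this class of bug intermittent and why a passing run alone does not verify a fix.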
Debugging Checklist
Before claiming a bug is fixed:
- Root cause identified and documented
- Hypothesis formed and tested
- Fix addresses root cause, not symptoms
- Failing test created that reproduces bug
- Test now passes with fix
- Full test suite passes
- No "quick fix" rationalization used
- Fix is minimal and focused
Success Metrics
Systematic debugging achieves a roughly 95% first-time fix rate, versus roughly 40% with ad-hoc approaches.
Signs you're doing it right:
- Fixes don't create new bugs
- You can explain WHY the bug occurred
- Similar bugs don't recur
- Code is better after the fix, not just "working"