systematic-debugging
Systematic Debugging
Core principle: Find root cause before attempting fixes. Symptom fixes are failure.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
- Read Error Messages Carefully
  - Read stack traces completely
  - Note line numbers, file paths, error codes
  - Don't skip warnings
- Reproduce Consistently
  - What are the exact steps?
  - If not reproducible → gather more data, don't guess
- Check Recent Changes
  - Git diff, recent commits
  - New dependencies, config changes
  - Environmental differences
- Gather Evidence in Multi-Component Systems
  WHEN system has multiple components (CI → build → signing, API → service → database):
  Add diagnostic instrumentation before proposing fixes:
  For EACH component boundary:
  - Log what data enters/exits component
  - Verify environment/config propagation
  - Check state at each layer
  Run once to gather evidence → analyze → identify failing component
  Example:
  ```bash
  # Layer 1: Workflow
  echo "=== Secrets available: ==="
  # Report presence only; never print the secret's value
  if [ -n "${IDENTITY:-}" ]; then echo "IDENTITY: SET"; else echo "IDENTITY: UNSET"; fi

  # Layer 2: Build script (grep -q keeps the value out of the logs)
  env | grep -q '^IDENTITY=' && echo "IDENTITY in environment" || echo "IDENTITY not in environment"

  # Layer 3: Signing
  security find-identity -v
  ```
- Trace Data Flow
  See references/root-cause-tracing.md for backward tracing technique.
  Quick version: Where does bad value originate? Trace up call chain until you find the source. Fix at source.
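The presence check used at each component boundary can be wrapped in a small reusable helper. The following is a minimal sketch, assuming a POSIX-style shell; `report_var` is a hypothetical name introduced for illustration and is not part of this skill. It reports whether a variable is set without echoing its value, so secrets stay out of the logs. A variable that is set but empty is reported as UNSET.

```shell
# report_var is a hypothetical helper (not part of the skill).
# Prints "<NAME>: SET" or "<NAME>: UNSET" without revealing the value.
report_var() {
  name=$1
  # Reject anything that is not a plain variable name before using eval
  case $name in
    ''|*[!A-Za-z0-9_]*|[0-9]*)
      echo "invalid variable name: $name" >&2
      return 2 ;;
  esac
  # Indirect lookup; safe because the name was validated above
  eval "value=\${$name:-}"
  if [ -n "$value" ]; then
    echo "$name: SET"
  else
    echo "$name: UNSET"
  fi
}

# Call once per layer (workflow, build script, signing step) and compare the output
report_var IDENTITY
```

Running the same call at every layer shows the first boundary where the value disappears, which is the component to investigate.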
Phase 2: Pattern Analysis
- Find Working Examples - Similar working code in codebase
- Compare Against References - Read reference implementations COMPLETELY, don't skim
- Identify Differences - List every difference, don't assume "that can't matter"
- Understand Dependencies - Components, config, environment, assumptions
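Listing every difference is easier when a tool produces the list. As a minimal sketch, assuming the working and broken cases can each be captured in a text file (a config file, a dumped environment, a dependency listing), a unified diff gives the complete set to go through. `list_differences` and the file names in the usage comment are hypothetical.

```shell
# list_differences is a hypothetical helper; it prints a unified diff
# between a known-good file and the failing one.
list_differences() {
  working=$1
  broken=$2
  status=0
  # diff exits 1 when the files differ, which is the expected case here,
  # so only an exit status above 1 (e.g. a missing file) counts as an error.
  diff -u "$working" "$broken" || status=$?
  [ "$status" -le 1 ]
}

# Example (placeholder paths):
#   list_differences config/working.env config/broken.env
```

Every line in the output is a candidate cause; none should be dismissed before it has been checked.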
Phase 3: Hypothesis and Testing
- Form Single Hypothesis - "I think X is root cause because Y" - be specific
- Test Minimally - SMALLEST possible change, one variable at a time
- Verify - Worked → Phase 4. Didn't work → form NEW hypothesis, don't stack fixes
- When You Don't Know - Say so. Don't pretend.
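One way to keep a test to a single variable is to let the shell enforce it. The sketch below assumes the hypothesis concerns an environment variable and that the failure can be reproduced by a command. `try_one_change` is a hypothetical helper, and `./run_repro.sh` in the usage comment is a placeholder for a reproduction script.

```shell
# try_one_change is a hypothetical helper: it runs a command with exactly one
# environment variable overridden, so a change in outcome traces to that variable.
try_one_change() {
  var=$1
  value=$2
  shift 2
  # env applies the override to this single invocation only
  if env "$var=$value" "$@"; then
    echo "WORKED: $var=$value"
  else
    echo "DID NOT WORK: $var=$value"
  fi
}

# Example (placeholder script):
#   try_one_change TZ UTC ./run_repro.sh
```

Because the override lasts for one invocation, a failed attempt leaves nothing behind to stack the next hypothesis on.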
Phase 4: Implementation
- Create Failing Test Case
  - Use the test-driven-development skill
  - MUST have before fixing
- Implement Single Fix
  - ONE change at a time
  - No "while I'm here" improvements
- Verify Fix
  - Test passes? Other tests still pass? Issue resolved?
- If Fix Doesn't Work
  - Count attempts
  - If < 3: Return to Phase 1 with new information
  - If ≥ 3: Escalate (below)
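The attempt-count rule above is mechanical enough to write down. This is a small sketch only; `next_step` is a hypothetical function name, and the count of failed fixes is assumed to be tracked by whoever is debugging.

```shell
# next_step is a hypothetical helper encoding the rule:
# fewer than 3 failed fixes -> back to Phase 1; 3 or more -> escalate.
next_step() {
  attempts=$1
  if [ "$attempts" -lt 3 ]; then
    echo "Return to Phase 1 with new information"
  else
    echo "Escalate"
  fi
}

# Example: after the second failed fix
next_step 2
```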
Escalation: 3+ Failed Fixes
Pattern indicating architectural problem:
- Each fix reveals new problems elsewhere
- Fixes require massive refactoring
- Shared state/coupling keeps surfacing
Action: STOP. Question fundamentals:
- Is this pattern fundamentally sound?
- Are we continuing through inertia?
- Refactor architecture vs. continue fixing symptoms?
Discuss with human partner before more fix attempts. This is wrong architecture, not failed hypothesis.
Red Flags → STOP and Return to Phase 1
If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X"
- "I'll skip the test"
- "It's probably X"
- "Pattern says X but I'll adapt it differently"
- Proposing solutions before tracing data flow
- "One more fix" after 2+ failures
Human Signals You're Off Track
- "Is that not happening?" → You assumed without verifying
- "Will it show us...?" → You should have added evidence gathering
- "Stop guessing" → You're proposing fixes without understanding
- "Ultrathink this" → Question fundamentals
- Frustrated "We're stuck?" → Your approach isn't working
Response: Return to Phase 1.
Supporting Techniques
Reference files in references/:
- root-cause-tracing.md - Trace bugs backward through call stack
- defense-in-depth.md - Add validation at multiple layers after finding root cause
- condition-based-waiting.md - Replace arbitrary timeouts with condition polling
Related skills:
- test-driven-development - Creating failing test case (Phase 4)
- verification-before-completion - Verify fix before claiming success
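As an illustration of replacing an arbitrary timeout with condition polling, the sketch below polls a command until it succeeds or a deadline passes. It is one possible shape of the idea and is not taken from condition-based-waiting.md. `wait_for` is a hypothetical name, and the one-second polling interval is an assumption.

```shell
# wait_for is a hypothetical helper: wait_for <timeout_seconds> <command...>
# Re-runs the command once per second until it succeeds (returns 0)
# or the timeout elapses (returns 1).
wait_for() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Instead of `sleep 10` and hoping the file exists by then (placeholder path):
#   wait_for 30 test -f build/output.bin
```

The wait ends as soon as the condition holds, and a timeout becomes an explicit failure instead of a silent race.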