performing-systematic-debugging-for-stubborn-problems

Systematic Debugging with Fagan Inspection

This skill applies a modified Fagan Inspection methodology to resolve complex problems and stubborn bugs that have resisted multiple fix attempts.

Process Overview

Follow these four phases sequentially. Do not skip phases or attempt fixes before completing the inspection.

Phase 1: Initial Overview

Establish a clear understanding of the problem before analysis:

Explain the problem in plain language without technical jargon
State expected behaviour - what should happen
State actual behaviour - what is happening instead
Document symptoms - error messages, logs, observable failures
Describe the context - when it occurs, how often, and under what conditions

Output: A clear problem statement that anyone could understand.
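
One way to keep the Phase 1 output consistent is to capture it in a small structure with one field per item above. The sketch below is only an illustration: the `ProblemStatement` class, its field names, and the double-charge scenario are hypothetical and not part of the skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemStatement:
    """Phase 1 output: a plain-language description of the problem."""

    summary: str   # the problem in plain language, no jargon
    expected: str  # what should happen
    actual: str    # what is happening instead
    symptoms: list[str] = field(default_factory=list)  # errors, logs, failures
    context: str = ""  # when, how often, under what conditions

    def render(self) -> str:
        """Format the statement as plain text for a report or ticket."""
        lines = [
            f"Problem: {self.summary}",
            f"Expected: {self.expected}",
            f"Actual: {self.actual}",
            "Symptoms:",
        ]
        lines += [f"  - {s}" for s in self.symptoms]
        lines.append(f"Context: {self.context}")
        return "\n".join(lines)


# Hypothetical example data, invented for illustration only.
statement = ProblemStatement(
    summary="Some customers are charged twice for one order.",
    expected="Each order produces exactly one charge.",
    actual="Some orders produce two charges.",
    symptoms=["Duplicate rows in the payments table", "No error in the logs"],
    context="Seen when the customer double-clicks the Pay button.",
)
print(statement.render())
```

Filling in every field before moving on is a cheap check that the overview is actually complete.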

在分析前，先明确理解问题：

用通俗易懂的语言解释问题，避免使用技术术语
说明预期行为 - 应该发生什么
说明实际行为 - 实际发生了什么
记录症状 - 错误信息、日志、可观察到的故障
上下文信息 - 问题何时发生、发生频率、在什么条件下发生

输出： 一份任何人都能理解的清晰问题说明。

Phase 2: Systematic Inspection

Perform a line-by-line walkthrough, taking the "Reader" role from Fagan Inspection. Identify defects without attempting to fix them yet - this is pure inspection.

Check against these defect categories:

Logic Errors
- Incorrect conditional logic (wrong operators, inverted conditions)
- Loop conditions (infinite loops, premature termination)
- Control flow issues (unreachable code, wrong execution paths)
Boundary Conditions
- Off-by-one errors
- Edge cases (empty inputs, null values, maximum values)
- Array/collection bounds
Error Handling
- Unhandled exceptions
- Missing validations
- Silent failures (errors caught but not logged)
- Incorrect error recovery
Data Flow Issues
- Variable scope problems
- Data transformation errors
- Type mismatches or coercion issues
- State management (stale data, race conditions)
Integration Points
- API calls (incorrect endpoints, malformed requests, missing headers)
- Database interactions (query errors, transaction handling)
- External dependencies (version mismatches, configuration issues)
- Timing issues (async/await problems, race conditions)
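
To make two of these categories concrete, here is a minimal sketch of what an inspector might flag. Both functions (`last_n_readings`, `parse_port`) are hypothetical and deliberately buggy; the comments mark the defects without fixing them, as this phase requires.

```python
def last_n_readings(readings, n):
    """Intended: return the final n items of readings."""
    # DEFECT (boundary condition): the slice stops one item early,
    # so the last reading is always dropped (off-by-one).
    return readings[-n - 1:-1]


def parse_port(raw):
    """Intended: convert a port string to an int, rejecting bad input."""
    try:
        return int(raw)
    except ValueError:
        # DEFECT (error handling): silent failure. Nothing is logged, and
        # the caller cannot tell bad input apart from a real value of 0.
        return 0


print(last_n_readings([10, 20, 30, 40], 2))  # [20, 30], not the intended [30, 40]
print(parse_port("80a"))                     # 0, with no sign that parsing failed
```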

Think aloud during this phase. For each section of code:

State what the code is intended to do
Identify any discrepancies between intent and implementation
Flag assumptions or unclear aspects
Use ultrathink to think deeper on complex sections

Output: A categorised list of identified defects with line numbers and specific descriptions.
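
The categorised list can be kept as structured records, so that line numbers and descriptions are never omitted. This is a sketch under assumptions: the `Defect` type, the `group_by_category` helper, and the sample findings are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    category: str     # one of the Phase 2 defect categories
    line: int         # line number in the inspected file
    description: str  # what is wrong, without proposing a fix


def group_by_category(defects):
    """Return {category: [defects sorted by line]} for the Phase 2 report."""
    grouped = defaultdict(list)
    for defect in defects:
        grouped[defect.category].append(defect)
    return {cat: sorted(items, key=lambda d: d.line) for cat, items in grouped.items()}


# Hypothetical findings, invented for illustration only.
found = [
    Defect("Error Handling", 42, "ValueError is caught and discarded without logging"),
    Defect("Boundary Conditions", 17, "slice end index excludes the final element"),
    Defect("Error Handling", 8, "response status is never checked"),
]

for category, items in group_by_category(found).items():
    print(category)
    for d in items:
        print(f"  line {d.line}: {d.description}")
```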

Phase 3: Root Cause Analysis

After identifying issues, trace back to find the fundamental cause - not just symptoms.

Five Whys Technique:

Ask "why" repeatedly (at least three times, and typically up to five) to get to the underlying issue
State each "why" explicitly in your analysis
Example:
- Why did the API call fail? → Because the request was malformed
- Why was it malformed? → Because the data wasn't serialised correctly
- Why wasn't it serialised correctly? → Because the serialiser expected a different type
- Why did it expect a different type? → Because the schema was updated but the code wasn't
- Root cause: Schema versioning mismatch between services
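
The chain above can also be recorded as data, which makes it easy to reject an analysis that stops too early. The helper below is a hypothetical sketch, not part of the skill; `render_why_chain` and its `minimum` parameter are assumed names, and the default of 3 reflects the lower bound stated above.

```python
def render_why_chain(pairs, root_cause, minimum=3):
    """Format (question, answer) pairs as a numbered why-chain.

    Raises ValueError if fewer than `minimum` whys were asked, so a
    shallow analysis is rejected rather than reported.
    """
    if len(pairs) < minimum:
        raise ValueError(f"asked {len(pairs)} whys; need at least {minimum}")
    lines = [f"{i}. {q} -> {a}" for i, (q, a) in enumerate(pairs, start=1)]
    lines.append(f"Root cause: {root_cause}")
    return "\n".join(lines)


# The same chain as the example above.
chain = [
    ("Why did the API call fail?", "The request was malformed"),
    ("Why was it malformed?", "The data wasn't serialised correctly"),
    ("Why wasn't it serialised correctly?", "The serialiser expected a different type"),
    ("Why did it expect a different type?", "The schema was updated but the code wasn't"),
]
report = render_why_chain(chain, "Schema versioning mismatch between services")
print(report)
```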

Consider:

Environmental factors (configuration, dependencies, runtime environment)
Timing and concurrency (race conditions, async issues)
Hidden assumptions in the code or system design
Historical context (recent changes, migrations, updates)

State assumptions explicitly:

"I'm assuming X because..."
"This presumes that Y is always..."
Flag any assumptions that need verification

Output: A clear statement of the root cause, the chain of reasoning that led to it, and any assumptions that need validation.

Phase 4: Solution & Verification

Now propose specific fixes for each identified issue.

For each proposed solution:

Describe the fix - what code/configuration changes are needed
Explain why it resolves the root cause - connect it back to Phase 3 analysis
Consider side effects - what else might this change affect
Define verification steps - how to confirm the fix works

Verification Planning:

Specific test cases that would have caught this bug
Manual verification steps
Monitoring or logging to add
Edge cases to validate

Output: A structured list of fixes with verification steps.
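
As an illustration of "test cases that would have caught this bug", suppose the root cause was a slice that dropped the final element of a list. The sketch below pairs a hypothetical fixed function, `last_n_readings`, with a regression test for the original failure and a second test for the edge cases. The function and test names are assumptions for the example.

```python
import unittest


def last_n_readings(readings, n):
    """Return the final n items of readings (all of them if n is larger)."""
    if n <= 0:
        # Guard against a side effect of the fix: readings[-0:] would
        # return the whole list instead of an empty one.
        return []
    return readings[-n:]


class LastNReadingsTest(unittest.TestCase):
    def test_includes_final_item(self):
        # The case that would have caught the dropped final element.
        self.assertEqual(last_n_readings([10, 20, 30, 40], 2), [30, 40])

    def test_edge_cases(self):
        self.assertEqual(last_n_readings([], 3), [])            # empty input
        self.assertEqual(last_n_readings([10, 20], 0), [])      # zero requested
        self.assertEqual(last_n_readings([10, 20], 5), [10, 20])  # n exceeds length


# Run the suite programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LastNReadingsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The `n <= 0` guard is the kind of side effect the "Consider side effects" step is meant to surface: the obvious fix alone would have introduced a new boundary defect.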

Important Guidelines

Complete each phase thoroughly before moving to the next
Think aloud - verbalise your reasoning throughout
State assumptions explicitly rather than making implicit ones
Flag unclear aspects rather than guessing - if something is uncertain, say so
Use available tools - read files, search code, run tests, check logs
Focus on systematic analysis over quick fixes
Validate flagged aspects - after completing all phases, revisit any unclear points and use the think tool with "ultra" depth if needed to clarify them

Final Output

After completing all four phases, provide:

Summary of findings - key defects and root cause
Proposed solutions - prioritised list with rationale
Verification plan - how to confirm fixes work
Next steps - unless the user indicates otherwise, proceed to implement the proposed solutions

When This Skill Should NOT Be Used

For simple, obvious bugs with clear fixes
When the first debugging attempt is still underway
For new features (this is for debugging existing code)
When the problem is clearly environmental (config, infrastructure) and doesn't require code inspection
