root-cause-analysis
Root Cause Analysis Orchestration Skill
This skill helps you systematically identify the root cause of any problem using proven methodologies from the Toyota Production System and other industry-standard techniques.
Quick Reference: When to Load Which Resource
| Your Problem Type | Load Resource | Why |
|---|---|---|
| Need to understand 5 Whys, Fishbone, Pareto, Fault Tree methodology | resources/rca-methodologies.md | Learn each method step-by-step with examples |
| Looking for common root causes in your domain | resources/common-root-causes.md | Pattern match against known causes: software, hardware, process, personal |
| Want to see complete worked examples | resources/example-analyses.md | Study real cases: software bugs, vehicle maintenance, system failures, personal problems |
| Advanced: need barrier analysis, complex cause mapping | resources/advanced-techniques.md | Formal methods: Fault Tree, Barrier Analysis, multi-methodology chains |
Core Principle
Do not treat symptoms—find and fix the root cause. As Taiichi Ohno, architect of the Toyota Production System, said: "By repeating why five times, the nature of the problem as well as its solution becomes clear."
Orchestration Protocol
Phase 1: Problem Classification
Quickly identify your problem domain and complexity:
Problem Domain:
- Software: Code bugs, system failures, performance, deployment
- Hardware: Equipment, mechanical, electrical, maintenance
- Process: Workflow, procedures, organizational, communication
- Personal: Life challenges, productivity, habits, wellbeing
Complexity Level:
- Simple: Clear failure chain, 1-2 likely causes → Use 5 Whys
- Complex: Multiple possible causes, unknown scope → Start with Fishbone
- Critical/Safety: High stakes, needs rigor → Use Fault Tree
- Multiple Issues: Many competing problems → Use Pareto first
Action: Load appropriate resource file(s) based on classification.
Phase 2: Methodology Selection
Based on problem type, select your approach:
| Situation | Recommended | Load |
|---|---|---|
| Single clear failure | 5 Whys | methodologies.md |
| Complex/multiple possibilities | Fishbone → 5 Whys | methodologies.md |
| Competing priorities | Pareto → 5 Whys | methodologies.md |
| Safety/high-stakes | Fault Tree | advanced-techniques.md |
| Process breakdown | Barrier Analysis | advanced-techniques.md |
| Pattern matching | Common causes + 5 Whys | common-root-causes.md |
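The table above can be read as a simple lookup from situation to method and resource. The following is a minimal sketch of that lookup, not part of the skill itself: the dictionary keys (`single_clear_failure` and so on) and the names `Recommendation` and `recommend` are hypothetical labels introduced for illustration, while the method and file names are copied from the table.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    method: str    # value from the "Recommended" column
    resource: str  # value from the "Load" column


# Rows copied from the Phase 2 table. The keys are hypothetical short labels.
PHASE2_TABLE = {
    "single_clear_failure": Recommendation("5 Whys", "methodologies.md"),
    "complex_multiple_possibilities": Recommendation("Fishbone → 5 Whys", "methodologies.md"),
    "competing_priorities": Recommendation("Pareto → 5 Whys", "methodologies.md"),
    "safety_high_stakes": Recommendation("Fault Tree", "advanced-techniques.md"),
    "process_breakdown": Recommendation("Barrier Analysis", "advanced-techniques.md"),
    "pattern_matching": Recommendation("Common causes + 5 Whys", "common-root-causes.md"),
}


def recommend(situation: str) -> Recommendation:
    """Return the recommended method and resource for a situation label."""
    try:
        return PHASE2_TABLE[situation]
    except KeyError:
        known = ", ".join(sorted(PHASE2_TABLE))
        raise ValueError(f"unknown situation {situation!r}; expected one of: {known}") from None


if __name__ == "__main__":
    rec = recommend("safety_high_stakes")
    print(f"Use {rec.method}; load {rec.resource}")
```

Rejecting unknown labels with an error, rather than falling back to a default method, keeps a misclassified problem from silently getting the wrong methodology.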
Phase 3: Execution & Verification
During Analysis:
- Define problem clearly (What/Where/When/Impact)
- Gather evidence systematically
- Apply selected methodology
- Document reasoning at each step
- Verify root cause with Forward/Backward tests
Before Finalizing:
- Validate conclusion against evidence
- Check for red flags (see common-root-causes.md)
- Confirm actionability (can you fix this?)
- Develop solutions addressing root cause
Problem Definition Framework
Create a clear problem statement before analysis:
Essential Elements:
- What: Observable symptom (not assumed cause)
- Where: Location/system/component affected
- When: Timeline, frequency, pattern
- Impact: Users/systems affected, severity
Example:
"Users in EU region experience 3-5 second dashboard load delays during 9-11 AM UTC peak hours, affecting ~2,000 daily active users. Started after v2.4 deployment on Nov 18th."
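One way to keep a problem statement honest is to store the four essential elements as separate fields and refuse to start analysis while any is blank. The sketch below assumes a hypothetical `ProblemStatement` class (not defined by the skill) and fills it with the details from the example above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProblemStatement:
    """Hypothetical container for the four essential elements."""
    what: str    # observable symptom, not an assumed cause
    where: str   # location/system/component affected
    when: str    # timeline, frequency, pattern
    impact: str  # users/systems affected, severity

    def missing_elements(self) -> list[str]:
        """Names of elements that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """One line per element, in What/Where/When/Impact order."""
        return "\n".join(
            f"{f.name.capitalize()}: {getattr(self, f.name)}" for f in fields(self)
        )


if __name__ == "__main__":
    statement = ProblemStatement(
        what="3-5 second dashboard load delays",
        where="EU region",
        when="9-11 AM UTC peak hours; started after v2.4 deployment on Nov 18th",
        impact="~2,000 daily active users",
    )
    gaps = statement.missing_elements()
    if gaps:
        print("Fill in before analysis:", ", ".join(gaps))
    else:
        print(statement.render())
```

The check only catches blank fields. Whether `what` describes a symptom rather than an assumed cause still needs human judgment.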
Evidence Gathering (Go and See)
Follow Toyota's principle—collect facts, not opinions:
Key Evidence Sources:
- Logs, metrics, monitoring data
- Timeline of events and changes
- System/code/configuration changes before problem
- Environmental factors (load, traffic, season)
- User reports and reproduction steps
- System state before/during/after
RCA Methodologies
See resources/rca-methodologies.md for the complete methodology guide.
Resource Files Summary
resources/rca-methodologies.md
Comprehensive methodology guide covering:
- 5 Whys: Step-by-step process with software examples
- Fishbone Diagram: Structure, 6 M's categories, process
- Pareto Analysis: Prioritization using 80/20 rule
- Fault Tree Analysis: Top-down formal analysis
- Barrier Analysis: Control failure examination
- Structured 6-phase RCA process, domain-specific guidance, templates
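As a rough illustration of the Pareto prioritization mentioned in the list above, the sketch below ranks causes by incident count and keeps the smallest leading group whose cumulative share reaches a threshold (80% by default). The function name `pareto_vital_few` and the incident counts are invented for this example and do not come from the resource file.

```python
def pareto_vital_few(
    cause_counts: dict[str, int], threshold: float = 0.8
) -> list[tuple[str, int, float]]:
    """Return (cause, count, cumulative_share) for the top causes.

    Causes are ranked by count, largest first, and the list stops at the
    first cause whose cumulative share reaches the threshold.
    """
    total = sum(cause_counts.values())
    if total <= 0:
        raise ValueError("cause_counts must contain at least one positive count")

    ranked = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0
    for cause, count in ranked:
        cumulative += count
        selected.append((cause, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return selected


if __name__ == "__main__":
    # Invented incident counts, for illustration only.
    incidents = {
        "flaky network": 12,
        "config drift": 50,
        "bad deploy": 8,
        "missing tests": 30,
    }
    for cause, count, share in pareto_vital_few(incidents):
        print(f"{cause}: {count} incidents, cumulative {share:.0%}")
```

Each cause that makes the cut is then a candidate for its own 5 Whys pass, in line with the Pareto → 5 Whys pairing in Phase 2.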
resources/common-root-causes.md
Pattern reference catalog by domain:
- Software Engineering: Code defects, configuration, dependencies, deployment
- Hardware & Equipment: Mechanical, electrical, operational, maintenance
- Process & Operations: Workflow, design, resources
- Personal/Life: Health, habits, environment, skills
- Red flags, recurring themes, pattern recognition
resources/example-analyses.md
Four worked examples with full analysis:
- Software Bug: JWT authentication (5 Whys)
- Vehicle Maintenance: Overheating (5 Whys)
- System Failure: E-commerce checkout (Fishbone + 5 Whys)
- Personal Productivity: Missed deadlines (Fishbone + 5 Whys)
resources/advanced-techniques.md
Formal methods for complex problems:
- Fault Tree Analysis: Boolean logic, safety systems
- Barrier Analysis: Control failures
- Multi-Methodology Chains: Complex orchestration
- Verification Frameworks: Comprehensive testing
How This Skill Works
- Clarify your situation: Domain, observations, evidence, time
- Recommend approach: Complexity analysis, methodology, resources
- Guide through analysis: Problem statement, evidence, methodology, verification
- Deliver output: Analysis, root cause, solutions, implementation
Quick Start: 5-Minute RCA
- State problem (What/Where/When/Impact)
- First Why: fact-based answer
- Second Why: dig deeper
- Third Why: dig deeper again
- Verify: would fixing this prevent it?
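The quick-start steps above can be recorded as an ordered chain of answers ending in a verification flag. This is a minimal sketch under stated assumptions: the `WhyChain` class is hypothetical, and the answers in the usage example are invented to show the shape of a chain, not findings from a real investigation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WhyChain:
    """Hypothetical record of a short 5 Whys pass."""
    problem: str
    answers: list[str] = field(default_factory=list)
    verified: bool = False

    def ask_why(self, answer: str) -> None:
        """Record the fact-based answer to the next 'Why?'."""
        if not answer.strip():
            raise ValueError("each Why needs a fact-based answer")
        self.answers.append(answer.strip())
        self.verified = False  # a deeper answer must be verified again

    @property
    def candidate_root_cause(self) -> Optional[str]:
        """The deepest answer so far, or None if no Why has been asked."""
        return self.answers[-1] if self.answers else None

    def verify(self, fixing_would_prevent: bool) -> None:
        """Record the answer to 'would fixing this prevent it?'."""
        if not self.answers:
            raise ValueError("nothing to verify yet")
        self.verified = fixing_would_prevent

    def render(self) -> str:
        lines = [f"Problem: {self.problem}"]
        lines += [f"Why {i}: {a}" for i, a in enumerate(self.answers, start=1)]
        lines.append(f"Verified: {'yes' if self.verified else 'no'}")
        return "\n".join(lines)


if __name__ == "__main__":
    chain = WhyChain("EU dashboard loads take 3-5 seconds at peak hours")
    # Invented answers, for illustration only.
    chain.ask_why("The dashboard query runs much longer under peak load")
    chain.ask_why("The query scans a full table instead of using an index")
    chain.ask_why("A schema change shipped without an index review")
    chain.verify(fixing_would_prevent=True)
    print(chain.render())
```

Resetting `verified` whenever a new answer is added reflects the last quick-start step: the verification applies to the deepest cause found, not to an earlier link in the chain.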
Templates & Examples
- 5 Whys Template in resources/rca-methodologies.md
- Fishbone Template in resources/rca-methodologies.md
- Worked Examples in resources/example-analyses.md
- Solution Structures in resources/example-analyses.md
Next Steps
- Identify problem domain (software/hardware/process/personal)
- Load appropriate resource from table above
- Select methodology based on complexity
- Follow step-by-step process in resource
- Verify root cause (Forward/Backward tests)
- Develop actionable solutions
Remember: the goal is systematic investigation, meaning disciplined questioning until you reach a cause you can actually fix.