why

Five Whys Analysis

Apply Five Whys root cause analysis to investigate issues by iteratively asking "why" to drill from symptoms to root causes.

Description

Iteratively ask "why" to move from surface symptoms to fundamental causes. The goal is to identify systemic issues rather than settle for quick fixes.

Usage

/why [issue_description]

Variables

  • ISSUE: Problem or symptom to analyze (default: prompt for input)
  • DEPTH: Number of "why" iterations (default: 5, adjust as needed)
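
The defaults above can be sketched as a small resolver. This is only an illustration: the source does not say how DEPTH is passed to the command, so `resolve_variables`, its parameters, and the prompt text are hypothetical.

```python
DEFAULT_DEPTH = 5  # documented default for DEPTH


def resolve_variables(issue=None, depth=None, ask=input):
    """Return (ISSUE, DEPTH) with the documented defaults applied.

    `ask` is a hypothetical hook for the "prompt for input" default;
    it is injectable so the function can be exercised without a terminal.
    """
    if issue is None or not issue.strip():
        # ISSUE default: prompt the user for a description.
        issue = ask("Describe the issue to analyze: ")
    if depth is None:
        # DEPTH default: five iterations, adjustable by the caller.
        depth = DEFAULT_DEPTH
    if depth < 1:
        raise ValueError("DEPTH must be at least 1")
    return issue.strip(), depth


issue, depth = resolve_variables("Users see 500 error on checkout")
print(issue, depth)
```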

Steps

  1. State the problem clearly
  2. Ask "Why did this happen?" and document the answer
  3. For that answer, ask "Why?" again
  4. Continue until reaching root cause (usually 5 iterations)
  5. Validate by working backwards: root cause → symptom
  6. Explore branches if multiple causes emerge
  7. Propose solutions addressing root causes, not symptoms
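
The steps above can be sketched as a small data structure. This is a minimal illustration rather than part of the command: `WhyChain` and its method names are hypothetical, and the answers are entered by hand because finding them is the actual investigative work. The sample data comes from Example 1.

```python
from dataclasses import dataclass, field


@dataclass
class WhyChain:
    """One branch of a Five Whys analysis (hypothetical structure)."""

    problem: str                                       # step 1: the stated problem
    answers: list[str] = field(default_factory=list)   # steps 2-4: one answer per "why"
    root_cause: str = ""                               # where the chain stops

    def ask_why(self, answer: str) -> None:
        """Record the answer to the next "why" (steps 2 and 3)."""
        self.answers.append(answer)

    def render_backwards(self) -> list[str]:
        """Step 5: read the chain from root cause back to the symptom.

        Each line should be a believable cause-and-effect statement; if one
        reads wrong, that link in the chain needs another look.
        """
        links = [self.problem, *self.answers, self.root_cause]
        links.reverse()
        return [f"{cause} -> therefore: {effect}" for cause, effect in zip(links, links[1:])]


chain = WhyChain(problem="Users see 500 error on checkout")
for answer in [
    "Payment service throws exception",
    "Request timeout after 30 seconds",
    "Database query takes 45 seconds",
    "Missing index on transactions table",
    "Index creation wasn't in migration scripts",
]:
    chain.ask_why(answer)
chain.root_cause = "Migration review process doesn't check query performance"

for line in chain.render_backwards():
    print(line)
```

For step 6, one option is to keep a separate `WhyChain` per branch and validate each one independently.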

Examples

Example 1: Production Bug

Problem: Users see 500 error on checkout
Why 1: Payment service throws exception
Why 2: Request timeout after 30 seconds
Why 3: Database query takes 45 seconds
Why 4: Missing index on transactions table
Why 5: Index creation wasn't in migration scripts
Root Cause: Migration review process doesn't check query performance

Solution: Add query performance checks to migration PR template
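
One way a query performance check might look is to inspect the query plan before and after the migration. The sketch below uses SQLite from the Python standard library as a stand-in, since the example does not name the database. The `user_id` column, the lookup query, and the index name are assumptions made for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan for `sql` as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)

# Hypothetical checkout lookup that filters on an unindexed column.
lookup = "SELECT amount FROM transactions WHERE user_id = 42"

plan_before = query_plan(conn, lookup)  # full table scan: no index on user_id

# The step the migration in Example 1 left out (index name is hypothetical).
conn.execute("CREATE INDEX idx_transactions_user_id ON transactions (user_id)")

plan_after = query_plan(conn, lookup)   # index search instead of a scan

print("before:", plan_before)
print("after: ", plan_after)
```

A review checklist item could then be as simple as "no new query on a large table shows a full scan in its plan".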

Example 2: CI/CD Pipeline Failures

Problem: E2E tests fail intermittently
Why 1: Race condition in async test setup
Why 2: Test doesn't wait for database seed completion
Why 3: Seed function doesn't return promise
Why 4: TypeScript didn't catch missing return type
Why 5: strict mode not enabled in test config
Root Cause: Inconsistent TypeScript config between src and tests

Solution: Unify TypeScript config, enable strict mode everywhere
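
The race in Why 1 to Why 3 can be reproduced in miniature. The example itself concerns TypeScript; the sketch below is a Python `asyncio` analogue under stated assumptions, with an in-memory list standing in for the database and all function names hypothetical. It does not model the TypeScript strict-mode part of the chain.

```python
import asyncio

# Keep references to background tasks so they are not garbage collected.
_background: set = set()


async def _insert_rows(db: list) -> None:
    await asyncio.sleep(0)  # stand-in for real database latency
    db.append("user-1")


def seed_fire_and_forget(db: list) -> None:
    """Bug analogue: seeding is started, but the caller gets nothing to await."""
    task = asyncio.create_task(_insert_rows(db))
    _background.add(task)
    task.add_done_callback(_background.discard)


async def seed_awaitable(db: list) -> None:
    """Fixed version: the caller can wait until seeding has finished."""
    await _insert_rows(db)


async def run_demo() -> tuple:
    racy_db: list = []
    seed_fire_and_forget(racy_db)
    rows_seen_by_racy_test = len(racy_db)  # read before the seed has run
    await asyncio.gather(*_background)     # drain the background task

    fixed_db: list = []
    await seed_awaitable(fixed_db)
    rows_seen_by_fixed_test = len(fixed_db)
    return rows_seen_by_racy_test, rows_seen_by_fixed_test


result = asyncio.run(run_demo())
print(result)  # (0, 1): the racy test saw an empty database
```

In a real suite the racy read may or may not see the data depending on timing, which is what makes the failure intermittent.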

Example 3: Multi-Branch Analysis

Problem: Feature deployment takes 2 hours

Branch A (Build):
Why 1: Docker build takes 90 minutes
Why 2: No layer caching
Why 3: Dependencies reinstalled every time
Why 4: Cache invalidated by timestamp in Dockerfile
Root Cause A: Dockerfile uses current timestamp for versioning

Branch B (Tests):
Why 1: Test suite takes 30 minutes
Why 2: Integration tests run sequentially
Why 3: Test runner config has maxWorkers: 1
Why 4: Previous developer disabled parallelism due to flaky tests
Root Cause B: Flaky tests masked by disabling parallelism

Solutions: 
A) Remove timestamp from Dockerfile, use git SHA
B) Fix flaky tests, re-enable parallel test execution
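
Branch A can be illustrated with a deliberately simplified model of layer caching, in which each layer's cache key depends on its own instruction and on every instruction before it. This is not how Docker is implemented in detail, and the instruction list is hypothetical because the example does not show the actual Dockerfile.

```python
import hashlib


def layer_keys(instructions: list) -> list:
    """Chain a cache key per instruction: each key depends on all earlier ones."""
    keys, parent = [], ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def cache_hits(previous: list, current: list) -> list:
    """A layer is reused only if its chained key matches the previous build's."""
    return [a == b for a, b in zip(layer_keys(previous), layer_keys(current))]


def dockerfile(version_label: str) -> list:
    """Hypothetical Dockerfile with the versioning line ahead of the slow step."""
    return [
        "FROM python:3.12-slim",
        f"LABEL version={version_label}",       # the versioning line from Branch A
        "COPY requirements.txt .",
        "RUN pip install -r requirements.txt",  # the slow dependency install
    ]


# Timestamp: differs on every build, so every later layer is rebuilt.
timestamp_builds = cache_hits(
    dockerfile("2024-01-01T10:00:00"), dockerfile("2024-01-01T10:05:00")
)
# Git SHA: identical when the same commit is rebuilt, so all layers are reused.
sha_builds = cache_hits(dockerfile("3f2a9c1"), dockerfile("3f2a9c1"))

print(timestamp_builds)  # [True, False, False, False]
print(sha_builds)        # [True, True, True, True]
```

A git SHA still changes on every commit, so under this model placing the version line after the dependency install would keep the slow layer cached across commits as well.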

Notes

  • Don't stop at symptoms; keep digging until you reach systemic or process issues, not just technical details
  • Five is a guideline, not a rule: stop when you reach the true root cause
  • Multiple root causes are common; explore each branch separately
  • Consider both technical and process-related causes
  • If "human error" appears, keep digging: why was the error possible?
  • Document every "why" for future reference
  • Root causes usually involve missing validation, missing docs, unclear process, or missing automation
  • Test solutions: implement → verify symptom resolved → monitor for recurrence