causal-inference-root-cause
Causal Inference & Root Cause Analysis
Purpose
Systematically investigate causal relationships to identify true root causes rather than mere correlations or symptoms. This skill helps distinguish genuine causation from spurious associations, test competing explanations, and design interventions that address underlying drivers.
When to Use This Skill
- Investigating system failures or production incidents
- Debugging performance issues with multiple potential causes
- Analyzing why a metric changed (e.g., conversion rate drop)
- Researching health outcomes or treatment effects
- Evaluating policy or intervention impacts
- Distinguishing correlation from causation in data
- Identifying confounding variables in experiments
- Tracing symptoms back to root causes
- Testing competing hypotheses about cause-effect relationships
- Designing experiments to validate causal claims
- Understanding why a project succeeded or failed
- Analyzing customer churn or retention drivers
Trigger phrases: "root cause", "why did this happen", "causal chain", "correlation vs causation", "confounding", "spurious correlation", "what really caused", "underlying driver"
What is Causal Inference?
A systematic approach to determine whether X causes Y (not just correlates with Y):
- Correlation: X and Y move together (possibly by coincidence, or because a third factor Z drives both)
- Causation: Changing X produces a change in Y (a causal mechanism exists)
Key Concepts:
- Root cause: The fundamental issue that, if resolved, prevents the problem from recurring
- Proximate cause: The immediate trigger (may be a symptom rather than the root)
- Confounding variable: A third factor that causes both X and Y, creating a spurious correlation between them
- Counterfactual: "What would have happened without X?" - the key causal question
- Causal mechanism: The pathway or process through which X affects Y
Quick Example:
```markdown
Effect: Website conversion rate dropped 30%

Competing Hypotheses:
- New checkout UI is confusing (proximate)
- Payment processor latency increased (proximate)
- We switched to a cheaper, slower payment processor (root cause)

Test:
- Roll back UI (no change in conversion) → UI is not the cause
- Check payment logs (latency confirmed) → latency is the proximate cause
- Trace latency to the processor change → processor change is the root cause

Counterfactual:
"If we hadn't switched processors, would conversion have dropped?"
→ No, conversion was fine with the old processor

Conclusion:
Root cause = processor switch
Mechanism = slow checkout → user abandonment
```
Workflow
Copy this checklist and track your progress:
Root Cause Analysis Progress:
- [ ] Step 1: Define the effect
- [ ] Step 2: Generate hypotheses
- [ ] Step 3: Build causal model
- [ ] Step 4: Test causality
- [ ] Step 5: Document and validate
Step 1: Define the effect
Describe the effect or outcome (what happened; be specific), quantify it if possible (magnitude, frequency), establish a timeline (when did it start; is it ongoing?), determine the baseline (what's normal; what changed?), and identify stakeholders (who's impacted; who needs answers?). Key questions: What exactly are we explaining? Is it a one-time event or a recurring pattern? How can we measure it objectively?
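To make "quantify" and "determine baseline" concrete, here is a minimal Python sketch that expresses a metric change relative to its baseline, both as a relative change and as a z-score (the change in units of normal day-to-day variation). The function name `quantify_effect` and the daily conversion rates are hypothetical, and the z-score assumes roughly stable, independent baseline days, which seasonal metrics may violate.

```python
from statistics import mean, stdev


def quantify_effect(baseline, current):
    """Compare a current metric value with a window of baseline observations.

    Returns the baseline mean, the absolute and relative change, and a z-score
    expressing the change in units of the baseline's sample standard deviation.
    """
    base_mean = mean(baseline)
    base_sd = stdev(baseline)  # sample standard deviation; needs >= 2 points
    delta = current - base_mean
    return {
        "baseline_mean": base_mean,
        "absolute_change": delta,
        "relative_change": delta / base_mean,
        "z_score": delta / base_sd,
    }


# Hypothetical daily conversion rates for the two weeks before the drop.
baseline_days = [0.050, 0.052, 0.049, 0.051, 0.050, 0.048, 0.050,
                 0.051, 0.049, 0.050, 0.052, 0.050, 0.049, 0.049]
result = quantify_effect(baseline_days, current=0.035)
print(f"relative change: {result['relative_change']:+.0%}, z = {result['z_score']:.1f}")
```

A large negative z-score says the drop is far outside normal variation; it does not say anything yet about why it happened.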
Step 2: Generate hypotheses
List proximate causes (immediate triggers/symptoms), identify potential root causes (underlying factors), consider confounders (third factors creating spurious associations), and challenge assumptions (what if the initial theory is wrong?). Techniques: 5 Whys (ask "why" repeatedly), Fishbone diagram (categorize causes), Timeline analysis (what changed before the effect?), Differential diagnosis (what else explains the symptoms?). For simple investigations → use resources/template.md. For complex problems → study resources/methodology.md for advanced techniques.
Step 3: Build causal model
Draw causal chains (A → B → C → Effect), identify necessary vs. sufficient causes, map confounding relationships (what influences both the cause and the effect?), note the temporal sequence (the cause must precede the effect; this is necessary for causation), and specify mechanisms (HOW X causes Y). Model elements: Direct cause (X → Y), Indirect cause (X → Z → Y), Confounding (Z → X and Z → Y), Mediating variable (X → M → Y), Moderating variable (the strength or direction of X → Y depends on M).
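The model elements above can be encoded as a directed acyclic graph (DAG). The plain-Python sketch below uses a hypothetical graph loosely based on the checkout example (`holiday_traffic` is an invented confounder) and finds mediators (nodes on a directed path from X to Y) and common causes (nodes with directed paths into both X and Y, where the path to Y does not pass through X). It is a simplified illustration, not a full implementation of the backdoor criterion.

```python
def descendants(graph, start, blocked=frozenset()):
    """Nodes reachable from `start` along directed edges, never entering `blocked`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def mediators(graph, x, y):
    """Nodes M on at least one directed path x -> ... -> M -> ... -> y."""
    return {m for m in descendants(graph, x)
            if m != y and y in descendants(graph, m)}


def common_causes(graph, x, y):
    """Nodes Z with a directed path to x and a directed path to y that avoids x."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    return {z for z in nodes - {x, y}
            if x in descendants(graph, z)
            and y in descendants(graph, z, blocked={x})}


# Hypothetical causal graph: parent -> list of children.
graph = {
    "processor_switch": ["latency"],
    "latency": ["abandonment"],
    "abandonment": ["conversion"],
    "holiday_traffic": ["latency", "conversion"],
}

print(mediators(graph, "latency", "conversion"))      # {'abandonment'}
print(common_causes(graph, "latency", "conversion"))  # {'holiday_traffic'}
```

Note that `processor_switch` is not reported as a confounder: it is an upstream cause of X (an earlier link in the chain), not a common cause of X and Y.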
Step 4: Test causality
Check the temporal sequence (does the cause come before the effect?), assess the strength of association (is the correlation strong?), look for a dose-response relationship (more cause → more effect?), test the counterfactual (what happens if the cause is absent or removed?), search for a mechanism (explain HOW), check consistency (does it hold across contexts?), and rule out confounders. Evidence hierarchy: RCT (gold standard) > natural experiment > longitudinal study > case-control study > cross-sectional study > expert opinion. Use the Bradford Hill criteria (9 factors: strength, consistency, specificity, temporality, dose-response, plausibility, coherence, experiment, analogy).
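The "rule out confounders" check can be illustrated with a toy simulation: a hypothetical binary factor Z drives both X and Y, while X has no effect on Y at all. The pooled correlation looks strong, but it largely disappears once the data are stratified on Z. This sketch uses only the Python standard library; all parameters are arbitrary, and stratification only helps for confounders you have actually measured.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=10_000, seed=0):
    """Z causes both X and Y; Y has no X term, so X has no effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5          # binary confounder
        x = z + rng.gauss(0, 0.3)       # Z -> X
        y = 2 * z + rng.gauss(0, 0.3)   # Z -> Y
        rows.append((z, x, y))
    return rows


rows = simulate()
naive = pearson([r[1] for r in rows], [r[2] for r in rows])
within = {
    level: pearson([r[1] for r in rows if r[0] == level],
                   [r[2] for r in rows if r[0] == level])
    for level in (False, True)
}
print(f"pooled r = {naive:.2f}")
for level, r in within.items():
    print(f"r within Z={level}: {r:.2f}")
```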
Step 5: Document and validate
Create causal-inference-root-cause.md containing: the effect description and quantification, competing hypotheses, the causal model (chains, confounders, mechanisms), an evidence assessment, the root cause(s) with a confidence level, recommended tests/interventions, and limitations/alternatives. Validate it using resources/evaluators/rubric_causal_inference_root_cause.json: verify that you distinguished proximate from root cause, controlled for confounders, explained the mechanism, assessed the evidence systematically, noted uncertainty, recommended interventions, and acknowledged alternatives. Minimum standard: score ≥ 3.5.
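A minimal sketch of the "score ≥ 3.5" gate, assuming each of the seven checks above receives a numeric score and that the minimum applies to their unweighted average. The criterion keys and example scores are hypothetical; the actual rubric JSON may define, weight, or aggregate its criteria differently.

```python
# Hypothetical keys for the seven checks listed in Step 5.
CRITERIA = (
    "proximate_vs_root",
    "confounders_controlled",
    "mechanism_explained",
    "evidence_assessed",
    "uncertainty_noted",
    "interventions_recommended",
    "alternatives_acknowledged",
)


def passes_minimum(scores, threshold=3.5):
    """Return (average, passed) for an unweighted mean over all criteria."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    average = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return average, average >= threshold


# Hypothetical self-assessment of a draft analysis.
scores = dict(zip(CRITERIA, [4, 3, 4, 3, 4, 4, 3]))
average, passed = passes_minimum(scores)
print(f"average {average:.2f}, passes: {passed}")
```

Raising on a missing criterion is a deliberate choice: silently skipping an unscored check would inflate the average.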
Common Patterns
For incident investigation (engineering):
- Effect: System outage, performance degradation
- Hypotheses: Recent deploy, traffic spike, dependency failure, resource exhaustion
- Model: Timeline + dependency graph + recent changes
- Test: Logs, metrics, rollback experiments
- Output: Postmortem with root cause and prevention plan
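The "Timeline + recent changes" model often starts by listing which changes landed shortly before the effect began. A minimal Python sketch, assuming a hypothetical change log of dicts with `at` and `what` keys and an arbitrary 24-hour look-back window:

```python
from datetime import datetime, timedelta


def candidate_changes(changes, onset, window=timedelta(hours=24)):
    """Return changes made within `window` before `onset`, newest first.

    Temporal precedence is necessary but not sufficient for causation, so this
    only narrows the hypothesis list; each candidate still needs testing.
    """
    preceding = [c for c in changes if onset - window <= c["at"] <= onset]
    return sorted(preceding, key=lambda c: c["at"], reverse=True)


# Hypothetical change log.
changes = [
    {"at": datetime(2024, 5, 1, 9, 0), "what": "deploy api v2.3"},
    {"at": datetime(2024, 5, 2, 14, 30), "what": "switch payment processor"},
    {"at": datetime(2024, 5, 2, 16, 0), "what": "raise cache TTL"},
    {"at": datetime(2024, 5, 3, 8, 0), "what": "rotate TLS cert"},
]
onset = datetime(2024, 5, 2, 18, 0)
for c in candidate_changes(changes, onset):
    print(c["at"], c["what"])
```

Changes after the onset are excluded because they cannot have caused it; changes outside the window are excluded only by assumption, so widen it if nothing plausible turns up.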
For metric changes (product/business):
- Effect: Conversion drop, revenue change, user engagement shift
- Hypotheses: Product changes, seasonality, market shifts, measurement issues
- Model: User journey + external factors + recent experiments
- Test: Cohort analysis, A/B test data, segmentation
- Output: Causal explanation with recommended actions
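Segmentation can expose a mix shift that masquerades as a product problem. In this hypothetical, overall conversion falls from 5.2% to 3.6% even though neither device segment's rate changed, because traffic moved toward the lower-converting mobile segment. The segment names and counts are invented for illustration.

```python
# Hypothetical traffic before and after a conversion drop:
# segment -> (visits, conversions).
before = {"desktop": (8_000, 480), "mobile": (2_000, 40)}
after = {"desktop": (4_000, 240), "mobile": (6_000, 120)}


def overall_rate(period):
    """Pooled conversion rate across all segments."""
    visits = sum(v for v, _ in period.values())
    conversions = sum(c for _, c in period.values())
    return conversions / visits


def segment_rates(period):
    """Conversion rate within each segment."""
    return {seg: c / v for seg, (v, c) in period.items()}


print(f"overall: {overall_rate(before):.1%} -> {overall_rate(after):.1%}")
for seg, rate in segment_rates(before).items():
    print(f"{seg}: {rate:.1%} -> {segment_rates(after)[seg]:.1%}")
```

If the per-segment rates are flat, the causal question shifts from "what broke checkout?" to "what changed the traffic mix?".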
For policy evaluation (research/public policy):
- Effect: Health outcome, economic indicator, social metric
- Hypotheses: Policy intervention, confounding factors, secular trends
- Model: DAG with confounders + mechanisms
- Test: Difference-in-differences, regression discontinuity, propensity matching
- Output: Causal effect estimate with confidence intervals
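In its simplest two-group, two-period form, difference-in-differences reduces to arithmetic on four group means. A minimal sketch with invented numbers; the estimate is only meaningful under the parallel-trends assumption. In practice the same estimate is usually obtained from a regression with a treatment × post-period interaction term, which also yields the standard errors needed for confidence intervals.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four group means.

    Assumes parallel trends: absent the policy, the treated group would have
    changed by the same amount as the control group.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical average outcome (e.g. clinic visits per 1,000 residents).
effect = diff_in_diff(treated_pre=40.0, treated_post=52.0,
                      control_pre=38.0, control_post=44.0)
print(effect)  # 6.0
```

The control group's change (+6) stands in for the counterfactual trend, so only the treated group's extra change (+12 versus +6) is attributed to the policy.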
For debugging (software):
- Effect: Bug, unexpected behavior, test failure
- Hypotheses: Recent changes, edge cases, race conditions, dependency issues
- Model: Code paths + data flows + timing
- Test: Reproduce, isolate, binary search, git bisect
- Output: Bug report with root cause and fix
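The "binary search, git bisect" test can be sketched in a few lines of Python. This is not how `git bisect` is implemented; it only shows the halving idea under the same kind of assumptions: a known-good start, a known-bad end, a linear history, and a bug that persists once introduced. The commit names and the `is_bad` predicate are hypothetical stand-ins for checking out a revision and running a test.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered history for the first commit where is_bad() is True.

    Assumes commits[0] is good, commits[-1] is bad, and that once the bug
    appears it stays (a single good -> bad transition).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


commits = [f"c{i}" for i in range(20)]


def is_bad(commit):
    # Hypothetical test: the bug was introduced in c13.
    return int(commit[1:]) >= 13


print(first_bad(commits, is_bad))  # c13
```

Each probe halves the search range, so 20 commits need about five test runs instead of up to nineteen.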
Guardrails
Do:
- Distinguish correlation from causation explicitly
- Generate multiple competing hypotheses (rather than just confirming the first theory)
- Map out confounding variables and control for them
- Specify causal mechanisms (HOW X causes Y)
- Test counterfactuals ("what if X hadn't happened?")
- State confidence levels and uncertainty
- Acknowledge alternative explanations
- Recommend testable interventions based on root cause
Don't:
- Confuse proximate cause with root cause
- Cherry-pick evidence that confirms initial hypothesis
- Assume correlation implies causation
- Ignore confounding variables
- Skip the mechanism explanation (stating only a correlation)
- Overstate confidence without strong evidence
- Stop at first plausible explanation without testing alternatives
- Propose interventions without identifying root cause
Common Pitfalls:
- Post hoc ergo propter hoc: "After this, therefore because of this" (temporal sequence ≠ causation)
- Spurious correlation: Two things correlate due to third factor or coincidence
- Confounding: Third variable causes both X and Y
- Reverse causation: Y causes X, not X causes Y
- Selection bias: Sample is not representative
- Regression to the mean: Extreme measurements tend to be followed by values closer to the average
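Regression to the mean is easy to mistake for an intervention effect. In this standard-library simulation (all parameters are arbitrary), units selected because their first measurement was extreme score closer to average on a second measurement even though nothing was done to them. This is why "we fixed the worst performers and they improved" needs a comparison group.

```python
import random
from statistics import mean

rng = random.Random(42)
# Each unit has a stable true level; each measurement adds independent noise.
true_levels = [rng.gauss(100, 10) for _ in range(10_000)]
first = [t + rng.gauss(0, 10) for t in true_levels]
second = [t + rng.gauss(0, 10) for t in true_levels]

# Select the units with the most extreme first measurements (top 5%).
cutoff = sorted(first)[int(0.95 * len(first))]
selected = [i for i, v in enumerate(first) if v >= cutoff]

first_mean = mean(first[i] for i in selected)
second_mean = mean(second[i] for i in selected)
print(f"selected units, first measurement:  {first_mean:.1f}")
print(f"selected units, second measurement: {second_mean:.1f}")
```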
Quick Reference
- Template: resources/template.md (structured framework for root cause analysis)
- Methodology: resources/methodology.md (advanced techniques: DAGs, confounding control, Bradford Hill criteria)
- Quality rubric: resources/evaluators/rubric_causal_inference_root_cause.json
- Output file: causal-inference-root-cause.md
- Key distinction: Correlation (X and Y move together) vs. causation (an X → Y mechanism)
- Gold standard test: Randomized controlled trial (randomization removes confounding in expectation)
- Essential criteria: Temporal sequence (cause before effect), mechanism (how it works), counterfactual (what if the cause were absent)