causal-inference-root-cause

Causal Inference & Root Cause Analysis

Purpose

Systematically investigate causal relationships to identify true root causes rather than mere correlations or symptoms. This skill helps distinguish genuine causation from spurious associations, test competing explanations, and design interventions that address underlying drivers.

When to Use This Skill

Investigating system failures or production incidents
Debugging performance issues with multiple potential causes
Analyzing why a metric changed (e.g., conversion rate drop)
Researching health outcomes or treatment effects
Evaluating policy or intervention impacts
Distinguishing correlation from causation in data
Identifying confounding variables in experiments
Tracing a symptom back to its root cause
Testing competing hypotheses about cause-effect relationships
Designing experiments to validate causal claims
Understanding why a project succeeded or failed
Analyzing customer churn or retention drivers

Trigger phrases: "root cause", "why did this happen", "causal chain", "correlation vs causation", "confounding", "spurious correlation", "what really caused", "underlying driver"

What is Causal Inference?

A systematic approach to determine whether X causes Y (not just correlates with Y):

Correlation: X and Y move together (possibly by coincidence, or because a third factor Z drives both)
Causation: Changing X produces a change in Y (a causal mechanism exists)

Key Concepts:

Root cause: The fundamental issue that, if resolved, prevents the problem from recurring
Proximate cause: The immediate trigger (may be a symptom rather than the root)
Confounding variable: Third factor that causes both X and Y, creating spurious correlation
Counterfactual: "What would have happened without X?" - the key causal question
Causal mechanism: The pathway or process through which X affects Y
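To make the confounding definition concrete, here is a minimal simulation using only the Python standard library. A hypothetical third factor Z drives both X and Y, while X has no effect on Y at all. The raw correlation between X and Y is substantial, but it largely disappears once Z's influence is regressed out of both (a partial correlation). The variable names and effect sizes are illustrative assumptions, not part of the method itself.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(target, predictor):
    """Residuals of an ordinary least-squares fit of target on predictor."""
    n = len(target)
    mt, mp = sum(target) / n, sum(predictor) / n
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [t - (mt + slope * (p - mp)) for t, p in zip(target, predictor)]

# Hypothetical data-generating process: Z -> X and Z -> Y, no X -> Y arrow.
rng = random.Random(42)
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]   # confounder
x = [zi + rng.gauss(0, 1) for zi in z]    # Z drives X
y = [zi + rng.gauss(0, 1) for zi in z]    # Z drives Y; X plays no role

raw = corr(x, y)                                   # roughly 0.5
partial = corr(residuals(x, z), residuals(y, z))   # close to 0
print(f"raw corr(X, Y) = {raw:.2f}, partial corr given Z = {partial:.2f}")
```

Adjusting for Z works here only because Z was measured. A confounder nobody recorded cannot be regressed out, which is why randomized experiments sit at the top of the evidence hierarchy.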

Quick Example:

Effect: Website conversion rate dropped 30%

Competing Hypotheses:

New checkout UI is confusing (proximate)
Payment processor latency increased (proximate)
We switched to a cheaper payment processor that is slower (root cause)

Test:

Roll back the UI (no change in conversion) → UI is not the cause
Check payment logs (latency confirmed) → latency is the proximate cause
Trace the latency to the processor change → processor change is the root cause

Counterfactual:

"If we hadn't switched processors, would conversion have dropped?" → No; conversion was normal with the old processor

Conclusion:

Root cause = processor switch
Mechanism = slow checkout → user abandonment
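Several of the tests in this example come down to comparing a conversion rate between two conditions (old vs. new processor, or before vs. after the UI rollback). A minimal sketch of a two-proportion z-test using only the Python standard library; the function name is illustrative and the counts in the usage line are hypothetical, not data from the example above.

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rate B against A; return (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)   # common rate if there is no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Hypothetical counts: sessions on the old processor (A) vs. the new one (B).
z, p = two_proportion_z(conv_a=500, n_a=10_000, conv_b=350, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

A significant result says the drop is unlikely to be noise. It does not by itself show that the processor caused the drop, which is why the counterfactual and mechanism steps follow.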

Workflow

Copy this checklist and track your progress:

Root Cause Analysis Progress:
- [ ] Step 1: Define the effect
- [ ] Step 2: Generate hypotheses
- [ ] Step 3: Build causal model
- [ ] Step 4: Test causality
- [ ] Step 5: Document and validate

Step 1: Define the effect

Describe the effect or outcome (what happened, stated specifically), quantify it if possible (magnitude, frequency), establish the timeline (when did it start, and is it ongoing?), determine the baseline (what is normal, and what changed?), and identify stakeholders (who is impacted, and who needs answers?). Key questions: What exactly are we explaining? Is it a one-time event or a recurring pattern? How can we measure it objectively?
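Quantifying the effect against a baseline can be as simple as asking how far the current value sits from its recent history. A minimal sketch, assuming a list of daily metric values for a baseline period that is reasonably stable (no strong seasonality); the function name and numbers are hypothetical.

```python
import statistics

def deviation_from_baseline(baseline: list[float], current: float) -> dict:
    """Describe how far `current` is from a baseline window of past values."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)                    # sample standard deviation
    return {
        "baseline_mean": mean,
        "relative_change": (current - mean) / mean,    # e.g. -0.30 for a 30% drop
        "z_score": (current - mean) / sd,              # distance in baseline SDs
    }

# Hypothetical daily conversion rates before the incident, then today's value.
baseline = [0.050, 0.052, 0.049, 0.051, 0.048, 0.050, 0.050]
print(deviation_from_baseline(baseline, current=0.035))
```

A large absolute z-score says the change is unusual relative to normal variation; it says nothing yet about why it happened.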

Step 2: Generate hypotheses

List proximate causes (immediate triggers or symptoms), identify potential root causes (underlying factors), consider confounders (third factors that create spurious associations), and challenge assumptions (what if the initial theory is wrong?). Techniques: 5 Whys (ask "why" repeatedly), Fishbone diagram (categorize causes), Timeline analysis (what changed before the effect?), Differential diagnosis (what else could explain the symptoms?). For simple investigations, use `resources/template.md`. For complex problems, study `resources/methodology.md` for advanced techniques.
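Timeline analysis ("what changed before the effect?") is easy to mechanize once changes are logged with timestamps. A minimal sketch; the `Change` record, the `candidate_changes` helper, and the sample entries and dates are all hypothetical stand-ins for whatever deploy log, config history, or vendor change feed you actually have.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Change:
    when: datetime
    description: str

def candidate_changes(changes: list[Change], onset: datetime,
                      window: timedelta = timedelta(days=7)) -> list[Change]:
    """Changes within `window` before the effect's onset, most recent first.
    Preceding the effect makes a change a candidate only; it still needs testing."""
    hits = [c for c in changes if onset - window <= c.when <= onset]
    return sorted(hits, key=lambda c: c.when, reverse=True)

# Hypothetical change log.
changes = [
    Change(datetime(2024, 5, 1, 9), "Checkout UI redesign deployed"),
    Change(datetime(2024, 5, 6, 14), "Switched to new payment processor"),
    Change(datetime(2024, 4, 20, 11), "Marketing email campaign"),
]
for c in candidate_changes(changes, onset=datetime(2024, 5, 7)):
    print(c.when, c.description)
```

The window is a judgment call: too narrow and slow-burning causes are missed, too wide and every change looks like a suspect.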

Step 3: Build causal model

Draw causal chains (A → B → C → Effect), identify necessary vs. sufficient causes, map confounding relationships (what influences both the cause and the effect?), note the temporal sequence (the cause must precede the effect; this is necessary for causation), and specify mechanisms (HOW X causes Y). Model elements: direct cause (X → Y), indirect cause (X → Z → Y), confounding (Z → X and Z → Y), mediating variable (X → M → Y, where M carries the effect), moderating variable (the strength of X → Y depends on M).
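A causal model can be written down as a directed graph, which makes listing candidate confounders mechanical. The sketch below stores the graph as a hypothetical adjacency dict (node → direct effects) and flags common causes: nodes that reach X and also reach Y by a path that avoids X. This is a simplified heuristic, not a full back-door criterion check.

```python
from typing import Optional

Graph = dict[str, list[str]]   # hypothetical format: node -> nodes it directly affects

def ancestors(graph: Graph, node: str, skip: Optional[str] = None) -> set[str]:
    """All nodes with a directed path into `node`, ignoring paths through `skip`."""
    parents: dict[str, set[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            parents.setdefault(dst, set()).add(src)
    seen: set[str] = set()
    stack = [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p != skip and p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def candidate_confounders(graph: Graph, x: str, y: str) -> set[str]:
    """Common causes of x and y, excluding influence that only flows through x."""
    return ancestors(graph, x) & ancestors(graph, y, skip=x)

# Hypothetical model: Z confounds X and Y; M mediates X's effect on Y.
graph = {"Z": ["X", "Y"], "X": ["M"], "M": ["Y"]}
print(candidate_confounders(graph, "X", "Y"))   # {'Z'}; the mediator M is not flagged
```

Note that a node affecting Y only through X (Z → X → Y) is correctly not reported: adjusting for it would not remove bias.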

Step 4: Test causality

Check the temporal sequence (does the cause come before the effect?), assess the strength of association (is the correlation strong?), look for a dose-response relationship (more cause → more effect?), test the counterfactual (what happens if the cause is absent or removed?), search for a mechanism (explain HOW), check consistency (does it hold across contexts?), and rule out confounders. Evidence hierarchy: randomized controlled trial (RCT, gold standard) > natural experiment > longitudinal study > case-control study > cross-sectional study > expert opinion. Apply the Bradford Hill criteria (9 factors: strength, consistency, specificity, temporality, dose-response, plausibility, coherence, experiment, analogy).
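The dose-response check asks whether more of the suspected cause goes with more of the effect. A minimal sketch that fits a least-squares slope and tests whether the effect moves monotonically across increasing dose levels; the helper name and the latency/abandonment numbers are hypothetical.

```python
def dose_response(doses: list[float], effects: list[float]) -> tuple[float, bool]:
    """Return (least-squares slope, whether the effect is monotonic in dose)."""
    n = len(doses)
    md, me = sum(doses) / n, sum(effects) / n
    slope = (sum((d - md) * (e - me) for d, e in zip(doses, effects))
             / sum((d - md) ** 2 for d in doses))
    ordered = [e for _, e in sorted(zip(doses, effects))]   # effects sorted by dose
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    monotonic = all(s >= 0 for s in steps) or all(s <= 0 for s in steps)
    return slope, monotonic

# Hypothetical: checkout latency (seconds) vs. cart abandonment rate.
latency = [1.0, 2.0, 3.0, 4.0]
abandonment = [0.20, 0.26, 0.31, 0.38]
print(dose_response(latency, abandonment))
```

A clean gradient strengthens the causal case but does not rule out a confounder that rises along with the dose.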

Step 5: Document and validate

Create `causal-inference-root-cause.md` containing: a description and quantification of the effect, competing hypotheses, the causal model (chains, confounders, mechanisms), an evidence assessment, the root cause(s) with a confidence level, recommended tests or interventions, and limitations and alternative explanations. Validate using `resources/evaluators/rubric_causal_inference_root_cause.json`: verify that the analysis distinguishes proximate from root cause, controls for confounders, explains the mechanism, assesses evidence systematically, notes uncertainty, recommends interventions, and acknowledges alternatives. Minimum standard: score ≥ 3.5.
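The minimum standard can be checked with a small helper. The structure of the rubric JSON file is not shown here, so this sketch assumes, hypothetically, that each criterion receives a numeric score and that the 3.5 threshold applies to their average. The criterion names are illustrative only.

```python
def meets_standard(scores: dict[str, float], threshold: float = 3.5) -> tuple[float, bool]:
    """Average the per-criterion scores and compare against the threshold."""
    avg = sum(scores.values()) / len(scores)
    return avg, avg >= threshold

# Hypothetical criterion names and scores (the real rubric may differ).
scores = {
    "proximate_vs_root": 4, "confounders_controlled": 3, "mechanism_explained": 4,
    "evidence_assessed": 3, "uncertainty_noted": 4, "interventions": 4,
    "alternatives_acknowledged": 3,
}
print(meets_standard(scores))
```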

Common Patterns

For incident investigation (engineering):

Effect: System outage, performance degradation
Hypotheses: Recent deploy, traffic spike, dependency failure, resource exhaustion
Model: Timeline + dependency graph + recent changes
Test: Logs, metrics, rollback experiments
Output: Postmortem with root cause and prevention plan

For metric changes (product/business):

Effect: Conversion drop, revenue change, user engagement shift
Hypotheses: Product changes, seasonality, market shifts, measurement issues
Model: User journey + external factors + recent experiments
Test: Cohort analysis, A/B test data, segmentation
Output: Causal explanation with recommended actions
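Segmentation matters because an aggregate metric can move even when no segment changes, purely through a shift in traffic mix (the mechanism behind Simpson's paradox). A minimal sketch with hypothetical desktop/mobile numbers: each segment converts at the same rate both weeks, yet the overall rate drops 25% because traffic shifts toward the lower-converting segment.

```python
def rates(segments: dict[str, tuple[int, int]]) -> tuple[dict[str, float], float]:
    """Per-segment and overall conversion rates from {segment: (conversions, visits)}."""
    per_segment = {k: c / v for k, (c, v) in segments.items()}
    total_c = sum(c for c, _ in segments.values())
    total_v = sum(v for _, v in segments.values())
    return per_segment, total_c / total_v

# Hypothetical weeks: segment rates are unchanged, but mobile's share of traffic grows.
before = {"desktop": (600, 10_000), "mobile": (200, 10_000)}
after = {"desktop": (300, 5_000), "mobile": (300, 15_000)}
for label, data in (("before", before), ("after", after)):
    per_segment, overall = rates(data)
    print(label, per_segment, round(overall, 4))
```

In a case like this, the useful causal question is "why did the mobile share grow?", not "what broke checkout?".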

For policy evaluation (research/public policy):

Effect: Health outcome, economic indicator, social metric
Hypotheses: Policy intervention, confounding factors, secular trends
Model: DAG (directed acyclic graph) with confounders + mechanisms
Test: Difference-in-differences, regression discontinuity, propensity score matching
Output: Causal effect estimate with confidence intervals
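Difference-in-differences estimates a policy effect by subtracting the control group's change (the secular trend) from the treated group's change. A minimal sketch of the 2x2 case with hypothetical group means. It relies on the parallel-trends assumption: without the policy, both groups would have changed by the same amount.

```python
def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """2x2 difference-in-differences estimate of the treatment effect."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before   # estimate of the shared trend
    return treated_change - control_change

# Hypothetical outcome means (e.g. clinic visits per 1,000 residents).
effect = diff_in_diff(treated_before=50.0, treated_after=44.0,
                      control_before=52.0, control_after=49.0)
print(effect)  # -3.0: a 6-point drop, of which 3 points is the shared trend
```

Confidence intervals usually come from fitting the same comparison as a regression with group, period, and group-by-period interaction terms, where the interaction coefficient equals this estimate; that step is beyond this sketch.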

For debugging (software):

Effect: Bug, unexpected behavior, test failure
Hypotheses: Recent changes, edge cases, race conditions, dependency issues
Model: Code paths + data flows + timing
Test: Reproduce, isolate, binary search, git bisect
Output: Bug report with root cause and fix
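Binary search over history is the idea `git bisect` automates: given commits ordered oldest to newest, with the latest known bad, it finds the first bad commit in O(log n) checks. A minimal sketch of the same search; `first_bad` is illustrative, and `is_bad` is a hypothetical stand-in for building the commit and running a reproduction test.

```python
from typing import Callable, Sequence

def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.
    Assumes commits are ordered oldest -> newest, commits[-1] is bad, and
    once the bug appears it stays (good ... good, bad ... bad)."""
    lo, hi = 0, len(commits) - 1          # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                      # first bad is at mid or earlier
        else:
            lo = mid + 1                  # first bad is after mid
    return commits[lo]

# Hypothetical history in which the bug was introduced at commit "c5".
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 5))
```

The first bad commit is usually the proximate trigger. The root cause may lie further back, for example in why the change got past review and tests.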

Guardrails

Do:

Distinguish correlation from causation explicitly
Generate multiple competing hypotheses (rather than only confirming the first theory)
Map out confounding variables and control for them
Specify causal mechanisms (HOW X causes Y)
Test counterfactuals ("what if X hadn't happened?")
State confidence levels and uncertainty
Acknowledge alternative explanations
Recommend testable interventions based on root cause

Don't:

Confuse proximate cause with root cause
Cherry-pick evidence that confirms initial hypothesis
Assume correlation implies causation
Ignore confounding variables
Skip the mechanism explanation (merely stating a correlation)
Overstate confidence without strong evidence
Stop at the first plausible explanation without testing alternatives
Propose interventions without identifying the root cause

Common Pitfalls:

Post hoc ergo propter hoc: "After this, therefore because of this" (temporal sequence ≠ causation)
Spurious correlation: Two things correlate because of a third factor or by coincidence
Confounding: A third variable causes both X and Y
Reverse causation: Y causes X, not X causes Y
Selection bias: The sample is not representative of the population of interest
Regression to the mean: Extreme measurements tend to be followed by values closer to the average
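Regression to the mean is easy to underestimate, so a quick simulation helps. Each hypothetical unit below has a stable true level plus independent noise in each measurement period. The units that looked worst in period 1 look noticeably better in period 2 with no intervention at all. All parameters are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)
n = 10_000
# Hypothetical units: a stable true level plus independent measurement noise.
true_level = [rng.gauss(100, 10) for _ in range(n)]
period1 = [t + rng.gauss(0, 10) for t in true_level]    # measurement 1
period2 = [t + rng.gauss(0, 10) for t in true_level]    # measurement 2, nothing changed

# Select the 5% of units that looked worst in period 1.
cutoff = sorted(period1)[n // 20]
worst = [i for i in range(n) if period1[i] < cutoff]

print(f"worst group, period 1: {statistics.mean(period1[i] for i in worst):.1f}")
print(f"same group, period 2:  {statistics.mean(period2[i] for i in worst):.1f}")
```

Had these units received a fix between periods, the improvement would look like proof that the fix worked. A comparison group that received no fix guards against this.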

Quick Reference

Template: `resources/template.md` (structured framework for root cause analysis)
Methodology: `resources/methodology.md` (advanced techniques: DAGs, confounding control, Bradford Hill criteria)
Quality rubric: `resources/evaluators/rubric_causal_inference_root_cause.json`
Output file: `causal-inference-root-cause.md`

Key distinction: Correlation (X and Y move together) vs. Causation (X → Y mechanism)
Gold standard test: Randomized controlled trial (random assignment balances confounders, including unmeasured ones, in expectation)
Essential criteria: Temporal sequence (cause before effect), mechanism (how it works), counterfactual (what if cause absent)
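Why randomization earns the "gold standard" label can be shown with a simulation. When treatment assignment depends on a confounder (here, a hypothetical patient severity that also lowers the outcome), the naive treated-minus-untreated difference is biased. When assignment is a coin flip, the same difference recovers the true effect. All parameters are illustrative assumptions.

```python
import random
import statistics

TRUE_EFFECT = 5.0          # hypothetical true treatment effect
rng = random.Random(1)

def outcome(severity: float, treated: bool) -> float:
    """Higher severity lowers the outcome; treatment adds TRUE_EFFECT."""
    return 50 - 10 * severity + (TRUE_EFFECT if treated else 0) + rng.gauss(0, 2)

def estimate(assign) -> float:
    """Mean outcome of treated minus mean outcome of untreated units."""
    treated, control = [], []
    for _ in range(20_000):
        severity = rng.random()                      # confounder in [0, 1)
        t = assign(severity)
        (treated if t else control).append(outcome(severity, t))
    return statistics.mean(treated) - statistics.mean(control)

observational = estimate(lambda s: rng.random() < s)   # sicker patients treated more often
randomized = estimate(lambda s: rng.random() < 0.5)    # coin-flip assignment ignores severity
print(f"true {TRUE_EFFECT}, observational {observational:.2f}, randomized {randomized:.2f}")
```

In the observational arm, treated units are sicker on average, so the treatment looks far weaker than it is. Randomization removes that imbalance without anyone having to measure severity.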
