thinking-scientific-method

Scientific Method

Overview

The scientific method is a systematic approach to understanding through observation, hypothesis formation, prediction, testing, and revision. In engineering, it brings rigor to debugging, experimentation, and investigation. The key insight: a good hypothesis must be falsifiable, meaning some possible observation could prove it wrong.
Core Principle: Form hypotheses that could be proven false. Design experiments that could falsify them. Update beliefs based on evidence.

When to Use

  • Debugging (systematic cause identification)
  • Performance investigation
  • A/B test design
  • Feature experimentation
  • Root cause analysis
  • Data analysis
  • Any investigation where you're testing theories
Decision flow:
Investigating something?
  → Do you have a clear hypothesis? → no → FORM A HYPOTHESIS
  → Can your hypothesis be proven false? → no → MAKE IT FALSIFIABLE
  → Have you designed a test? → no → DESIGN AN EXPERIMENT
  → Did you update beliefs based on results? → no → REVISE AND ITERATE
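The decision flow can be read as a chain of checks where the first unanswered question decides the next action. A minimal sketch, assuming hypothetical field names and a "DONE" state that the flow itself does not name:

```python
from dataclasses import dataclass


@dataclass
class InvestigationState:
    """Answers to the four decision-flow questions (field names are hypothetical)."""
    has_clear_hypothesis: bool = False
    is_falsifiable: bool = False
    has_designed_test: bool = False
    has_updated_beliefs: bool = False


def next_step(state: InvestigationState) -> str:
    """Return the first action in the decision flow that is still outstanding."""
    if not state.has_clear_hypothesis:
        return "FORM A HYPOTHESIS"
    if not state.is_falsifiable:
        return "MAKE IT FALSIFIABLE"
    if not state.has_designed_test:
        return "DESIGN AN EXPERIMENT"
    if not state.has_updated_beliefs:
        return "REVISE AND ITERATE"
    return "DONE"  # assumption: the flow has no explicit terminal state


print(next_step(InvestigationState(has_clear_hypothesis=True)))
```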

The Scientific Method Process

Step 1: Observe

Gather data about the phenomenon:

Observation

What I'm seeing:
  • API latency increased from 200ms to 800ms
  • Started approximately Monday 9 AM
  • Affects /checkout endpoint
  • Other endpoints are normal
  • Error rate is normal
Initial data:
  • P50: 400ms (was 150ms)
  • P99: 2.5s (was 500ms)
  • Traffic: Normal levels

Step 2: Question

What do you want to understand?

Question

Central question: Why did /checkout latency increase 4x on Monday?
Sub-questions:
  • What changed on/around Monday 9 AM?
  • Why only /checkout and not other endpoints?
  • Why is P99 more affected than P50?

Step 3: Hypothesize

Form a testable explanation:

Hypothesis

Primary hypothesis: "The latency increase is caused by the payment provider SDK update deployed Sunday night, which changed from async to sync API calls."
Why this hypothesis:
  • SDK was updated Sunday (timing matches)
  • /checkout is the only endpoint using payment SDK (scope matches)
  • Sync calls would increase variance (P99 impact matches)

Good hypothesis characteristics:
  • Testable: You can design an experiment for it
  • Falsifiable: Some outcome would prove it wrong
  • Specific: Names a concrete cause and effect rather than a vague claim
  • Explanatory: Accounts for the observations

Step 4: Predict

What would you expect IF the hypothesis is true?

Predictions

If hypothesis is true:
  1. Rolling back the SDK should restore previous latency
  2. Calls to the payment provider should show increased duration
  3. Thread utilization should be higher (blocking calls)
  4. Adding async wrapper should reduce latency
If hypothesis is false:
  1. Rollback won't change latency
  2. Payment provider call duration is unchanged
  3. Thread utilization is normal

Prediction requirement:
Predictions must differentiate hypothesis-true from hypothesis-false. If both would produce the same observation, the prediction is useless.

Step 5: Experiment

Design and run a test:

Experiment Design

Test: Deploy SDK rollback to canary group
Setup:
  • Control: 90% traffic, new SDK
  • Treatment: 10% traffic, old SDK
  • Duration: 1 hour
  • Metric: P50 and P99 latency
Success criteria:
  • If P50 < 200ms in treatment → Hypothesis SUPPORTED
  • If P50 > 350ms in treatment → Hypothesis FALSIFIED
Confounds controlled:
  • Same time of day as original issue
  • Same traffic routing rules
  • Same downstream dependencies
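Writing the success criteria down as code before running the test makes it harder to reinterpret them afterwards. A minimal sketch using the 200ms and 350ms thresholds above; the function name and the "INCONCLUSIVE" band between the two thresholds are assumptions, since the design does not say what happens there:

```python
def evaluate_rollback(treatment_p50_ms: float,
                      support_below_ms: float = 200.0,
                      falsify_above_ms: float = 350.0) -> str:
    """Classify the canary result against thresholds fixed before the run."""
    if treatment_p50_ms < support_below_ms:
        return "SUPPORTED"
    if treatment_p50_ms > falsify_above_ms:
        return "FALSIFIED"
    # Assumption: anything between the thresholds neither supports nor falsifies.
    return "INCONCLUSIVE"


print(evaluate_rollback(155.0))
```

An inconclusive band is one way to handle a partial recovery, which would suggest the SDK is only part of the cause.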

Step 6: Analyze

Examine the results:

Results

Control (new SDK):
  • P50: 410ms
  • P99: 2.4s
  • n: 45,000 requests
Treatment (old SDK):
  • P50: 155ms
  • P99: 480ms
  • n: 5,000 requests
Statistical significance: p < 0.001
Effect size: 62% reduction in P50
Analysis: Hypothesis SUPPORTED. Old SDK shows pre-incident latency levels.
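The effect size is the relative reduction from control to treatment. A short sketch of that arithmetic using the P50 and P99 figures above (the helper name is hypothetical):

```python
def relative_reduction(control: float, treatment: float) -> float:
    """Fraction by which the treatment value is lower than the control value."""
    return (control - treatment) / control


p50_reduction = relative_reduction(410, 155)    # milliseconds
p99_reduction = relative_reduction(2400, 480)   # milliseconds (2.4s vs 480ms)

print(f"P50 reduction: {p50_reduction:.0%}")  # 62%
print(f"P99 reduction: {p99_reduction:.0%}")  # 80%
```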

Step 7: Conclude and Iterate

Update beliefs and act:

Conclusion

Finding: Payment SDK update caused latency regression
Confidence: High (controlled experiment, clear signal)
Action:
  1. Roll back SDK immediately
  2. File bug with payment provider
  3. Add latency monitoring for SDK calls
  4. Evaluate SDK changes before future updates
Next investigation: Why didn't we catch this in staging?
Hypothesis: Staging doesn't have realistic payment provider latency

Scientific Debugging

The Debugging Scientific Method

Bug: Users sometimes see stale data

Observation

  • Reports of stale data from support tickets
  • No clear pattern in who/when
  • Estimated 5% of users affected

Hypotheses (Multiple)

| # | Hypothesis | Falsification Test |
|---|------------|--------------------|
| 1 | Cache not invalidating | Check cache hits with stale data |
| 2 | Read replica lag | Check replica lag at time of reports |
| 3 | Browser caching | Check with cache-busted requests |
| 4 | CDN serving old content | Check CDN cache status |

Testing Strategy

Test in order of: (ease × likelihood)
  1. CDN cache status (easy to check)
  2. Browser caching (easy to check)
  3. Read replica lag (need to correlate times)
  4. Cache invalidation (needs instrumentation)
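The ordering rule can be made explicit by scoring each hypothesis and sorting by the product. A minimal sketch; the ease and likelihood numbers are illustrative guesses chosen to reproduce the order above, not measurements from the investigation:

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    ease: float        # 0..1, higher means cheaper to test (illustrative)
    likelihood: float  # 0..1, prior belief that this is the cause (illustrative)


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    """Sort hypotheses so the highest ease x likelihood score is tested first."""
    return sorted(candidates, key=lambda c: c.ease * c.likelihood, reverse=True)


candidates = [
    Candidate("Cache invalidation", ease=0.3, likelihood=0.5),
    Candidate("Read replica lag", ease=0.5, likelihood=0.4),
    Candidate("Browser caching", ease=0.9, likelihood=0.25),
    Candidate("CDN cache status", ease=0.9, likelihood=0.3),
]

for rank, c in enumerate(prioritize(candidates), start=1):
    print(rank, c.name, round(c.ease * c.likelihood, 3))
```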

Test 1: CDN Cache Status

Prediction: If CDN is serving stale content, cache headers will show old timestamps
Result: CDN timestamps are fresh
Conclusion: CDN ruled out

Test 2: Browser Caching

Prediction: If browser caching is the cause, a force-refresh will show correct data
Result: Force-refresh still shows stale data sometimes
Conclusion: Browser caching ruled out

Test 3: Read Replica Lag

Prediction: If replica lag is the cause, reports will correlate with lag spikes
Result: Strong correlation (r=0.84) between reports and lag spikes
Conclusion: SUPPORTED - read replica lag is the most likely cause
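A correlation like the one in Test 3 can be computed by bucketing both series into the same time windows and taking the Pearson coefficient. A minimal sketch; the hourly data below is made up for illustration and will not reproduce the r=0.84 reported above:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly buckets: peak replica lag (seconds) and stale-data reports.
lag_seconds = [0.2, 0.3, 4.1, 0.2, 6.5, 0.4, 3.0, 0.3]
stale_reports = [0, 1, 5, 0, 6, 1, 2, 0]

r = pearson_r(lag_seconds, stale_reports)
print(f"r = {r:.2f}")
```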

A/B Test Design

A/B Test: New Checkout Flow

Hypothesis

"The simplified 2-step checkout will increase conversion rate compared to the current 4-step checkout."

Predictions

If hypothesis is true:
  • Conversion rate increases by more than 5% (relative)
  • Time to complete decreases
  • Abandonment rate decreases
If hypothesis is false:
  • Conversion rate unchanged or decreases
  • Potential confusion (errors increase)

Experiment Design

Control: Current 4-step checkout
Treatment: New 2-step checkout
Traffic split: 50/50
Duration: 2 weeks (for statistical power)
Primary metric: Conversion rate
Guardrail metrics: Error rate, support tickets

Sample Size Calculation

Baseline conversion: 3.2%
Minimum detectable effect: 5% relative (0.16 percentage points absolute)
Required n per group: 150,000 users
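The required sample size depends on the significance level and power, which are not stated here. A sketch using the standard normal-approximation formula for comparing two proportions; alpha = 0.05 and power = 0.80 are assumptions. Under those assumptions a one-sided test needs roughly 153,000 users per group, close to the figure above, while a two-sided test needs roughly 195,000:

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline: float, relative_mde: float,
                          alpha: float = 0.05, power: float = 0.80,
                          two_sided: bool = True) -> int:
    """Users per group needed to detect a relative lift in a conversion rate.

    Normal-approximation formula for two independent proportions.
    alpha and power defaults are assumptions, not values from the test plan.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n_one_sided = sample_size_per_group(0.032, 0.05, two_sided=False)
n_two_sided = sample_size_per_group(0.032, 0.05, two_sided=True)
print(n_one_sided, n_two_sided)
```

Halving the minimum detectable effect roughly quadruples the required sample, so the choice of effect size drives the test duration.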

Stopping Criteria

Stop early if:
  • Treatment errors > 2x control
  • Support tickets > 2x baseline
  • p < 0.01 AND effect > 10%
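Encoding the stopping rules before launch keeps the decision mechanical. A minimal sketch, assuming errors are compared as rates and "effect" means relative lift in conversion; the function and parameter names are hypothetical:

```python
from typing import Optional


def early_stop_reason(treatment_error_rate: float, control_error_rate: float,
                      support_tickets: int, baseline_tickets: int,
                      p_value: float, relative_effect: float) -> Optional[str]:
    """Return why the test should stop early, or None to keep running."""
    if treatment_error_rate > 2 * control_error_rate:
        return "guardrail: treatment errors > 2x control"
    if support_tickets > 2 * baseline_tickets:
        return "guardrail: support tickets > 2x baseline"
    # Assumption: relative_effect is the relative lift, e.g. 0.12 for +12%.
    if p_value < 0.01 and relative_effect > 0.10:
        return "early win: p < 0.01 and effect > 10%"
    return None


print(early_stop_reason(0.011, 0.010, 40, 35, p_value=0.20, relative_effect=0.03))
```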

Scientific Method Template

Scientific Investigation: [Topic]

Observation

What I'm seeing:
  • [Observation 1]
  • [Observation 2]
Data:

Question

[Central question to answer]

Hypotheses

| # | Hypothesis | How to Test | How to Falsify |
|---|------------|-------------|----------------|
| 1 |            |             |                |
| 2 |            |             |                |

Predictions

If H1 is true:
  • [Prediction 1]
  • [Prediction 2]
If H1 is false:
  • [Counter-prediction 1]

Experiment

Design: [How to test]
Control: [Baseline]
Treatment: [Intervention]
Metric: [What to measure]
Duration: [How long]
Success criteria: [What constitutes support/falsification]

Results

[Data from experiment]

Analysis

Hypothesis [SUPPORTED/FALSIFIED]
Confidence: [High/Medium/Low]
Reasoning: [Why]

Conclusion

Finding: [What I learned]
Action: [What to do]
Next: [Follow-up investigation]

Verification Checklist

  • Observations documented with data
  • Hypothesis is falsifiable (can be proven wrong)
  • Predictions differentiate true vs. false
  • Experiment controls for confounding variables
  • Results analyzed objectively
  • Conclusion follows from evidence
  • Updated beliefs based on evidence

Key Questions

  • "What would I expect to see if my hypothesis is true?"
  • "What would I expect to see if my hypothesis is false?"
  • "Can this hypothesis be proven wrong?"
  • "Am I testing my hypothesis or confirming my beliefs?"
  • "What's the simplest explanation that fits the data?"
  • "What evidence would change my mind?"

Feynman's Wisdom

"The first principle is that you must not fool yourself—and you are the easiest person to fool."
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
The scientific method protects you from yourself. Your intuition generates hypotheses; the method tests them ruthlessly. When the experiment disagrees with your expectation, the experiment wins.