Scientific Method

Overview

The scientific method is a systematic approach to understanding through observation, hypothesis formation, prediction, testing, and revision. In engineering, it brings rigor to debugging, experimentation, and investigation. The key insight: a good hypothesis must be falsifiable, meaning some observable result could prove it wrong.

Core Principle: Form hypotheses that could be proven false. Design experiments that could falsify them. Update beliefs based on evidence.

When to Use

Debugging (systematic cause identification)
Performance investigation
A/B test design
Feature experimentation
Root cause analysis
Data analysis
Any investigation where you're testing theories

Decision flow:

Investigating something?
  → Do you have a clear hypothesis? → no → FORM A HYPOTHESIS
  → Can your hypothesis be proven false? → no → MAKE IT FALSIFIABLE
  → Have you designed a test? → no → DESIGN AN EXPERIMENT
  → Did you update beliefs based on results? → no → REVISE AND ITERATE
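
The decision flow above can be sketched as a small function that returns the first unmet step. This is a minimal illustration, not a prescribed tool; the `InvestigationState` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InvestigationState:
    """Hypothetical record of where an investigation currently stands."""
    has_hypothesis: bool = False
    is_falsifiable: bool = False
    has_experiment: bool = False
    beliefs_updated: bool = False


def next_step(state: InvestigationState) -> str:
    """Return the first unmet step of the decision flow, checked in order."""
    if not state.has_hypothesis:
        return "FORM A HYPOTHESIS"
    if not state.is_falsifiable:
        return "MAKE IT FALSIFIABLE"
    if not state.has_experiment:
        return "DESIGN AN EXPERIMENT"
    if not state.beliefs_updated:
        return "REVISE AND ITERATE"
    return "DONE"


# A hypothesis exists but has not yet been made falsifiable.
print(next_step(InvestigationState(has_hypothesis=True)))
```

The checks run in the same order as the flow, so an investigation cannot skip ahead to experiment design before its hypothesis is falsifiable.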

The Scientific Method Process

Step 1: Observe

Gather data about the phenomenon:

Observation

What I'm seeing:

API latency increased from 200ms to 800ms
Started approximately Monday 9 AM
Affects /checkout endpoint
Other endpoints are normal
Error rate is normal

Initial data:

P50: 400ms (was 150ms)
P99: 2.5s (was 500ms)
Traffic: Normal levels
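
Percentile figures like the P50 and P99 above can be derived from raw request timings. The sketch below uses Python's standard `statistics` module; the function name `latency_summary`, the sample values, and the choice of the inclusive quantile method are assumptions for illustration, and a monitoring system may compute percentiles differently.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return P50 and P99 for a list of request latencies in milliseconds."""
    # quantiles(n=100) returns the 99 cut points P1..P99, so index 49 is P50
    # and index 98 is P99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


# Made-up samples, only to show the call shape.
samples = [120, 135, 150, 160, 180, 210, 400, 450, 900, 2500]
summary = latency_summary(samples)
print(f"P50: {summary['p50']:.0f}ms, P99: {summary['p99']:.0f}ms")
```

Recording both a median and a tail percentile matters here because the observation is partly about the tail: P99 moved more than P50.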

Step 2: Question

What do you want to understand?

Question

Central question: Why did /checkout latency increase 4x on Monday?

Sub-questions:

What changed on/around Monday 9 AM?
Why only /checkout and not other endpoints?
Why is P99 more affected than P50?

Step 3: Hypothesize

Form a testable explanation:

Hypothesis

Primary hypothesis: "The latency increase is caused by the payment provider SDK update deployed Sunday night, which changed from async to sync API calls."

Why this hypothesis:

SDK was updated Sunday (timing matches)
/checkout is the only endpoint using payment SDK (scope matches)
Sync calls would increase variance (P99 impact matches)


**Good hypothesis characteristics:**
- **Testable:** Can design an experiment
- **Falsifiable:** Can be proven wrong
- **Specific:** Names a concrete cause and effect, not a vague one
- **Explanatory:** Accounts for observations

Step 4: Predict

What would you expect IF the hypothesis is true?

Predictions

If hypothesis is true:

Rolling back the SDK should restore previous latency
Calls to the payment provider should show increased duration
Thread utilization should be higher (blocking calls)
Adding async wrapper should reduce latency

If hypothesis is false:

Rollback won't change latency
Payment provider call duration is unchanged
Thread utilization is normal


**Prediction requirement:**
Predictions must differentiate hypothesis-true from hypothesis-false. If both would produce the same observation, the prediction is useless.
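
The prediction requirement can be checked mechanically by listing, for each planned check, the outcome expected under each case and flagging any check where the two match. The sketch below is illustrative: the function name `discriminating` is hypothetical, and the `error rate` entry is an invented example of a useless prediction, not part of the investigation above.

```python
def discriminating(predictions: dict[str, tuple[str, str]]) -> dict[str, bool]:
    """Map each check to True when its expected outcome differs between
    the hypothesis-true and hypothesis-false cases."""
    return {
        check: if_true != if_false
        for check, (if_true, if_false) in predictions.items()
    }


predictions = {
    # check: (expected if hypothesis is true, expected if hypothesis is false)
    "latency after SDK rollback": ("restored", "unchanged"),
    "payment provider call duration": ("increased", "unchanged"),
    "thread utilization": ("higher", "normal"),
    # Hypothetical non-discriminating check, added for illustration only.
    "error rate": ("normal", "normal"),
}

for check, useful in discriminating(predictions).items():
    print(f"{check}: {'keep' if useful else 'drop (same outcome either way)'}")
```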

Step 5: Experiment

Design and run a test:

Experiment Design

Test: Deploy SDK rollback to canary group

Setup:

Control: 90% traffic, new SDK
Treatment: 10% traffic, old SDK
Duration: 1 hour
Metric: P50 and P99 latency

Success criteria:

If P50 < 200ms in treatment → Hypothesis SUPPORTED
If P50 > 350ms in treatment → Hypothesis FALSIFIED
If P50 falls between 200ms and 350ms → INCONCLUSIVE (extend the test or refine the hypothesis)

Confounds controlled:

Same time of day as original issue
Same traffic routing rules
Same downstream dependencies
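
Writing the success criteria as code before the experiment runs makes it harder to move the goalposts afterward. The following is a minimal sketch; the function name `classify_p50` is hypothetical, and treating the band between the two thresholds as inconclusive is an assumption of this sketch.

```python
def classify_p50(p50_ms: float,
                 support_below: float = 200.0,
                 falsify_above: float = 350.0) -> str:
    """Classify the treatment group's P50 against pre-registered thresholds."""
    if p50_ms < support_below:
        return "SUPPORTED"
    if p50_ms > falsify_above:
        return "FALSIFIED"
    # Assumption: results between the thresholds decide nothing either way.
    return "INCONCLUSIVE"


for observed in (155.0, 275.0, 410.0):
    print(f"P50 = {observed:.0f}ms -> {classify_p50(observed)}")
```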

Step 6: Analyze

Examine the results:

Results

Control (new SDK):

P50: 410ms
P99: 2.4s
n: 45,000 requests

Treatment (old SDK):

P50: 155ms
P99: 480ms
n: 5,000 requests

Statistical significance: p < 0.001
Effect size: 62% reduction in P50

Analysis: Hypothesis SUPPORTED. Old SDK shows pre-incident latency levels.
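
The effect size quoted above is the relative reduction in P50 between the two groups. A short sketch of that arithmetic, with `relative_reduction` as a hypothetical helper name:

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after` (0.62 means 62%)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


control_p50_ms = 410    # new SDK
treatment_p50_ms = 155  # old SDK

# (410 - 155) / 410 is about 0.622
print(f"P50 reduction: {relative_reduction(control_p50_ms, treatment_p50_ms):.0%}")
# prints "P50 reduction: 62%"
```

The same helper applied to P99 (2.4s versus 480ms) gives an 80% reduction, consistent with the tail having been hit harder than the median.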

Step 7: Conclude and Iterate

Update beliefs and act:

Conclusion

Finding: Payment SDK update caused latency regression
Confidence: High (controlled experiment, clear signal)

Action:

Roll back SDK immediately
File bug with payment provider
Add latency monitoring for SDK calls
Evaluate SDK changes before future updates

Next investigation: Why didn't we catch this in staging?
Hypothesis: Staging doesn't have realistic payment provider latency

Scientific Debugging

The Debugging Scientific Method

Bug: Users sometimes see stale data

Observation

Reports of stale data from support tickets
No clear pattern in who/when
Estimated 5% of users affected

Hypotheses (Multiple)

| # | Hypothesis | Falsification Test |
|---|------------|--------------------|
| 1 | Cache not invalidating | Check cache hits with stale data |
| 2 | Read replica lag | Check replica lag at time of reports |
| 3 | Browser caching | Check with cache-busted requests |
| 4 | CDN serving old content | Check CDN cache status |

Testing Strategy

Test in descending order of ease × likelihood:

CDN cache status (easy to check)
Browser caching (easy to check)
Read replica lag (need to correlate times)
Cache invalidation (needs instrumentation)
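
The ordering rule can be made explicit by scoring each hypothesis and sorting on the product. In the sketch below, the `Candidate` type, the `rank_by_priority` function, and all numeric scores are illustrative assumptions (rough guesses, not measurements), chosen so that the result matches the order listed above.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    ease: float        # 0..1, higher means cheaper to test
    likelihood: float  # 0..1, prior belief that this is the cause


def rank_by_priority(candidates: list[Candidate]) -> list[str]:
    """Sort hypotheses by ease x likelihood, highest score first."""
    ranked = sorted(candidates, key=lambda c: c.ease * c.likelihood, reverse=True)
    return [c.name for c in ranked]


# Scores are assumptions made up for this sketch.
candidates = [
    Candidate("Cache invalidation", ease=0.2, likelihood=0.4),
    Candidate("Read replica lag", ease=0.5, likelihood=0.4),
    Candidate("Browser caching", ease=0.9, likelihood=0.3),
    Candidate("CDN cache status", ease=0.9, likelihood=0.4),
]
print(rank_by_priority(candidates))
```

Multiplying the two factors means a cheap test of a plausible cause runs before an expensive test of an equally plausible one, so easy eliminations happen early.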

Test 1: CDN Cache Status

Prediction: If CDN is serving stale content, cache headers will show old timestamps
Result: CDN timestamps are fresh
Conclusion: CDN ruled out

Test 2: Browser Caching

Prediction: If browser caching, force-refresh will show correct data
Result: Force-refresh still shows stale data sometimes
Conclusion: Browser caching ruled out

Test 3: Read Replica Lag

Prediction: If replica lag, reports will correlate with lag spikes
Result: Strong correlation (r=0.84) between reports and lag spikes
Conclusion: SUPPORTED. Read replica lag is the most likely cause

A/B Test Design

A/B Test: New Checkout Flow

Hypothesis

"The simplified 2-step checkout will increase conversion rate compared to current 4-step checkout."

Predictions

If hypothesis is true:

Conversion rate increases by >5%
Time to complete decreases
Abandonment rate decreases

If hypothesis is false:

Conversion rate unchanged or decreases
Potential confusion (errors increase)

Experiment Design

Control: Current 4-step checkout
Treatment: New 2-step checkout
Traffic split: 50/50
Duration: 2 weeks (for statistical power)
Primary metric: Conversion rate
Guardrail metrics: Error rate, support tickets

Sample Size Calculation

Baseline conversion: 3.2%
Minimum detectable effect: 5% relative (0.16 percentage points absolute)
Required n per group: 150,000 users
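
The required sample size depends on the significance level and statistical power chosen, which the figures above do not state. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name and the defaults of a two-sided alpha of 0.05 and 80% power are assumptions of this sketch, not values from the test plan.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, relative_mde: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


print(sample_size_per_group(baseline=0.032, relative_mde=0.05))
```

Under these assumed defaults the formula gives a figure in the neighborhood of 195,000 users per group, somewhat above the 150,000 quoted above, which presumably reflects a different alpha or power choice. Stating those two parameters alongside the sample size makes the calculation reproducible.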

Stopping Criteria

Stop early if:

Treatment errors > 2x control
Support tickets > 2x baseline
p < 0.01 AND effect > 10%
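
Stopping rules are easiest to apply consistently when they are written down as a single check before the test starts. A minimal sketch follows, with a hypothetical function name and parameter names; the thresholds are the ones listed above.

```python
def should_stop_early(treatment_errors: float, control_errors: float,
                      support_tickets: float, baseline_tickets: float,
                      p_value: float, relative_effect: float) -> tuple[bool, str]:
    """Return (stop, reason) according to the three stopping criteria."""
    if treatment_errors > 2 * control_errors:
        return True, "guardrail: treatment errors > 2x control"
    if support_tickets > 2 * baseline_tickets:
        return True, "guardrail: support tickets > 2x baseline"
    if p_value < 0.01 and relative_effect > 0.10:
        return True, "early win: p < 0.01 and effect > 10%"
    return False, "continue"


# Made-up mid-test readings: healthy guardrails, no decisive effect yet.
print(should_stop_early(treatment_errors=12, control_errors=10,
                        support_tickets=30, baseline_tickets=28,
                        p_value=0.04, relative_effect=0.06))
```

The guardrail checks come first so that a harmful treatment is stopped even if the primary metric looks good.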

Scientific Method Template

Scientific Investigation: [Topic]

Observation

What I'm seeing:

[Observation 1]
[Observation 2]

Data:

Question

[Central question to answer]

Hypotheses

| # | Hypothesis | How to Test | How to Falsify |
|---|------------|-------------|----------------|
| 1 | | | |
| 2 | | | |

Predictions

If H1 is true:

[Prediction 1]
[Prediction 2]

If H1 is false:

[Counter-prediction 1]

Experiment

Design: [How to test]
Control: [Baseline]
Treatment: [Intervention]
Metric: [What to measure]
Duration: [How long]
Success criteria: [What constitutes support/falsification]

Results

[Data from experiment]

Analysis

Hypothesis [SUPPORTED/FALSIFIED]
Confidence: [High/Medium/Low]
Reasoning: [Why]

Conclusion

Finding: [What I learned]
Action: [What to do]
Next: [Follow-up investigation]

Verification Checklist

Observations documented with data
Hypothesis is falsifiable (can be proven wrong)
Predictions differentiate true vs. false
Experiment controls for confounding variables
Results analyzed objectively
Conclusion follows from evidence
Updated beliefs based on evidence

Key Questions

"What would I expect to see if my hypothesis is true?"
"What would I expect to see if my hypothesis is false?"
"Can this hypothesis be proven wrong?"
"Am I testing my hypothesis or confirming my beliefs?"
"What's the simplest explanation that fits the data?"
"What evidence would change my mind?"

Feynman's Wisdom

"The first principle is that you must not fool yourself—and you are the easiest person to fool."

"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."

The scientific method protects you from yourself. Your intuition generates hypotheses; the method tests them ruthlessly. When the experiment disagrees with your expectation, the experiment wins.
