thinking-scientific-method
Scientific Method
Overview
The scientific method is a systematic approach to understanding through observation, hypothesis formation, prediction, testing, and revision. In engineering, it brings rigor to debugging, experimentation, and investigation. The key insight: a good hypothesis must be falsifiable, meaning some observable result could prove it wrong.
Core Principle: Form hypotheses that could be proven false. Design experiments that could falsify them. Update beliefs based on evidence.
When to Use
- Debugging (systematic cause identification)
- Performance investigation
- A/B test design
- Feature experimentation
- Root cause analysis
- Data analysis
- Any investigation where you're testing theories
Decision flow:
Investigating something?
→ Do you have a clear hypothesis? → no → FORM A HYPOTHESIS
→ Can your hypothesis be proven false? → no → MAKE IT FALSIFIABLE
→ Have you designed a test? → no → DESIGN AN EXPERIMENT
→ Did you update beliefs based on results? → no → REVISE AND ITERATE
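The decision flow above can be read as a function that returns the first unmet step. This is a minimal sketch: the `Investigation` fields, the `next_action` name, and the final `"DONE"` value are assumptions introduced for illustration, not part of the original flow.

```python
from dataclasses import dataclass


@dataclass
class Investigation:
    """State of an investigation; field names are illustrative assumptions."""
    has_hypothesis: bool = False
    is_falsifiable: bool = False
    has_test: bool = False
    beliefs_updated: bool = False


def next_action(inv: Investigation) -> str:
    """Return the first unmet step of the decision flow, checked in order."""
    if not inv.has_hypothesis:
        return "FORM A HYPOTHESIS"
    if not inv.is_falsifiable:
        return "MAKE IT FALSIFIABLE"
    if not inv.has_test:
        return "DESIGN AN EXPERIMENT"
    if not inv.beliefs_updated:
        return "REVISE AND ITERATE"
    # Not in the original flow: added so the function always returns a value.
    return "DONE"


# A hypothesis exists but has not yet been made falsifiable.
print(next_action(Investigation(has_hypothesis=True)))  # MAKE IT FALSIFIABLE
```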
The Scientific Method Process
Step 1: Observe
Gather data about the phenomenon:
Observation
What I'm seeing:
- API latency increased from 200ms to 800ms
- Started approximately Monday 9 AM
- Affects /checkout endpoint
- Other endpoints are normal
- Error rate is normal
Initial data:
- P50: 400ms (was 150ms)
- P99: 2.5s (was 500ms)
- Traffic: Normal levels
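Percentile figures like the P50 and P99 above can be computed from raw request latencies. The sketch below uses only the Python standard library; `latency_summary` and the sample data are hypothetical, and a real investigation would more likely read these numbers from a metrics system.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return the P50 and P99 of a list of latency samples in milliseconds."""
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


# Hypothetical samples: the integers 0..100 make the percentiles easy to check.
summary = latency_summary(list(range(101)))
print(summary)  # {'p50': 50.0, 'p99': 99.0}
```

Recording both percentiles matters here because the tail (P99) and the median (P50) moved by different amounts, which is itself an observation a hypothesis has to explain.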
Step 2: Question
What do you want to understand?
Question
Central question: Why did /checkout latency increase 4x on Monday?
Sub-questions:
- What changed on/around Monday 9 AM?
- Why only /checkout and not other endpoints?
- Why is P99 more affected than P50?
Step 3: Hypothesize
Form a testable explanation:
Hypothesis
Primary hypothesis:
"The latency increase is caused by the payment provider SDK update
deployed Sunday night, which changed from async to sync API calls."
Why this hypothesis:
- SDK was updated Sunday (timing matches)
- /checkout is the only endpoint using payment SDK (scope matches)
- Sync calls would increase variance (P99 impact matches)
**Good hypothesis characteristics:**
- **Testable:** Can design an experiment
- **Falsifiable:** Can be proven wrong
- **Specific:** Names a concrete cause and effect rather than a vague possibility
- **Explanatory:** Accounts for observations
Step 4: Predict
What would you expect IF the hypothesis is true?
Predictions
If hypothesis is true:
- Rolling back the SDK should restore previous latency
- Calls to the payment provider should show increased duration
- Thread utilization should be higher (blocking calls)
- Adding async wrapper should reduce latency
If hypothesis is false:
- Rollback won't change latency
- Payment provider call duration is unchanged
- Thread utilization is normal
**Prediction requirement:**
Predictions must differentiate hypothesis-true from hypothesis-false. If both would produce the same observation, the prediction is useless.
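One way to apply this requirement is to write the expected outcome of each observable under both cases and keep only the ones that differ. A minimal sketch follows; the `discriminating` helper is hypothetical, and the "error rate" entry is an invented extra observable added to show what a useless prediction looks like.

```python
def discriminating(if_true: dict[str, str], if_false: dict[str, str]) -> list[str]:
    """Return observables whose expected outcome differs between the two cases."""
    shared = [name for name in if_true if name in if_false]
    return [name for name in shared if if_true[name] != if_false[name]]


# Outcomes taken from the predictions above, except "error rate",
# which is a hypothetical observable that looks the same either way.
if_true = {
    "latency after SDK rollback": "restored",
    "payment provider call duration": "increased",
    "thread utilization": "higher",
    "error rate": "normal",
}
if_false = {
    "latency after SDK rollback": "unchanged",
    "payment provider call duration": "unchanged",
    "thread utilization": "normal",
    "error rate": "normal",
}

useful = discriminating(if_true, if_false)
useless = [name for name in if_true if name not in useful]
print(useful)
print(useless)  # ['error rate']
```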
Step 5: Experiment
Design and run a test:
Experiment Design
Test: Deploy SDK rollback to canary group
Setup:
- Control: 90% traffic, new SDK
- Treatment: 10% traffic, old SDK
- Duration: 1 hour
- Metric: P50 and P99 latency
Success criteria:
- If P50 < 200ms in treatment → Hypothesis SUPPORTED
- If P50 > 350ms in treatment → Hypothesis FALSIFIED
Confounds controlled:
- Same time of day as original issue
- Same traffic routing rules
- Same downstream dependencies
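Writing the success criteria as code before the experiment runs makes it harder to reinterpret them afterwards. The sketch below encodes the two thresholds from the design above. The `evaluate_p50` name is hypothetical, and the `"INCONCLUSIVE"` outcome for values between 200ms and 350ms is an assumption, since the design does not say how to treat that range.

```python
def evaluate_p50(
    treatment_p50_ms: float,
    support_below_ms: float = 200.0,
    falsify_above_ms: float = 350.0,
) -> str:
    """Classify the treatment group's P50 against pre-registered thresholds."""
    if treatment_p50_ms < support_below_ms:
        return "SUPPORTED"
    if treatment_p50_ms > falsify_above_ms:
        return "FALSIFIED"
    # Assumption: the design leaves the middle band undefined.
    return "INCONCLUSIVE"


print(evaluate_p50(180.0))  # SUPPORTED
print(evaluate_p50(275.0))  # INCONCLUSIVE
print(evaluate_p50(400.0))  # FALSIFIED
```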
Step 6: Analyze
Examine the results:
Results
Control (new SDK):
- P50: 410ms
- P99: 2.4s
- n: 45,000 requests
Treatment (old SDK):
- P50: 155ms
- P99: 480ms
- n: 5,000 requests
Statistical significance: p < 0.001
Effect size: 62% reduction in P50
Analysis:
Hypothesis SUPPORTED. Old SDK shows pre-incident latency levels.
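The effect size above is the relative reduction from control to treatment. A short sketch of the arithmetic, using the reported P50 and P99 values; `relative_reduction` is a hypothetical helper name.

```python
def relative_reduction(control: float, treatment: float) -> float:
    """Fractional reduction of treatment relative to control."""
    if control <= 0:
        raise ValueError("control must be positive")
    return (control - treatment) / control


p50_reduction = relative_reduction(410.0, 155.0)    # P50 in ms
p99_reduction = relative_reduction(2400.0, 480.0)   # P99 in ms (2.4s vs 480ms)

print(f"P50 reduction: {p50_reduction:.0%}")  # P50 reduction: 62%
print(f"P99 reduction: {p99_reduction:.0%}")  # P99 reduction: 80%
```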
Step 7: Conclude and Iterate
Update beliefs and act:
Conclusion
Finding: Payment SDK update caused latency regression
Confidence: High (controlled experiment, clear signal)
Action:
- Roll back SDK immediately
- File bug with payment provider
- Add latency monitoring for SDK calls
- Evaluate SDK changes before future updates
Next investigation:
Why didn't we catch this in staging?
Hypothesis: Staging doesn't have realistic payment provider latency
Scientific Debugging
The Debugging Scientific Method
Bug: Users sometimes see stale data
Observation
- Reports of stale data from support tickets
- No clear pattern in who/when
- Estimated 5% of users affected
Hypotheses (Multiple)
| # | Hypothesis | Falsification Test |
|---|---|---|
| 1 | Cache not invalidating | Check cache hits with stale data |
| 2 | Read replica lag | Check replica lag at time of reports |
| 3 | Browser caching | Check with cache-busted requests |
| 4 | CDN serving old content | Check CDN cache status |
Testing Strategy
Test in order of: (ease × likelihood)
- CDN cache status (easy to check)
- Browser caching (easy to check)
- Read replica lag (need to correlate times)
- Cache invalidation (needs instrumentation)
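The ordering rule (ease × likelihood) can be sketched as a simple sort. The numeric scores below are invented for illustration and chosen only so the result matches the order listed above; in practice they would be rough judgment calls on a 0 to 1 scale.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothesis to test, with rough 0..1 scores (illustrative values)."""
    name: str
    ease: float
    likelihood: float

    @property
    def priority(self) -> float:
        return self.ease * self.likelihood


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Sort hypotheses so the highest ease × likelihood is tested first."""
    return sorted(candidates, key=lambda c: c.priority, reverse=True)


# Hypothetical scores, not taken from the original investigation.
candidates = [
    Candidate("Cache invalidation", ease=0.3, likelihood=0.5),
    Candidate("Read replica lag", ease=0.5, likelihood=0.4),
    Candidate("Browser caching", ease=0.9, likelihood=0.25),
    Candidate("CDN cache status", ease=0.9, likelihood=0.3),
]

for c in rank_candidates(candidates):
    print(f"{c.name}: {c.priority:.3f}")
```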
Test 1: CDN Cache Status
Prediction: If CDN is serving stale content,
cache headers will show old timestamps
Result: CDN timestamps are fresh
Conclusion: CDN ruled out
Test 2: Browser Caching
Prediction: If browser caching,
force-refresh will show correct data
Result: Force-refresh still shows stale data sometimes
Conclusion: Browser caching ruled out
Test 3: Read Replica Lag
Prediction: If replica lag,
reports will correlate with lag spikes
Result: Strong correlation (r=0.84) between reports and lag spikes
Conclusion: SUPPORTED - read replica lag is the most likely cause
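A correlation like the one reported in Test 3 can be computed by bucketing both series into the same time windows and taking the Pearson coefficient. The sketch below implements the coefficient directly; the hourly lag and report counts are made-up placeholder data, not the measurements behind r=0.84.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly buckets: peak replica lag (seconds) and stale-data reports.
lag_seconds = [0.2, 0.3, 4.1, 0.2, 6.5, 0.4, 0.3, 3.8]
stale_reports = [0, 1, 5, 0, 9, 1, 0, 3]

print(f"r = {pearson_r(lag_seconds, stale_reports):.2f}")
```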
A/B Test Design
A/B Test: New Checkout Flow
Hypothesis
"The simplified 2-step checkout will increase conversion rate
compared to current 4-step checkout."
Predictions
If hypothesis is true:
- Conversion rate increases by >5%
- Time to complete decreases
- Abandonment rate decreases
If hypothesis is false:
- Conversion rate unchanged or decreases
- Potential confusion (errors increase)
Experiment Design
Control: Current 4-step checkout
Treatment: New 2-step checkout
Traffic split: 50/50
Duration: 2 weeks (for statistical power)
Primary metric: Conversion rate
Guardrail metrics: Error rate, support tickets
Sample Size Calculation
Baseline conversion: 3.2%
Minimum detectable effect: 5% relative (0.16 percentage points absolute)
Required n per group: 150,000 users
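A sample size of this kind can be estimated with the standard normal-approximation formula for comparing two proportions. The sketch below is one common way to do it, not necessarily how the 150,000 figure was derived: the plan does not state its significance level, power, or whether the test is one- or two-sided, so `alpha`, `power`, and `two_sided` are assumed defaults.

```python
import math
from statistics import NormalDist


def sample_size_per_group(
    baseline: float,
    relative_mde: float,
    alpha: float = 0.05,     # assumed significance level
    power: float = 0.80,     # assumed statistical power
    two_sided: bool = True,  # assumed test direction
) -> int:
    """Approximate users needed per group to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


print(sample_size_per_group(0.032, 0.05))
print(sample_size_per_group(0.032, 0.05, two_sided=False))
```

Under these assumptions the two-sided estimate comes out at roughly 195,000 per group and the one-sided estimate at roughly 153,000, so the required n is sensitive to choices that should be written down alongside the number.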
Stopping Criteria
Stop early if:
- Treatment errors > 2x control
- Support tickets > 2x baseline
- p < 0.01 AND effect > 10%
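The stopping criteria can be encoded as a check that runs against each interim snapshot of the experiment. This is a sketch under stated assumptions: the `Snapshot` fields and `stop_reason` name are hypothetical, and "effect" is read as relative lift in the primary metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Interim experiment readings; field names are illustrative assumptions."""
    treatment_errors: float
    control_errors: float
    support_tickets: float
    baseline_tickets: float
    p_value: float
    relative_effect: float  # 0.10 means a 10% relative lift


def stop_reason(s: Snapshot) -> Optional[str]:
    """Return why the test should stop early, or None to keep running."""
    if s.treatment_errors > 2 * s.control_errors:
        return "guardrail: treatment errors > 2x control"
    if s.support_tickets > 2 * s.baseline_tickets:
        return "guardrail: support tickets > 2x baseline"
    if s.p_value < 0.01 and s.relative_effect > 0.10:
        return "early win: p < 0.01 and effect > 10%"
    return None


# Hypothetical snapshot: healthy guardrails, effect not yet large enough.
print(stop_reason(Snapshot(12, 10, 30, 28, p_value=0.004, relative_effect=0.06)))  # None
```

Guardrails are checked first so that a harmful treatment stops the test even when the primary metric looks good.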
Scientific Method Template
Scientific Investigation: [Topic]
Observation
What I'm seeing:
- [Observation 1]
- [Observation 2]
Data:
Question
[Central question to answer]
Hypotheses
| # | Hypothesis | How to Test | How to Falsify |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
Predictions
If H1 is true:
- [Prediction 1]
- [Prediction 2]
If H1 is false:
- [Counter-prediction 1]
Experiment
Design: [How to test]
Control: [Baseline]
Treatment: [Intervention]
Metric: [What to measure]
Duration: [How long]
Success criteria: [What constitutes support/falsification]
Results
[Data from experiment]
Analysis
Hypothesis [SUPPORTED/FALSIFIED]
Confidence: [High/Medium/Low]
Reasoning: [Why]
Conclusion
Finding: [What I learned]
Action: [What to do]
Next: [Follow-up investigation]
Verification Checklist
- Observations documented with data
- Hypothesis is falsifiable (can be proven wrong)
- Predictions differentiate true vs. false
- Experiment controls for confounding variables
- Results analyzed objectively
- Conclusion follows from evidence
- Updated beliefs based on evidence
Key Questions
- "What would I expect to see if my hypothesis is true?"
- "What would I expect to see if my hypothesis is false?"
- "Can this hypothesis be proven wrong?"
- "Am I testing my hypothesis or confirming my beliefs?"
- "What's the simplest explanation that fits the data?"
- "What evidence would change my mind?"
Feynman's Wisdom
"The first principle is that you must not fool yourself—and you are the easiest person to fool."
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
The scientific method protects you from yourself. Your intuition generates hypotheses; the method tests them ruthlessly. When the experiment disagrees with your expectation, the experiment wins.