stat-hypothesis-testing

Hypothesis Testing

Framework

IRON LAW: Statistical Significance ≠ Practical Significance

A p-value < 0.05 means a result at least this extreme would be unlikely if the null hypothesis were true.
It does NOT mean the result is important, large, or practically meaningful.
With a large enough sample, a 0.1% conversion rate difference becomes
"statistically significant" but is practically worthless.

ALWAYS report effect size alongside p-value.
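
A minimal sketch of this law, assuming a hypothetical A/B test with 5,000,000 users per arm and conversion rates of 10.0% vs 10.1%. A pooled two-proportion z-test (normal approximation) produces a tiny p-value, while Cohen's h, a standard effect size for two proportions, stays far below the conventional "small" benchmark of 0.2. The function name `two_proportion_z_test` is illustrative, not from any library.

```python
from math import asin, sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference in proportions (large samples).

    Returns (z, p_value, cohens_h).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # cdf(-|z|) avoids precision loss from 1 - cdf(|z|) in the far tail
    p_value = 2 * NormalDist().cdf(-abs(z))
    # Cohen's h: difference of arcsine-transformed proportions
    cohens_h = 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))
    return z, p_value, cohens_h


# Hypothetical data: 10.0% vs 10.1% conversion, 5,000,000 users per arm
z, p, h = two_proportion_z_test(500_000, 5_000_000, 505_000, 5_000_000)
print(f"z = {z:.2f}, p = {p:.1e}, Cohen's h = {h:.4f}")
```

Here p is far below 0.05, yet h is roughly 0.003: statistically significant, practically negligible. Report both.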

IRON LAW: State Hypotheses BEFORE Looking at Data

H₀ (null) and H₁ (alternative) must be defined before data analysis.
Choosing hypotheses after seeing the data = p-hacking = scientific fraud.
"We found an interesting pattern, let's test it on the same data" is invalid.
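
The second law can be checked with a small simulation. This sketch assumes every metric is pure noise (H₀ true), two groups of 30 per metric, and a known σ = 1 so that a z-test applies; `z_test_p` and `post_hoc_false_positive_rate` are hypothetical names. Each simulated study scans 10 metrics, then "tests" only the most striking one on the same data.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b, sigma=1.0):
    """Two-sided two-sample z-test p-value, assuming a known common sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * NormalDist().cdf(-abs(z))


def post_hoc_false_positive_rate(n_sims=1000, n_metrics=10, n=30, alpha=0.05, seed=0):
    """Share of simulated studies that 'find' an effect although H0 is true everywhere.

    Each study compares two groups on n_metrics independent metrics, looks at
    the data, and then tests only the most striking metric on that same data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_metrics):
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(z_test_p(a, b))
        # The most "interesting" pattern is the one with the smallest p-value
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


rate = post_hoc_false_positive_rate()
print(f"False positive rate: {rate:.2f} (nominal alpha = 0.05)")
```

Because the hypothesis was picked after 10 looks at the data, the rate lands near 1 - 0.95^10 ≈ 0.40, roughly eight times the nominal α. Testing a pattern found in one dataset on fresh data avoids this.
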

Core Concepts

| Concept | Definition |
|---|---|
| H₀ (null) | Default assumption: no effect, no difference |
| H₁ (alternative) | What you want to show: there IS an effect/difference |
| p-value | Probability of seeing this result (or a more extreme one) IF H₀ is true |
| α (significance level) | Threshold for rejecting H₀ (typically 0.05) |
| Type I error (α) | Rejecting H₀ when it is actually true (false positive) |
| Type II error (β) | Failing to reject H₀ when H₁ is true (false negative) |
| Power (1 - β) | Probability of detecting a real effect (target: ≥ 0.8) |
| Effect size | Magnitude of the difference (Cohen's d, odds ratio, R²) |
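
These concepts are linked: α fixes the rejection threshold, effect size and sample size set how far the test statistic shifts under H₁, and power (1 - β) follows from both. A minimal sketch, assuming a two-sided, two-sample comparison with equal group sizes and a normal approximation; `power_two_sample` and `required_n_per_group` are hypothetical names.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized effect size (Cohen's d); both groups have n_per_group.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # rejection threshold set by alpha
    shift = d * sqrt(n_per_group / 2)           # mean of the z-statistic under H1
    # P(|Z| > z_crit) when Z ~ Normal(shift, 1)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def required_n_per_group(d, alpha=0.05, power=0.8):
    """Per-group sample size reaching the target power (1 - beta), normal approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


n = required_n_per_group(d=0.5)  # medium effect, alpha = 0.05, power = 0.8
achieved = power_two_sample(0.5, n)
print(n, round(achieved, 3))
```

For a medium effect (d = 0.5) this gives 63 per group; exact t-test calculations give a slightly larger figure (64), so treat the normal approximation as a lower bound. With d = 0 the "power" equals α, which is the Type I error rate.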

Test Selection Guide

| Data type | Groups / design | Test |
|---|---|---|
| Continuous, normal, 2 groups | Independent | Independent t-test |
| Continuous, normal, 2 groups | Paired / before-after | Paired t-test |
| Continuous, normal, 3+ groups | Independent | One-way ANOVA |
| Continuous, non-normal | 2 independent groups | Mann-Whitney U |
| Categorical | 2+ groups | Chi-square test |
| Continuous, relationship | 2 variables | Pearson correlation (normal) / Spearman (non-normal) |
| Binary outcome | Predictors | Logistic regression |
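
The selection table can be encoded as a small lookup so the choice is explicit and reviewable. This `choose_test` helper is hypothetical; it covers only the rows in the table and deliberately raises for anything else (for example paired non-normal data) instead of guessing.

```python
def choose_test(data_type, *, groups=2, normal=True, paired=False):
    """Map the selection table to a test name (hypothetical helper).

    data_type: "continuous"   - continuous outcome compared across groups
               "relationship" - association between two continuous variables
               "categorical"  - category counts compared across groups
               "binary"       - binary outcome modeled from predictors
    """
    if data_type == "binary":
        return "Logistic regression"
    if data_type == "categorical":
        return "Chi-square test"
    if data_type == "relationship":
        return "Pearson correlation" if normal else "Spearman correlation"
    if data_type != "continuous":
        raise ValueError(f"unknown data type: {data_type!r}")
    if normal and groups == 2:
        return "Paired t-test" if paired else "Independent t-test"
    if normal and groups >= 3 and not paired:
        return "One-way ANOVA"
    if not normal and groups == 2 and not paired:
        return "Mann-Whitney U"
    # Not covered by the table: make the decision explicitly, do not default
    raise ValueError("combination not covered by the selection table")


print(choose_test("continuous", groups=2, paired=True))
print(choose_test("continuous", groups=2, normal=False))
```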

Testing Process

  1. State hypotheses: H₀ and H₁ with specific parameters
  2. Choose test: Based on data type, distribution, and groups (use guide above)
  3. Set α: Usually 0.05 (justify if different)
  4. Calculate: Run the test, get test statistic and p-value
  5. Decide: p < α → reject H₀; p ≥ α → fail to reject H₀
  6. Report: Effect size + confidence interval + p-value (not just "significant")
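
The six steps can be walked through end to end. This is a sketch, not a reference implementation: it assumes two independent groups large enough for a normal approximation, uses simulated illustrative data, and the helper `two_sample_z_test` is hypothetical. For small samples, replace the z-test with a t-test (for example `scipy.stats.ttest_ind(a, b, equal_var=False)` for Welch's test).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def two_sample_z_test(a, b, alpha=0.05):
    """Steps 4-5 for two independent samples (large-sample normal approximation).

    Uses the unpooled (Welch-style) standard error for the test and the CI,
    and the pooled SD for Cohen's d.
    """
    n_a, n_b = len(a), len(b)
    diff = fmean(a) - fmean(b)
    var_a, var_b = variance(a), variance(b)
    se = sqrt(var_a / n_a + var_b / n_b)
    z = diff / se
    p = 2 * NormalDist().cdf(-abs(z))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return {
        "statistic": z,
        "p_value": p,
        "cohens_d": diff / pooled_sd,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "decision": "Reject H0" if p < alpha else "Fail to reject H0",
    }


# Step 1: H0: mu_A = mu_B, H1: mu_A != mu_B (fixed before seeing data)
# Step 2: two independent groups, large n -> two-sample z-test
# Step 3: alpha = 0.05
rng = random.Random(42)
group_a = [rng.gauss(10.5, 2.0) for _ in range(200)]  # illustrative, simulated data
group_b = [rng.gauss(10.0, 2.0) for _ in range(200)]

result = two_sample_z_test(group_a, group_b, alpha=0.05)  # Steps 4-5
lo, hi = result["ci"]
# Step 6: report effect size, CI and p-value together
print(f"z = {result['statistic']:.2f}, p = {result['p_value']:.3g}, "
      f"d = {result['cohens_d']:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]; {result['decision']}")
```

Because the test and the CI share the same standard error, the CI excludes 0 exactly when p < α, so the reported interval and decision always agree.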

Output Format

```markdown
## Hypothesis Test: {Research Question}

### Hypotheses
- H₀: {null: no effect/difference}
- H₁: {alternative: there IS an effect/difference}
- α = {0.05 or other}

### Test Selection
- Test: {name}
- Rationale: {why this test fits the data}
- Assumptions checked: {normality, independence, equal variance}

### Results
- Test statistic: {value}
- p-value: {value}
- Effect size: {value and interpretation}
- 95% CI: [{lower}, {upper}]

### Decision
{Reject / Fail to reject H₀}

### Interpretation
{What this means in practical terms, with effect size context}
```

Gotchas

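  • "Fail to reject H₀" ≠ "H₀ is true": Absence of evidence is not evidence of absence. The test may simply lack the power to detect a real effect.
  • Multiple comparisons inflate Type I error: If all 20 null hypotheses are true, testing each at α = 0.05 yields 1 false positive on average and, for independent tests, about a 64% chance of at least one. Apply a Bonferroni or FDR (e.g., Benjamini-Hochberg) correction.
  • Check assumptions before testing: Student's t-test assumes independent observations, approximately normal data, and equal variances (Welch's t-test drops the equal-variance assumption). Serious violations can invalidate the results; use non-parametric alternatives when assumptions fail.
  • Sample size determines power: Together with effect size and α, sample size sets power. Small samples miss real effects (Type II error). Calculate the required sample size BEFORE collecting data.
  • The p-value is NOT the probability that H₀ is true: It is the probability of data at least as extreme as observed, given H₀, i.e. P(data | H₀), not P(H₀ | data). Treating one as the other ignores how common true nulls are (base rate fallacy).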
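
The multiple-comparisons arithmetic and both corrections fit in a few lines. A minimal sketch with hypothetical helper names and made-up p-values; in practice a library routine such as statsmodels' `multipletests` (methods `"bonferroni"` and `"fdr_bh"`) does the same job.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i < alpha / m; controls the family-wise error rate."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Why correction is needed: 20 independent tests, all H0 true, alpha = 0.05
m, alpha = 20, 0.05
expected_false_positives = m * alpha   # 1.0 on average
p_at_least_one = 1 - (1 - alpha) ** m  # about 0.64

# Hypothetical p-values from 10 tests
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p_vals)), sum(benjamini_hochberg(p_vals)))
```

Bonferroni rejects only the smallest p-value here, while Benjamini-Hochberg rejects two: FDR control accepts a few more false positives in exchange for more power.
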
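
Bayes' rule shows how far P(H₀ | significant) can sit from α. A worked sketch, assuming (hypothetically) that 90% of tested hypotheses are truly null and that studies run at 80% power; `prob_h0_given_significant` is an illustrative name.

```python
def prob_h0_given_significant(prior_h0, alpha=0.05, power=0.8):
    """P(H0 true | p < alpha) via Bayes' rule.

    prior_h0 is an assumed share of tested hypotheses whose null is actually true.
    """
    false_positives = prior_h0 * alpha       # H0 true, rejected anyway
    true_positives = (1 - prior_h0) * power  # H1 true, correctly detected
    return false_positives / (false_positives + true_positives)


# Assumption: 90% of the hypotheses a team tests are truly null
print(round(prob_h0_given_significant(0.9), 2))  # 0.36
```

Under these assumptions, 36% of significant results come from true nulls, even though each one has p < 0.05.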

References

  • For sample size calculation, see references/sample-size.md
  • For non-parametric test alternatives, see references/nonparametric-tests.md