stat-hypothesis-testing

Hypothesis Testing

Framework

IRON LAW: Statistical Significance ≠ Practical Significance

A p-value < 0.05 means that a result at least this extreme would occur less than 5% of the time if the null hypothesis were true.
It does NOT mean the result is important, large, or practically meaningful.
With a large enough sample, a difference of just 0.1 percentage points in conversion rate can become
"statistically significant" while being practically worthless.

ALWAYS report effect size alongside p-value.
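
To make this concrete, here is a minimal Python sketch of a hypothetical A/B test with 10 million users per arm and conversion rates of 10.0% vs 10.1%. It assumes a pooled two-proportion z-test for the p-value and uses Cohen's h as the effect size; the helper names and all numbers are illustrative, not taken from any library or real experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohens_h(p_a, p_b):
    """Cohen's h for two proportions (0.2 is conventionally a 'small' effect)."""
    return 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))


# Hypothetical A/B test: 10.0% vs 10.1% conversion, 10 million users per arm.
n = 10_000_000
z, p_value = two_proportion_z_test(1_000_000, n, 1_010_000, n)
h = cohens_h(0.100, 0.101)
print(f"z = {z:.2f}, p = {p_value:.1e}, Cohen's h = {h:.4f}")
```

With these made-up numbers, z is about 7.4 and the p-value is far below 0.05 (around 10⁻¹³), yet Cohen's h is only about 0.003, well under the conventional 0.2 threshold for even a "small" effect. Reporting both numbers makes that gap visible.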

IRON LAW: State Hypotheses BEFORE Looking at Data

H₀ (null) and H₁ (alternative) must be defined before data analysis.
Choosing hypotheses after seeing the data = p-hacking = scientific fraud.
"We found an interesting pattern, let's test it on the same data" is invalid.
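
A small Monte Carlo sketch in Python shows why. It assumes 20 hypothetical metrics that are all pure noise (H₀ true for every one), two groups of 30 per metric, and a known standard deviation of 1 so that a simple z-test applies. "Exploring" means keeping the metric with the largest |z|; that pattern is then tested either on the same data or on a fresh sample. The function names and parameters are illustrative assumptions.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(trials=1000, n=30, n_metrics=20, alpha=0.05, seed=1):
    """Compare testing a data-suggested pattern on the same data vs. on fresh data."""
    rng = random.Random(seed)

    def noise_z():
        # Both groups come from the SAME distribution (known sd = 1), so H0 is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        return (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)

    same_data_hits = fresh_data_hits = 0
    for _ in range(trials):
        # Explore all metrics and keep the most "interesting" one (largest |z|).
        best_z = max((noise_z() for _ in range(n_metrics)), key=abs)
        # Invalid: test that pattern on the data that suggested it.
        if two_sided_p(best_z) < alpha:
            same_data_hits += 1
        # Valid: test the chosen hypothesis on a fresh, independent sample.
        if two_sided_p(noise_z()) < alpha:
            fresh_data_hits += 1
    return same_data_hits / trials, fresh_data_hits / trials


same_rate, fresh_rate = simulate()
print(f"same data:  {same_rate:.2f} false-positive rate")
print(f"fresh data: {fresh_rate:.2f} false-positive rate")
```

Because the most extreme of 20 independent null tests passes α = 0.05 with probability 1 − 0.95²⁰ ≈ 0.64, the same-data rate should land near 64% even though nothing real is going on, while the fresh-data rate should stay near the nominal 5%.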

Core Concepts

| Concept | Definition |
|---|---|
| H₀ (null hypothesis) | Default assumption: no effect, no difference |
| H₁ (alternative hypothesis) | What you want to show: there IS an effect or difference |
| p-value | Probability of a result at least as extreme as the one observed, IF H₀ is true |
| α (significance level) | Threshold for rejecting H₀ (typically 0.05) |
| Type I error (probability α) | Rejecting H₀ when it is actually true (false positive) |
| Type II error (probability β) | Failing to reject H₀ when H₁ is true (false negative) |
| Power (1 − β) | Probability of rejecting H₀ when a real effect of a given size exists (target: ≥ 0.8) |
| Effect size | Magnitude of the effect or difference (e.g., Cohen's d, odds ratio, R²) |
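
The α, β, and power rows can be checked empirically. This Python sketch assumes two groups with a known standard deviation of 1 and a two-sided two-sample z-test (a simplification of the t-test). It sizes each group with the standard normal-approximation formula n ≈ 2(z₁₋α/₂ + z₁₋β)² / d² for d = 0.5, then simulates experiments with and without a real effect. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def rejection_rate(true_d, n, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated experiments (known sd = 1) in which H0 is rejected."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n)]
        z = (sum(treatment) / n - sum(control) / n) / math.sqrt(2 / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


n = n_per_group(d=0.5)
type_i_rate = rejection_rate(true_d=0.0, n=n)     # estimates alpha
observed_power = rejection_rate(true_d=0.5, n=n)  # estimates 1 - beta
print(f"n per group: {n}")
print(f"Type I error rate when H0 is true: {type_i_rate:.3f}")
print(f"Power when d = 0.5: {observed_power:.3f}")
```

Under these assumptions the formula asks for 63 per group (a t-test, which has to estimate σ, needs about 64), and the simulated rejection rates should come out near 0.05 when H₀ is true and near 0.8 when d = 0.5, matching α and 1 − β.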

Test Selection Guide

| Data type | Groups / design | Test |
|---|---|---|
| Continuous, normal | 2 independent groups | Independent t-test |
| Continuous, normal | 2 paired groups (before/after) | Paired t-test |
| Continuous, normal | 3+ independent groups | One-way ANOVA |
| Continuous, non-normal | 2 independent groups | Mann-Whitney U |
| Categorical | 2+ groups | Chi-square test |
| Continuous, relationship | 2 variables | Pearson correlation (normal) / Spearman correlation (non-normal) |
| Binary outcome | One or more predictors | Logistic regression |
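
One way to make the guide executable is a small lookup function. This Python sketch encodes only the rows of the table; the function name, argument names, and category strings are hypothetical, distribution checks are left to the caller, and designs the table does not list (such as paired non-normal data) raise an error instead of guessing.

```python
from typing import Optional


def choose_test(data_type: str, n_groups: Optional[int] = None,
                paired: bool = False, normal: bool = True) -> str:
    """Look up a test from the selection guide (hypothetical helper).

    data_type is one of "continuous", "categorical", "relationship", "binary".
    """
    if data_type == "continuous" and n_groups is not None:
        if normal and n_groups == 2:
            return "Paired t-test" if paired else "Independent t-test"
        if normal and n_groups >= 3 and not paired:
            return "One-way ANOVA"
        if not normal and n_groups == 2 and not paired:
            return "Mann-Whitney U"
    elif data_type == "categorical" and n_groups is not None and n_groups >= 2:
        return "Chi-square test"
    elif data_type == "relationship":
        return "Pearson correlation" if normal else "Spearman correlation"
    elif data_type == "binary":
        return "Logistic regression"
    # Anything else (e.g. paired non-normal data) is outside the table.
    raise ValueError(f"design not covered by the guide: {data_type}, {n_groups} groups")


print(choose_test("continuous", n_groups=2, paired=True))   # Paired t-test
print(choose_test("continuous", n_groups=2, normal=False))  # Mann-Whitney U
```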

Testing Process

1. State hypotheses: Define H₀ and H₁ in terms of specific parameters (e.g., μ₁ = μ₂ vs. μ₁ ≠ μ₂).
2. Choose test: Base the choice on data type, distribution, and number of groups (use the guide above), and check the test's assumptions.
3. Set α: Usually 0.05 (justify any other value).
4. Calculate: Run the test to obtain the test statistic and p-value.
5. Decide: p < α → reject H₀; p ≥ α → fail to reject H₀.
6. Report: Effect size + confidence interval + p-value (not just "significant").
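
The six steps can be walked through end to end in a minimal Python sketch. The A/B counts are invented for illustration. It uses a two-proportion z-test, which for a 2×2 table is equivalent to the chi-square test in the guide (z² equals the chi-square statistic without continuity correction), and a normal-approximation 95% CI for the difference.

```python
import math
from statistics import NormalDist

# 1. State hypotheses (before looking at the data):
#    H0: p_B = p_A    H1: p_B != p_A    (two-sided)
# 2. Choose test: binary outcome, two independent groups, large n -> two-proportion z-test.
# 3. Set alpha.
ALPHA = 0.05

# Hypothetical data: conversions and visitors in each arm.
conv_a, n_a = 480, 4000
conv_b, n_b = 560, 4000

# 4. Calculate: pooled standard error under H0, z statistic, two-sided p-value.
p_a, p_b = conv_a / n_a, conv_b / n_b
pooled = (conv_a + conv_b) / (n_a + n_b)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 5. Decide.
decision = "Reject H₀" if p_value < ALPHA else "Fail to reject H₀"

# 6. Report effect size + 95% CI + p-value (unpooled standard error for the CI).
diff = p_b - p_a
se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
print(f"Difference: {diff * 100:.1f} percentage points, "
      f"95% CI [{ci[0] * 100:.1f}, {ci[1] * 100:.1f}]")
```

With these hypothetical counts, z ≈ 2.66 and p ≈ 0.008, so H₀ is rejected; the report leads with the 2-point difference and its CI (roughly 0.5 to 3.5 points) rather than the word "significant".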

Output Format

Hypothesis Test: {Research Question}

Hypotheses

H₀: {null: no effect/difference}
H₁: {alternative: there IS an effect/difference}
α = {0.05 or other}

Test Selection

Test: {name}
Rationale: {why this test fits the data}
Assumptions checked: {normality, independence, equal variance}

Results

Test statistic: {value}
p-value: {value}
Effect size: {value and interpretation}
95% CI: [{lower}, {upper}]

Decision

{Reject / Fail to reject H₀}

Interpretation

{What this means in practical terms, with effect size context}
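
When the numbers come from code, the template can be filled mechanically. The sketch below is Python; `render_report` and its parameters are hypothetical names, the example values are made up, and the rule of printing "< 0.001" for very small p-values is a common reporting convention assumed here rather than part of the template.

```python
def render_report(question, h0, h1, alpha, test_name, rationale, assumptions,
                  statistic, p_value, effect_size, ci, decision, interpretation):
    """Fill the hypothesis-test output template (hypothetical helper)."""
    # Very small p-values are reported as a bound rather than as 0.000.
    p_text = "< 0.001" if p_value < 0.001 else f"{p_value:.3f}"
    lines = [
        f"Hypothesis Test: {question}", "",
        "Hypotheses", "",
        f"H₀: {h0}",
        f"H₁: {h1}",
        f"α = {alpha}", "",
        "Test Selection", "",
        f"Test: {test_name}",
        f"Rationale: {rationale}",
        f"Assumptions checked: {assumptions}", "",
        "Results", "",
        f"Test statistic: {statistic:.2f}",
        f"p-value: {p_text}",
        f"Effect size: {effect_size}",
        f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]", "",
        "Decision", "",
        decision, "",
        "Interpretation", "",
        interpretation,
    ]
    return "\n".join(lines)


report = render_report(
    question="Does checkout variant B change the conversion rate?",
    h0="p_B = p_A", h1="p_B ≠ p_A", alpha=0.05,
    test_name="Two-proportion z-test",
    rationale="Binary outcome, two independent groups, large samples",
    assumptions="independent visitors; large-sample normal approximation",
    statistic=2.66, p_value=0.0078,
    effect_size="+2.0 percentage points (12.0% → 14.0%)",
    ci=(0.0053, 0.0347),
    decision="Reject H₀",
    interpretation="Hypothetical example: B lifts conversion by about 2 points; the CI "
                   "excludes 0, but its lower end (about 0.5 points) may be too small to matter.",
)
print(report)
```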

Gotchas

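- "Fail to reject H₀" ≠ "H₀ is true": Absence of evidence is not evidence of absence. The test may simply lack the power to detect a real effect.
- Multiple comparisons inflate Type I error: If you run 20 independent tests at α = 0.05 and every H₀ is true, you should expect 1 false positive on average, with about a 64% chance of at least one. Apply a Bonferroni or FDR (e.g., Benjamini-Hochberg) correction.
- Check assumptions before testing: Student's t-test assumes independent observations, approximately normal data in each group, and equal variances (Welch's t-test drops the equal-variance requirement). Violated assumptions can invalidate the p-value. Use non-parametric alternatives when assumptions fail.
- Sample size determines power: Small samples often miss real effects (Type II error). Calculate the required sample size BEFORE collecting data.
- p-value is NOT the probability that H₀ is true: It is the probability of data at least as extreme as observed, given H₀, i.e. P(data | H₀) rather than P(H₀ | data). Getting from one to the other requires the base rate of true effects; ignoring it is the base rate fallacy.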
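
Two of these gotchas reduce to short calculations. This Python sketch implements Bonferroni and the Benjamini-Hochberg (FDR) step-up procedure from scratch and applies them to six made-up p-values. It then uses Bayes' rule to compute P(H₀ | p < α) under an assumed 10% share of true effects and 80% power. Function names and all numbers are illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


# Made-up p-values from six tests run on the same experiment.
p_values = [0.001, 0.008, 0.012, 0.035, 0.045, 0.2]
uncorrected = [p < 0.05 for p in p_values]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("uncorrected:", sum(uncorrected), "Bonferroni:", sum(bonf), "BH:", sum(bh))

# Chance of at least one false positive across 20 independent true-null tests.
fwer_20 = 1 - (1 - 0.05) ** 20

# P(H0 true | p < alpha) via Bayes' rule, under assumed base rates.
share_true_effects = 0.10  # assumption: 10% of tested hypotheses are real effects
alpha, power = 0.05, 0.80
false_pos = (1 - share_true_effects) * alpha
true_pos = share_true_effects * power
p_h0_given_significant = false_pos / (false_pos + true_pos)
print(f"P(at least 1 false positive in 20 tests) = {fwer_20:.2f}")
print(f"P(H0 true | p < {alpha}) = {p_h0_given_significant:.2f}")
```

With these invented p-values, five pass an uncorrected 0.05 threshold, Benjamini-Hochberg keeps three, and Bonferroni keeps two. The last line gives 0.36: under these assumptions, over a third of "significant" results come from true nulls, far from the 5% a naive reading of p suggests. In practice, `statsmodels.stats.multitest.multipletests` provides both corrections via `method="bonferroni"` and `method="fdr_bh"`.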

References

For sample size calculation, see `references/sample-size.md`.
For non-parametric test alternatives, see `references/nonparametric-tests.md`.
