stat-hypothesis-testing
Hypothesis Testing
Framework
IRON LAW: Statistical Significance ≠ Practical Significance
A p-value < 0.05 means a result at least this extreme would be unlikely if the null hypothesis were true.
It does NOT mean the result is important, large, or practically meaningful.
With a large enough sample, a 0.1% conversion rate difference becomes
"statistically significant" even when it is practically worthless.
ALWAYS report effect size alongside p-value.

IRON LAW: State Hypotheses BEFORE Looking at Data
H₀ (null) and H₁ (alternative) must be defined before data analysis.
Choosing hypotheses after seeing the data = p-hacking = scientific fraud.
"We found an interesting pattern, let's test it on the same data" is invalid.
Core Concepts
| Concept | Definition |
|---|---|
| H₀ (Null) | Default assumption: no effect, no difference |
| H₁ (Alternative) | What you want to show: there IS an effect/difference |
| p-value | Probability of seeing this result (or more extreme) IF H₀ is true |
| α (significance level) | Threshold for rejecting H₀ (typically 0.05) |
| Type I error (α) | Rejecting H₀ when it's actually true (false positive) |
| Type II error (β) | Failing to reject H₀ when H₁ is true (false negative) |
| Power (1-β) | Probability of detecting a real effect (target: ≥ 0.8) |
| Effect size | Magnitude of the difference (Cohen's d, odds ratio, R²) |
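The p-value and effect size rows measure different things, which is the point of the first Iron Law. The following is a minimal sketch, assuming hypothetical conversion rates of 10.0% and 10.1% (reading "0.1%" as an absolute difference of 0.1 percentage points) and equal group sizes. The helper name `two_proportion_z_test` is illustrative, not part of any library. The effect size stays fixed while the p-value shrinks as the sample grows.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, p2, n):
    """Two-sided z-test for equal proportions with n observations per group."""
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the mean
    se = sqrt(2 * pooled * (1 - pooled) / n)
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical rates: the effect size (0.1 points) never changes, only n does.
for n in (10_000, 1_000_000, 10_000_000):
    z, p = two_proportion_z_test(0.100, 0.101, n)
    print(f"n per group = {n:>10,}  diff = 0.1 pts  z = {z:5.2f}  p = {p:.4f}")
```

With 10,000 users per group the difference is far from significant; with 1,000,000 per group it crosses the 0.05 threshold, although the difference itself is no larger.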
Test Selection Guide
| Data Type | Groups | Test |
|---|---|---|
| Continuous, normal | 2 independent groups | Independent t-test |
| Continuous, normal | 2 paired / before-after groups | Paired t-test |
| Continuous, normal | 3+ independent groups | One-way ANOVA |
| Continuous, non-normal | 2 independent groups | Mann-Whitney U |
| Categorical | 2+ groups | Chi-square test |
| Continuous, relationship | 2 variables | Pearson correlation (normal) / Spearman (non-normal) |
| Binary outcome | Predictors | Logistic regression |
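The guide above can be read as a small decision function. This is a sketch only: `choose_test` and its parameter names are hypothetical, and it covers just the rows in the table, raising an error for any combination the guide does not list.

```python
def choose_test(outcome, normal=True, groups=2, paired=False):
    """Return the test the selection guide suggests.

    outcome: "continuous", "categorical", "binary", or "relationship".
    """
    if outcome == "categorical":
        return "Chi-square test"
    if outcome == "binary":
        return "Logistic regression"
    if outcome == "relationship":
        return "Pearson correlation" if normal else "Spearman correlation"
    if outcome != "continuous":
        raise ValueError(f"unknown outcome type: {outcome!r}")

    if not normal:
        if groups == 2 and not paired:
            return "Mann-Whitney U"
        raise ValueError("not covered by the guide; see the non-parametric reference")
    if groups == 2:
        return "Paired t-test" if paired else "Independent t-test"
    if groups >= 3 and not paired:
        return "One-way ANOVA"
    raise ValueError("not covered by the guide")


print(choose_test("continuous", normal=True, groups=2, paired=True))
print(choose_test("continuous", normal=False, groups=2))
```

Failing loudly on uncovered cases is deliberate: a silent default would hide the fact that the guide has no recommendation for that design.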
Testing Process
- State hypotheses: H₀ and H₁ with specific parameters
- Choose test: Based on data type, distribution, and groups (use guide above)
- Set α: Usually 0.05 (justify if different)
- Calculate: Run the test, get test statistic and p-value
- Decide: p < α → reject H₀; p ≥ α → fail to reject H₀
- Report: Effect size + confidence interval + p-value (not just "significant")
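The six steps above can be traced end to end on a 2x2 table. The sketch below assumes hypothetical counts (120 of 1,000 conversions in a variant, 90 of 1,000 in control), a Pearson chi-square test without continuity correction, and a Wald interval for the odds ratio. The function name `chi_square_2x2` is illustrative. Only the standard library is used; for one degree of freedom the chi-square upper tail equals `erfc(sqrt(chi2 / 2))`.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist

ALPHA = 0.05  # set alpha before looking at the data


def chi_square_2x2(a, b, c, d, alpha=ALPHA):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows are groups, columns are outcome yes / no. All four cells must be > 0.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)  # effect size
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (exp(log(odds_ratio) - z_crit * se_log_or),
          exp(log(odds_ratio) + z_crit * se_log_or))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return {"chi2": chi2, "p_value": p_value, "odds_ratio": odds_ratio,
            "ci": ci, "decision": decision}


# H0: conversion is independent of variant. H1: it is not.
result = chi_square_2x2(120, 880, 90, 910)

# Report effect size, confidence interval, and p-value together.
print(f"chi2 = {result['chi2']:.2f}, p = {result['p_value']:.4f}")
print(f"odds ratio = {result['odds_ratio']:.2f}, "
      f"95% CI [{result['ci'][0]:.2f}, {result['ci'][1]:.2f}]")
print(result["decision"])
```

Reporting the odds ratio with its interval keeps the decision tied to the size of the effect, not just to whether p fell below α.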
Output Format
Hypothesis Test: {Research Question}
Hypotheses
- H₀: {null — no effect/difference}
- H₁: {alternative — there IS an effect/difference}
- α = {0.05 or other}
Test Selection
- Test: {name}
- Rationale: {why this test fits the data}
- Assumptions checked: {normality, independence, equal variance}
Results
- Test statistic: {value}
- p-value: {value}
- Effect size: {value and interpretation}
- 95% CI: [{lower}, {upper}]
Decision
{Reject / Fail to reject H₀}
Interpretation
{What this means in practical terms, with effect size context}
Gotchas
- "Fail to reject H₀" ≠ "H₀ is true": Absence of evidence is not evidence of absence. You may lack power to detect a real effect.
- Multiple comparisons inflate Type I error: Testing 20 true null hypotheses at α=0.05 → expect 1 false positive by chance, and roughly a 64% chance of at least one if the tests are independent. Apply Bonferroni or FDR correction.
- Check assumptions before testing: The standard (Student's) t-test assumes approximately normal data and equal variances. Violated assumptions can invalidate results. Use non-parametric alternatives when assumptions fail.
- Sample size determines power: Small samples miss real effects (Type II error). Calculate required sample size BEFORE collecting data.
- p-value is NOT the probability that H₀ is true: It's the probability of data at least this extreme given H₀, i.e. P(data | H₀), not P(H₀ | data). Treating one as the other ignores how often H₀ is true to begin with (base rate fallacy).
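Two of the gotchas above reduce to short calculations: the multiple-comparisons inflation and the sample size needed for a target power. The sketch below uses only the standard library; all function names and the example p-values are hypothetical. It assumes independent tests for the family-wise error rate and a normal approximation for sample size, which slightly underestimates what an exact t-test calculation would give.

```python
from math import ceil
from statistics import NormalDist


def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent true-null tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each p-value at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def n_per_group(cohens_d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2."""
    if cohens_d == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()
    total_z = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (total_z / cohens_d) ** 2)


p_values = [0.001, 0.012, 0.025, 0.041, 0.20]  # hypothetical results of 5 tests
print(f"FWER for 20 tests: {familywise_error_rate(20):.2f}")  # about 0.64
print("Bonferroni:", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
print("n per group for d = 0.5:", n_per_group(0.5))  # 63 under this approximation
```

On these example p-values Bonferroni keeps only the smallest, while Benjamini-Hochberg keeps the three smallest, which reflects the usual trade-off between strict family-wise control and the more permissive false discovery rate.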
References
- For sample size calculation, see references/sample-size.md
- For non-parametric test alternatives, see references/nonparametric-tests.md