statistical-analysis

Statistical Analysis

Overview

Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting. Apply this skill for academic research.

When to Use This Skill

This skill should be used when:
  • Conducting statistical hypothesis tests (t-tests, ANOVA, chi-square)
  • Performing regression or correlation analyses
  • Running Bayesian statistical analyses
  • Checking statistical assumptions and diagnostics
  • Calculating effect sizes and conducting power analyses
  • Reporting statistical results in APA format
  • Analyzing experimental or observational data for research

Core Capabilities

1. Test Selection and Planning

  • Choose appropriate statistical tests based on research questions and data characteristics
  • Conduct a priori power analyses to determine required sample sizes
  • Plan analysis strategies including multiple comparison corrections

2. Assumption Checking

  • Automatically verify all relevant assumptions before running tests
  • Provide diagnostic visualizations (Q-Q plots, residual plots, box plots)
  • Recommend remedial actions when assumptions are violated

3. Statistical Testing

  • Hypothesis testing: t-tests, ANOVA, chi-square, non-parametric alternatives
  • Regression: linear, multiple, logistic, with diagnostics
  • Correlations: Pearson, Spearman, with confidence intervals
  • Bayesian alternatives: Bayesian t-tests, ANOVA, regression with Bayes Factors

4. Effect Sizes and Interpretation

  • Calculate and interpret appropriate effect sizes for all analyses
  • Provide confidence intervals for effect estimates
  • Distinguish statistical from practical significance

5. Professional Reporting

  • Generate APA-style statistical reports
  • Create publication-ready figures and tables
  • Provide complete interpretation with all required statistics

Workflow Decision Tree

Use this decision tree to determine your analysis path:

```
START
├─ Need to SELECT a statistical test?
│  └─ YES → See "Test Selection Guide"
│  └─ NO → Continue
├─ Ready to check ASSUMPTIONS?
│  └─ YES → See "Assumption Checking"
│  └─ NO → Continue
├─ Ready to run ANALYSIS?
│  └─ YES → See "Running Statistical Tests"
│  └─ NO → Continue
└─ Need to REPORT results?
   └─ YES → See "Reporting Results"
```

Test Selection Guide

Quick Reference: Choosing the Right Test

Use `references/test_selection_guide.md` for comprehensive guidance. Quick reference:
Comparing Two Groups:
  • Independent, continuous, normal → Independent t-test
  • Independent, continuous, non-normal → Mann-Whitney U test
  • Paired, continuous, normal → Paired t-test
  • Paired, continuous, non-normal → Wilcoxon signed-rank test
  • Binary outcome → Chi-square or Fisher's exact test
Comparing 3+ Groups:
  • Independent, continuous, normal → One-way ANOVA
  • Independent, continuous, non-normal → Kruskal-Wallis test
  • Paired, continuous, normal → Repeated measures ANOVA
  • Paired, continuous, non-normal → Friedman test
Relationships:
  • Two continuous variables → Pearson (normal) or Spearman correlation (non-normal)
  • Continuous outcome with predictor(s) → Linear regression
  • Binary outcome with predictor(s) → Logistic regression
Bayesian Alternatives: Most of these tests have Bayesian counterparts that provide:
  • Direct probability statements about hypotheses
  • Bayes Factors quantifying evidence
  • The ability to quantify support for the null hypothesis
  • See `references/bayesian_statistics.md`
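
The group-comparison rows of the quick reference can be read as a lookup keyed on the number of groups, the design, and the distribution. The sketch below is illustrative only: `suggest_test` and its parameters are hypothetical names, not part of the skill's scripts, and it covers only the continuous-outcome comparisons listed above.

```python
def suggest_test(n_groups: int, paired: bool, normal: bool) -> str:
    """Map a continuous-outcome group comparison to a test name.

    Hypothetical helper mirroring the quick reference; not part of the skill.
    """
    if n_groups < 2:
        raise ValueError("n_groups must be at least 2")
    # Keys are (paired, normal)
    two_groups = {
        (False, True): "Independent t-test",
        (False, False): "Mann-Whitney U test",
        (True, True): "Paired t-test",
        (True, False): "Wilcoxon signed-rank test",
    }
    three_plus_groups = {
        (False, True): "One-way ANOVA",
        (False, False): "Kruskal-Wallis test",
        (True, True): "Repeated measures ANOVA",
        (True, False): "Friedman test",
    }
    table = two_groups if n_groups == 2 else three_plus_groups
    return table[(paired, normal)]


print(suggest_test(2, paired=False, normal=False))
print(suggest_test(4, paired=True, normal=True))
```

A lookup like this only encodes the first-pass choice; binary outcomes and regression questions still need the separate rows of the guide.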

Assumption Checking

Systematic Assumption Verification

ALWAYS check assumptions before interpreting test results.
Use the provided `scripts/assumption_checks.py` module for automated checking:

```python
from scripts.assumption_checks import comprehensive_assumption_check

# Comprehensive check with visualizations
results = comprehensive_assumption_check(
    data=df,
    value_col='score',
    group_col='group',  # Optional: for group comparisons
    alpha=0.05
)
```

This performs:
1. **Outlier detection** (IQR and z-score methods)
2. **Normality testing** (Shapiro-Wilk test + Q-Q plots)
3. **Homogeneity of variance** (Levene's test + box plots)
4. **Interpretation and recommendations**
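
To make the IQR rule in step 1 concrete, here is a minimal standard-library sketch. It is an assumption about how such a check could work, not the implementation in `scripts/assumption_checks.py`; the function name `iqr_outliers` and the conventional multiplier `k=1.5` are illustrative.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(scores))
```

Flagged values are candidates for inspection, not automatic deletion; the decision to keep, transform, or exclude them should be reported.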

Individual Assumption Checks

For targeted checks, use individual functions:

```python
from scripts.assumption_checks import (
    check_normality,
    check_normality_per_group,
    check_homogeneity_of_variance,
    check_linearity,
    detect_outliers
)

# Example: Check normality with visualization
result = check_normality(
    data=df['score'],
    name='Test Score',
    alpha=0.05,
    plot=True
)
print(result['interpretation'])
print(result['recommendation'])
```

What to Do When Assumptions Are Violated

Normality violated:
  • Mild violation + n > 30 per group → Proceed with parametric test (robust)
  • Moderate violation → Use non-parametric alternative
  • Severe violation → Transform data or use non-parametric test
Homogeneity of variance violated:
  • For t-test → Use Welch's t-test
  • For ANOVA → Use Welch's ANOVA or Brown-Forsythe ANOVA
  • For regression → Use robust standard errors or weighted least squares
Linearity violated (regression):
  • Add polynomial terms
  • Transform variables
  • Use non-linear models or GAM
See `references/assumptions_and_diagnostics.md` for comprehensive guidance.
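
As an illustration of the Welch remedy for unequal variances, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom by hand. The helper name `welch_t` and the sample data are made up for this example; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Squared standard error contributed by each group
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t({df:.2f}) = {t:.2f}")
```

The degrees of freedom are generally not an integer, which is why Welch results are usually reported with decimals in the df.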

Running Statistical Tests

Python Libraries

Primary libraries for statistical analysis:
  • scipy.stats: Core statistical tests
  • statsmodels: Advanced regression and diagnostics
  • pingouin: User-friendly statistical testing with effect sizes
  • pymc: Bayesian statistical modeling
  • arviz: Bayesian visualization and diagnostics

Example Analyses

T-Test with Complete Reporting

```python
import pingouin as pg
import numpy as np

# Run independent t-test (group_a and group_b are array-likes of scores)
result = pg.ttest(group_a, group_b, correction='auto')

# Extract results
t_stat = result['T'].values[0]
dof = result['dof'].values[0]  # named dof to avoid shadowing a DataFrame called df
p_value = result['p-val'].values[0]
cohens_d = result['cohen-d'].values[0]
# Note: pingouin's CI95% is the interval for the mean difference, not for Cohen's d
ci_lower, ci_upper = result['CI95%'].values[0]

# Report
print(f"t({dof:.0f}) = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.2f}")
print(f"Mean difference 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
```

ANOVA with Post-Hoc Tests

```python
import pingouin as pg

# One-way ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(aov)

# If significant, conduct post-hoc tests
if aov['p-unc'].values[0] < 0.05:
    posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
    print(posthoc)

# Effect size
eta_squared = aov['np2'].values[0]  # Partial eta-squared
print(f"Partial η² = {eta_squared:.3f}")
```

Linear Regression with Diagnostics

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Fit model (X_predictors is expected to be a pandas DataFrame)
X = sm.add_constant(X_predictors)  # Add intercept
model = sm.OLS(y, X).fit()

# Summary
print(model.summary())

# Check multicollinearity (VIF)
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(X.shape[1])]
print(vif_data)

# Check assumptions
residuals = model.resid
fitted = model.fittedvalues

# Residual plots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Residuals vs fitted
axes[0, 0].scatter(fitted, residuals, alpha=0.6)
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_xlabel('Fitted values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted')

# Q-Q plot
stats.probplot(residuals, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Normal Q-Q')

# Scale-Location
axes[1, 0].scatter(fitted, np.sqrt(np.abs(residuals / residuals.std())), alpha=0.6)
axes[1, 0].set_xlabel('Fitted values')
axes[1, 0].set_ylabel('√|Standardized residuals|')
axes[1, 0].set_title('Scale-Location')

# Residuals histogram
axes[1, 1].hist(residuals, bins=20, edgecolor='black', alpha=0.7)
axes[1, 1].set_xlabel('Residuals')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Histogram of Residuals')

plt.tight_layout()
plt.show()
```

Bayesian T-Test

```python
import pymc as pm
import arviz as az
import numpy as np

with pm.Model() as model:
    # Priors
    mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
    mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Likelihood
    y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
    y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)

    # Derived quantity
    diff = pm.Deterministic('difference', mu1 - mu2)

    # Sample
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Summarize
print(az.summary(trace, var_names=['difference']))

# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['difference'].values > 0)
print(f"P(μ₁ > μ₂ | data) = {prob_greater:.3f}")

# Plot posterior
az.plot_posterior(trace, var_names=['difference'], ref_val=0)
```

---

Effect Sizes

Always Calculate Effect Sizes

Effect sizes quantify the magnitude of an effect, whereas p-values only indicate how compatible the data are with the null hypothesis.
See `references/effect_sizes_and_power.md` for comprehensive guidance.

Quick Reference: Common Effect Sizes

| Test | Effect Size | Small | Medium | Large |
|------|-------------|-------|--------|-------|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η²_p | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramér's V | 0.07 | 0.21 | 0.35 |

Important: Benchmarks are guidelines. Context matters!
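
To show how the Cohen's d benchmarks are applied, the sketch below computes d from two samples using the pooled standard deviation and attaches the conventional label. The function names `cohens_d` and `label_d` and the sample data are illustrative assumptions; the labels are rough guides and should not replace a domain-specific judgment.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def label_d(d):
    """Attach the conventional benchmark label to |d|."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
print(f"d = {d:.2f} ({label_d(d)})")
```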

Calculating Effect Sizes

Most effect sizes are automatically calculated by pingouin:

```python
import pingouin as pg

# T-test returns Cohen's d
result = pg.ttest(x, y)
d = result['cohen-d'].values[0]

# ANOVA returns partial eta-squared
aov = pg.anova(dv='score', between='group', data=df)
eta_p2 = aov['np2'].values[0]

# Correlation: r is already an effect size
corr = pg.corr(x, y)
r = corr['r'].values[0]
```
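
For intuition about what the ANOVA effect size measures, eta-squared can be computed directly from sums of squares. The sketch below is a hand-rolled illustration with made-up data; `eta_squared` is a hypothetical helper, not a pingouin function. In a one-way between-subjects design, partial eta-squared equals eta-squared because there are no other effects to partial out.

```python
def eta_squared(groups):
    """Eta-squared for a one-way design: SS_between / SS_total."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Total variation of all observations around the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    return ss_between / ss_total


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f"η² = {eta_squared(groups):.3f}")
```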

Confidence Intervals for Effect Sizes

Always report CIs to show precision:

```python
from pingouin import compute_effsize_from_t, compute_esci

# For t-test: convert the t statistic to Cohen's d (returns a single float)
d = compute_effsize_from_t(
    t_statistic,
    nx=len(group1),
    ny=len(group2),
    eftype='cohen'
)

# Confidence interval around d
ci = compute_esci(
    stat=d,
    nx=len(group1),
    ny=len(group2),
    eftype='cohen',
    confidence=0.95
)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```
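
When only d and the group sizes are available, a large-sample normal approximation gives a quick interval. The sketch below is an approximation under that assumption (exact intervals rely on the noncentral t distribution); `cohens_d_ci` is a hypothetical helper, and the inputs are the values from the independent t-test report template in this document.

```python
import math
from statistics import NormalDist


def cohens_d_ci(d, n1, n2, confidence=0.95):
    """Approximate CI for Cohen's d using its large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return d - z * se, d + z * se


lower, upper = cohens_d_ci(0.77, n1=48, n2=52)
print(f"d = 0.77, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With these inputs the approximation reproduces the interval shown in the template, [0.36, 1.18]; with small groups the approximation is rougher and an exact method is preferable.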

---

Power Analysis

A Priori Power Analysis (Study Planning)

Determine required sample size before data collection:

```python
import math

from statsmodels.stats.power import (
    tt_ind_solve_power,
    FTestAnovaPower
)

# T-test: What n is needed to detect d = 0.5?
n_required = tt_ind_solve_power(
    effect_size=0.5,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Required n per group: {math.ceil(n_required)}")

# ANOVA: What n is needed to detect f = 0.25?
# FTestAnovaPower takes k_groups and solves for the TOTAL sample size (nobs)
k_groups = 3
anova_power = FTestAnovaPower()
n_total = anova_power.solve_power(
    effect_size=0.25,
    k_groups=k_groups,
    alpha=0.05,
    power=0.80
)
print(f"Required n per group: {math.ceil(n_total / k_groups)}")
```
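
As a sanity check on the solver output, the two-group sample size can be approximated in closed form with normal quantiles: n per group ≈ 2((z₁₋α/₂ + z_power) / d)². The sketch below uses only the standard library; `n_per_group_approx` is a hypothetical helper, and because it ignores the t distribution it tends to come out slightly below the t-based answer.

```python
import math
from statistics import NormalDist


def n_per_group_approx(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group_approx(d)} per group")
```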

Sensitivity Analysis (Post-Study)

Determine what effect size you could detect:

```python
from statsmodels.stats.power import tt_ind_solve_power

# With n=50 per group, what effect could we detect?
detectable_d = tt_ind_solve_power(
    effect_size=None,  # Solve for this
    nobs1=50,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Study could detect d ≥ {detectable_d:.2f}")
```

**Note**: Post-hoc power analysis (calculating power after study) is generally not recommended. Use sensitivity analysis instead.

See `references/effect_sizes_and_power.md` for detailed guidance.

---

Reporting Results

APA Style Statistical Reporting

Follow the guidelines in `references/reporting_standards.md`.

Essential Reporting Elements

  1. Descriptive statistics: M, SD, n for all groups/variables
  2. Test statistics: Test name, statistic, df, exact p-value
  3. Effect sizes: With confidence intervals
  4. Assumption checks: Which tests were done, results, actions taken
  5. All planned analyses: Including non-significant findings
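
The test-statistic element lends itself to a small formatter. The sketch below assumes common APA conventions (no leading zero on p, and "p < .001" as the floor); `format_p` and `apa_t_test` are hypothetical helpers, not part of the skill, and the p-value passed in is a made-up number consistent with "p < .001".

```python
def format_p(p):
    """Format a p-value APA style: no leading zero, '< .001' as the floor."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t_test(t, df, p, d):
    """Assemble the statistic, p-value, and effect size into one string."""
    return f"t({df:g}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


print(apa_t_test(3.82, 98, 0.0002, 0.77))
print(format_p(0.0312))
```

Descriptive statistics, confidence intervals, and assumption checks still need to be written around the string this produces.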

Example Report Templates

Independent T-Test

Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk:
Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity
of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.

One-Way ANOVA

A one-way ANOVA revealed a significant main effect of treatment condition
on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc
comparisons using Tukey's HSD indicated that Condition A (M = 78.2,
SD = 7.3) scored significantly higher than Condition B (M = 71.5,
SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9,
p < .001, d = 1.07). Conditions B and C did not differ significantly
(p = .52, d = 0.18).

Multiple Regression

Multiple linear regression was conducted to predict exam scores from
study hours, prior GPA, and attendance. The overall model was significant,
F(3, 146) = 45.2, p < .001, R² = .48, adjusted R² = .47. Study hours
(B = 1.80, SE = 0.31, β = .35, t = 5.78, p < .001, 95% CI [1.18, 2.42])
and prior GPA (B = 8.52, SE = 1.95, β = .28, t = 4.37, p < .001,
95% CI [4.66, 12.38]) were significant predictors, while attendance was
not (B = 0.15, SE = 0.12, β = .08, t = 1.25, p = .21, 95% CI [-0.09, 0.39]).
Multicollinearity was not a concern (all VIF < 1.5).

Bayesian Analysis

A Bayesian independent samples t-test was conducted using weakly
informative priors (Normal(0, 1) for mean difference). The posterior
distribution indicated that Group A scored higher than Group B
(M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor
BF₁₀ = 45.3 provided very strong evidence for a difference between
groups, with a 99.8% posterior probability that Group A's mean exceeded
Group B's mean. Convergence diagnostics were satisfactory (all R̂ < 1.01,
ESS > 1000).

Bayesian Statistics

When to Use Bayesian Methods

Consider Bayesian approaches when:
  • You have prior information to incorporate
  • You want direct probability statements about hypotheses
  • The sample size is small, or you plan sequential data collection
  • You need to quantify evidence for the null hypothesis
  • The model is complex (hierarchical, missing data)
See `references/bayesian_statistics.md` for comprehensive guidance on:
  • Bayes' theorem and interpretation
  • Prior specification (informative, weakly informative, non-informative)
  • Bayesian hypothesis testing with Bayes Factors
  • Credible intervals vs. confidence intervals
  • Bayesian t-tests, ANOVA, regression, and hierarchical models
  • Model convergence checking and posterior predictive checks

Key Advantages

  1. Intuitive interpretation: "Given the data, there is a 95% probability the parameter is in this interval"
  2. Evidence for null: Can quantify support for no effect
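  3. Flexible: Data can be analyzed as they arrive, because Bayesian inference does not depend on a fixed stopping rule (analysis decisions should still be reported transparently)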
  4. Uncertainty quantification: Full posterior distribution

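
The "intuitive interpretation" point can be seen in the simplest conjugate case: a normal prior on a mean with a known data standard deviation. The sketch below is a toy example with made-up numbers, assuming a known `sigma`; `normal_posterior` is a hypothetical helper and is not how the PyMC model earlier in this document is fitted.

```python
import math
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_mean, n, sigma):
    """Posterior for a normal mean with known sigma and a normal prior."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1 / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the sample mean
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return NormalDist(post_mean, math.sqrt(post_var))


posterior = normal_posterior(prior_mean=0, prior_sd=10, data_mean=5, n=4, sigma=2)
lower, upper = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
prob_positive = 1 - posterior.cdf(0)
print(f"95% credible interval [{lower:.2f}, {upper:.2f}]")
print(f"P(mean > 0 | data) = {prob_positive:.3f}")
```

Under this model the interval carries the direct reading quoted in point 1: given the data and the prior, the parameter lies in it with 95% probability.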

Resources

This skill includes comprehensive reference materials:

References Directory

  • test_selection_guide.md: Decision tree for choosing appropriate statistical tests
  • assumptions_and_diagnostics.md: Detailed guidance on checking and handling assumption violations
  • effect_sizes_and_power.md: Calculating, interpreting, and reporting effect sizes; conducting power analyses
  • bayesian_statistics.md: Complete guide to Bayesian analysis methods
  • reporting_standards.md: APA-style reporting guidelines with examples

Scripts Directory

  • assumption_checks.py: Automated assumption checking with visualizations
    • comprehensive_assumption_check(): Complete workflow
    • check_normality(): Normality testing with Q-Q plots
    • check_homogeneity_of_variance(): Levene's test with box plots
    • check_linearity(): Regression linearity checks
    • detect_outliers(): IQR and z-score outlier detection

Best Practices

  1. Pre-register analyses when possible to distinguish confirmatory from exploratory
  2. Always check assumptions before interpreting results
  3. Report effect sizes with confidence intervals
  4. Report all planned analyses including non-significant results
  5. Distinguish statistical from practical significance
  6. Visualize data before and after analysis
  7. Check diagnostics for regression/ANOVA (residual plots, VIF, etc.)
  8. Conduct sensitivity analyses to assess robustness
  9. Share data and code for reproducibility
  10. Be transparent about violations, transformations, and decisions

Common Pitfalls to Avoid

  1. P-hacking: Don't test multiple ways until something is significant
  2. HARKing: Don't present exploratory findings as confirmatory
  3. Ignoring assumptions: Check them and report violations
  4. Confusing significance with importance: p < .05 ≠ meaningful effect
  5. Not reporting effect sizes: Essential for interpretation
  6. Cherry-picking results: Report all planned analyses
  7. Misinterpreting p-values: A p-value is NOT the probability that the hypothesis is true
  8. Multiple comparisons: Correct for family-wise error when appropriate
  9. Ignoring missing data: Understand mechanism (MCAR, MAR, MNAR)
  10. Overinterpreting non-significant results: Absence of evidence ≠ evidence of absence

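
For the multiple-comparisons pitfall, one common family-wise correction is the Holm step-down procedure. It is shown here as one option, since the document does not prescribe a specific method; `holm_adjust` is a hand-written sketch with made-up p-values, and `statsmodels.stats.multitest.multipletests(..., method='holm')` provides a library implementation.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]
print([round(p, 3) for p in holm_adjust(raw)])
```

Here only the first comparison stays below .05 after correction, although all three raw p-values were below it.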

Getting Started Checklist

When beginning a statistical analysis:
  • Define research question and hypotheses
  • Determine appropriate statistical test (use test_selection_guide.md)
  • Conduct power analysis to determine sample size
  • Load and inspect data
  • Check for missing data and outliers
  • Verify assumptions using assumption_checks.py
  • Run primary analysis
  • Calculate effect sizes with confidence intervals
  • Conduct post-hoc tests if needed (with corrections)
  • Create visualizations
  • Write results following reporting_standards.md
  • Conduct sensitivity analyses
  • Share data and code

Support and Further Reading

For questions about:
  • Test selection: See references/test_selection_guide.md
  • Assumptions: See references/assumptions_and_diagnostics.md
  • Effect sizes: See references/effect_sizes_and_power.md
  • Bayesian methods: See references/bayesian_statistics.md
  • Reporting: See references/reporting_standards.md
Key textbooks:
  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences
  • Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics
  • Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models
  • Kruschke, J. K. (2014). Doing Bayesian Data Analysis
Online resources: