Statistical Analyzer

Guided statistical analysis with hypothesis testing, regression, ANOVA, and plain-English results.

Features

- Hypothesis Testing: t-tests, chi-square, proportion tests
- Regression Analysis: Linear, polynomial, multiple regression
- ANOVA: One-way, two-way ANOVA with post-hoc tests
- Distribution Analysis: Normality tests, Q-Q plots
- Correlation Analysis: Pearson, Spearman with significance
- Plain-English Results: Interpret statistical outputs
- Visualizations: Regression plots, residual analysis, box plots
- Report Generation: PDF/HTML reports with interpretations

Quick Start

python

from statistical_analyzer import StatisticalAnalyzer

analyzer = StatisticalAnalyzer()

# T-test (df is a pandas DataFrame you have already loaded)
analyzer.load_data(df, group_col='treatment', value_col='score')
results = analyzer.t_test(group1='control', group2='experimental')
print(results['interpretation'])

# Regression
analyzer.load_data(df)
results = analyzer.linear_regression(x='age', y='income')
print(f"R²: {results['r_squared']}")
analyzer.plot_regression('regression.png')

CLI Usage

bash

# T-test
python statistical_analyzer.py --data data.csv --test t-test --group treatment --value score --output results.html

# ANOVA
python statistical_analyzer.py --data data.csv --test anova --group category --value score --output results.pdf

# Regression
python statistical_analyzer.py --data data.csv --test regression --x age --y income --output report.pdf

# Correlation matrix
python statistical_analyzer.py --data data.csv --test correlation --output correlation.png
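The flags used in these commands could be wired up with `argparse` roughly as follows. This is a sketch, not the tool's actual parser: the flag names and the four `--test` values are taken from the commands, while `build_parser`, the help strings, and the choice of which flags are required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="statistical_analyzer.py")
    parser.add_argument("--data", required=True, help="Path to the input CSV file")
    parser.add_argument(
        "--test",
        required=True,
        choices=["t-test", "anova", "regression", "correlation"],
        help="Which analysis to run",
    )
    parser.add_argument("--group", help="Grouping column (t-test, ANOVA)")
    parser.add_argument("--value", help="Value column (t-test, ANOVA)")
    parser.add_argument("--x", help="Predictor column (regression)")
    parser.add_argument("--y", help="Response column (regression)")
    parser.add_argument("--output", help="Output file (.html, .pdf, or .png)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--data", "data.csv", "--test", "regression", "--x", "age", "--y", "income"]
)
print(args.test, args.x, args.y)
```

Restricting `--test` with `choices` makes a mistyped test name fail at parse time rather than partway through an analysis.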

API Reference

StatisticalAnalyzer Class

python

class StatisticalAnalyzer:
    def __init__(self)

    # Data Loading
    def load_data(self, data, **kwargs) -> 'StatisticalAnalyzer'
    def load_csv(self, filepath, **kwargs) -> 'StatisticalAnalyzer'

    # Hypothesis Tests
    def t_test(self, group1, group2, paired=False, alternative='two-sided') -> Dict
    def one_sample_t_test(self, column, expected_mean, alternative='two-sided') -> Dict
    def anova(self, groups, value_col) -> Dict
    def chi_square(self, observed, expected=None) -> Dict
    def proportion_test(self, successes, total, expected_prop=0.5) -> Dict

    # Regression
    def linear_regression(self, x, y) -> Dict
    def polynomial_regression(self, x, y, degree=2) -> Dict
    def multiple_regression(self, predictors: List[str], target: str) -> Dict

    # Correlation
    def correlation(self, method='pearson') -> pd.DataFrame  # Correlation matrix
    def correlation_test(self, var1, var2, method='pearson') -> Dict

    # Distribution Tests
    def normality_test(self, column, method='shapiro') -> Dict
    def qq_plot(self, column, output=None) -> str

    # Visualization
    def plot_regression(self, output, x=None, y=None) -> str
    def plot_residuals(self, output) -> str
    def plot_distribution(self, column, output) -> str
    def plot_boxplot(self, groups, value_col, output) -> str

    # Reporting
    def generate_report(self, output, format='pdf') -> str
    def summary(self) -> str
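
`proportion_test(successes, total, expected_prop=0.5)` has no usage example in this document, so here is a sketch of the calculation such a method typically performs: a one-sample, two-sided z-test using the normal approximation. The function name `proportion_z_test` and the returned keys are assumptions, and the analyzer itself may use a different method (for example an exact binomial test).

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes: int, total: int, expected_prop: float = 0.5) -> dict:
    """One-sample, two-sided z-test for a proportion (normal approximation)."""
    observed = successes / total
    # Standard error under the null hypothesis p = expected_prop
    se = sqrt(expected_prop * (1 - expected_prop) / total)
    z = (observed - expected_prop) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"statistic": z, "p_value": p_value, "observed_prop": observed}


result = proportion_z_test(successes=60, total=100)
print(f"z = {result['statistic']:.2f}, p = {result['p_value']:.4f}")
```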

Tests

T-Test

Compare means between two groups:

python

analyzer.load_csv('data.csv')

# Independent samples
results = analyzer.t_test(
    group1='control',
    group2='treatment',
    paired=False
)

# Results
print(results)
# {
#     'statistic': -2.45,
#     'p_value': 0.018,
#     'mean_diff': -5.2,
#     'ci': (-9.5, -0.9),
#     'interpretation': 'The difference is statistically significant (p=0.018)...'
# }

# Paired samples (before/after)
results = analyzer.t_test(
    group1='before',
    group2='after',
    paired=True
)
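
To make the `statistic` and `mean_diff` fields concrete, the sketch below computes them by hand for two independent samples using the pooled-variance (Student's) form. This document does not say whether the analyzer pools variances or uses Welch's correction, so the pooled form is an assumption, as are the function name `pooled_t_statistic` and the sample data. The p-value and confidence interval need the t distribution, which the standard library lacks.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a: list, b: list) -> dict:
    """Student's t statistic for two independent samples (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    mean_diff = mean(a) - mean(b)
    # Pooled sample variance, weighted by each group's degrees of freedom
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    # Standard error of the difference between the two means
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return {"statistic": mean_diff / se, "mean_diff": mean_diff, "df": df}


# Made-up illustrative data
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treatment = [2.0, 3.0, 4.0, 5.0, 6.0]
t_result = pooled_t_statistic(control, treatment)
print(t_result)
```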

ANOVA

Compare means across multiple groups:

python

results = analyzer.anova(
    groups=['control', 'treatment_a', 'treatment_b'],
    value_col='score'
)

# Results include post-hoc tests
print(results['interpretation'])
# "There is a statistically significant difference between groups (p<0.001).
# Post-hoc tests show treatment_a differs from control (p=0.003)..."
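
One-way ANOVA compares the variance between group means with the variance within groups. The sketch below computes the F statistic by hand to show what sits behind a call like `anova(...)`. The function name `one_way_f`, the returned keys, and the data are illustrative assumptions, and the p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def one_way_f(groups: list) -> dict:
    """F statistic for one-way ANOVA: between-group vs within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"statistic": f_stat, "df_between": df_between, "df_within": df_within}


# Made-up scores for three groups
anova_result = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(anova_result)
```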

Regression Analysis

python

# Simple linear regression
results = analyzer.linear_regression(x='hours_studied', y='exam_score')

print(f"R² = {results['r_squared']:.3f}")
print(f"Equation: y = {results['slope']:.2f}x + {results['intercept']:.2f}")
print(f"p-value: {results['p_value']:.4f}")

# Polynomial regression
results = analyzer.polynomial_regression(x='age', y='salary', degree=2)

# Multiple regression
results = analyzer.multiple_regression(
    predictors=['age', 'experience', 'education'],
    target='salary'
)
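
The `slope`, `intercept`, and `r_squared` keys correspond to an ordinary least-squares fit. As a rough illustration of where those numbers come from, here is a hand-rolled version; `simple_linear_fit` and the data are assumptions, and the analyzer presumably delegates to a statistics library rather than computing this directly.

```python
from statistics import mean


def simple_linear_fit(x: list, y: list) -> dict:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return {"slope": slope, "intercept": intercept, "r_squared": 1 - ss_res / ss_tot}


# Made-up data: hours studied vs exam score
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]
fit = simple_linear_fit(hours, scores)
print(f"y = {fit['slope']:.2f}x + {fit['intercept']:.2f}, R² = {fit['r_squared']:.3f}")
```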

Correlation Analysis

python

# Full correlation matrix
corr_matrix = analyzer.correlation(method='pearson')
print(corr_matrix)

# Test specific correlation
results = analyzer.correlation_test('height', 'weight', method='pearson')
print(results['interpretation'])
# "There is a strong positive correlation (r=0.82, p<0.001)"
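
The `method` parameter selects between Pearson (linear association) and Spearman (monotonic association). Spearman's coefficient is simply Pearson's coefficient computed on ranks, which the sketch below makes explicit. The helper names and the data are illustrative assumptions, not the analyzer's internals.

```python
from math import sqrt
from statistics import mean


def pearson(x: list, y: list) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cov / sqrt(sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y))


def average_ranks(values: list) -> list:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (y = x²)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

On this data Spearman reaches 1 because the relationship is perfectly monotonic, while Pearson stays slightly below 1 because it is not linear.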

Distribution Tests

python

# Test normality
results = analyzer.normality_test('scores', method='shapiro')
# Returns: {'statistic': 0.98, 'p_value': 0.35,
#           'interpretation': 'Data appears normally distributed (p=0.35)'}

# Q-Q plot
analyzer.qq_plot('scores', output='qq_plot.png')
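
A Q-Q plot pairs each sorted observation with the quantile a normal distribution would place at the same position; points near a straight line suggest normality. The sketch below builds those pairs without plotting. The name `qq_points` and the plotting position `(i - 0.5) / n` are assumptions: several conventions exist, and this document does not say which one `qq_plot` uses.

```python
from statistics import NormalDist


def qq_points(data: list) -> list:
    """Pair each sorted observation with its theoretical standard-normal quantile."""
    n = len(data)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


points = qq_points([3.0, 1.0, 2.0])
for theoretical_q, observed in points:
    print(f"{theoretical_q:+.3f} -> {observed}")
```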

Interpretation Guide

The analyzer provides plain-English interpretations:

Significance Levels

p < 0.001: "Highly significant"
p < 0.01: "Very significant"
p < 0.05: "Statistically significant"
p ≥ 0.05: "Not statistically significant"
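
Because these thresholds nest (any p below 0.001 is also below 0.05), the labels only make sense when checked from the strictest threshold down. A minimal sketch of that mapping, with the hypothetical function name `significance_label`:

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the wording used in the table above (first match wins)."""
    if p_value < 0.001:
        return "Highly significant"
    if p_value < 0.01:
        return "Very significant"
    if p_value < 0.05:
        return "Statistically significant"
    return "Not statistically significant"


for p in (0.0004, 0.003, 0.018, 0.35):
    print(p, "->", significance_label(p))
```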

Effect Sizes

Cohen's d: Small (0.2), Medium (0.5), Large (0.8)
R²: Weak (<0.3), Moderate (0.3-0.7), Strong (>0.7)
Correlation (|r|): Weak (<0.3), Moderate (0.3-0.7), Strong (>0.7)
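
As an illustration of the Cohen's d row, the sketch below computes d with a pooled standard deviation and labels it. Treating 0.2, 0.5, and 0.8 as lower bounds, and calling anything smaller "Negligible", are assumptions about how the analyzer applies these reference points; the function names and data are also illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a: list, b: list) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def effect_size_label(d: float) -> str:
    """Label |d| against the 0.2 / 0.5 / 0.8 reference points (assumed lower bounds)."""
    size = abs(d)
    if size >= 0.8:
        return "Large"
    if size >= 0.5:
        return "Medium"
    if size >= 0.2:
        return "Small"
    return "Negligible"  # label not in the source; added for completeness


# Made-up illustrative data
d = cohens_d([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print(f"d = {d:.3f} ({effect_size_label(d)})")
```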

Visualizations

Regression Plot

python

analyzer.linear_regression(x='age', y='income')
analyzer.plot_regression('regression.png')
# Creates scatter plot with regression line and confidence interval

Residual Plot

python

analyzer.plot_residuals('residuals.png')
# Checks regression assumptions (homoscedasticity)

Box Plot

python

analyzer.plot_boxplot(
    groups=['control', 'treatment_a', 'treatment_b'],
    value_col='score',
    output='boxplot.png'
)
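
A box plot summarizes each group by its quartiles and flags points far from the box. The sketch below computes those numbers for one group, assuming the common Tukey convention of fences at 1.5 × IQR beyond the quartiles. Whether `plot_boxplot` uses exactly this rule is not stated here, and `box_stats` is a hypothetical helper.

```python
from statistics import quantiles


def box_stats(values: list) -> dict:
    """Quartiles, IQR fences, and outliers for a single box in a box plot."""
    # 'inclusive' interpolates between data points, treating them as the full range
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up scores with one extreme value
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```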

Distribution Plot

python

analyzer.plot_distribution('scores', 'distribution.png')
# Histogram with normal curve overlay

Reports

Generate comprehensive reports:

python

analyzer.load_csv('data.csv')
analyzer.t_test(group1='control', group2='treatment')
analyzer.linear_regression(x='hours', y='score')

# PDF report with all analyses
analyzer.generate_report('analysis_report.pdf', format='pdf')

# HTML report
analyzer.generate_report('analysis_report.html', format='html')

Reports include:
- Summary statistics
- Test results with interpretations
- Visualizations
- Assumptions checks
- Recommendations

Assumptions Checking

Automatic assumptions validation:

T-test checks:
- Normality (Shapiro-Wilk)
- Equal variances (Levene's test)

The analyzer warns if these assumptions are violated.

ANOVA checks:
- Normality per group
- Homogeneity of variances

The analyzer suggests non-parametric alternatives.

Regression checks:
- Linearity
- Homoscedasticity
- Normality of residuals
- Independence (Durbin-Watson)
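
Of the regression checks, the Durbin-Watson statistic is simple enough to compute by hand: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below is an illustration with made-up residuals, not the analyzer's own check.

```python
def durbin_watson(residuals: list) -> float:
    """Durbin-Watson statistic: ranges from 0 to 4, with 2 meaning no autocorrelation."""
    # Squared changes between each residual and the one before it
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Alternating residuals: strong negative autocorrelation
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Constant residuals: strong positive autocorrelation
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(dw_alternating, dw_constant)
```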

Dependencies

scipy>=1.10.0
statsmodels>=0.14.0
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
reportlab>=4.0.0
