statistical-analyzer
Statistical Analyzer
Guided statistical analysis with hypothesis testing, regression, ANOVA, and plain-English results.
Features
- Hypothesis Testing: t-tests, chi-square, proportion tests
- Regression Analysis: Linear, polynomial, multiple regression
- ANOVA: One-way, two-way ANOVA with post-hoc tests
- Distribution Analysis: Normality tests, Q-Q plots
- Correlation Analysis: Pearson, Spearman with significance
- Plain-English Results: Interpret statistical outputs
- Visualizations: Regression plots, residual analysis, box plots
- Report Generation: PDF/HTML reports with interpretations
Quick Start
```python
from statistical_analyzer import StatisticalAnalyzer

analyzer = StatisticalAnalyzer()

# df is assumed to be a pandas DataFrame that is already in memory.

# T-test
analyzer.load_data(df, group_col='treatment', value_col='score')
results = analyzer.t_test(group1='control', group2='experimental')
print(results['interpretation'])

# Regression
analyzer.load_data(df)
results = analyzer.linear_regression(x='age', y='income')
print(f"R²: {results['r_squared']}")
analyzer.plot_regression('regression.png')
```
CLI Usage
```bash
# T-test
python statistical_analyzer.py --data data.csv --test t-test --group treatment --value score --output results.html

# ANOVA
python statistical_analyzer.py --data data.csv --test anova --group category --value score --output results.pdf

# Regression
python statistical_analyzer.py --data data.csv --test regression --x age --y income --output report.pdf

# Correlation matrix
python statistical_analyzer.py --data data.csv --test correlation --output correlation.png
```
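The flags in these commands could be wired up with `argparse` roughly as follows. This is a sketch only: the flag names come from the commands above, but the `build_parser` helper, the `choices` list, and which flags are required are assumptions, not the tool's actual implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="statistical_analyzer.py")
    parser.add_argument("--data", required=True, help="path to the input CSV file")
    parser.add_argument(
        "--test",
        required=True,
        choices=["t-test", "anova", "regression", "correlation"],
        help="which analysis to run",
    )
    # Grouped tests (t-test, ANOVA) use --group/--value; regression uses --x/--y.
    parser.add_argument("--group", help="column holding the group labels")
    parser.add_argument("--value", help="column holding the measured values")
    parser.add_argument("--x", help="predictor column for regression")
    parser.add_argument("--y", help="response column for regression")
    parser.add_argument("--output", required=True, help="report or image path")
    return parser


# Parse the regression command from the examples without touching sys.argv.
args = build_parser().parse_args(
    ["--data", "data.csv", "--test", "regression",
     "--x", "age", "--y", "income", "--output", "report.pdf"]
)
print(args.test, args.x, args.y, args.output)
```

Flags that a given test does not use, such as `--group` for a regression, are simply left as `None` in this sketch.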
API Reference
StatisticalAnalyzer Class
```python
from typing import Dict, List

import pandas as pd


class StatisticalAnalyzer:
    def __init__(self): ...

    # Data Loading
    def load_data(self, data, **kwargs) -> 'StatisticalAnalyzer': ...
    def load_csv(self, filepath, **kwargs) -> 'StatisticalAnalyzer': ...

    # Hypothesis Tests
    def t_test(self, group1, group2, paired=False, alternative='two-sided') -> Dict: ...
    def one_sample_t_test(self, column, expected_mean, alternative='two-sided') -> Dict: ...
    def anova(self, groups, value_col) -> Dict: ...
    def chi_square(self, observed, expected=None) -> Dict: ...
    def proportion_test(self, successes, total, expected_prop=0.5) -> Dict: ...

    # Regression
    def linear_regression(self, x, y) -> Dict: ...
    def polynomial_regression(self, x, y, degree=2) -> Dict: ...
    def multiple_regression(self, predictors: List[str], target: str) -> Dict: ...

    # Correlation
    def correlation(self, method='pearson') -> pd.DataFrame: ...  # Correlation matrix
    def correlation_test(self, var1, var2, method='pearson') -> Dict: ...

    # Distribution Tests
    def normality_test(self, column, method='shapiro') -> Dict: ...
    def qq_plot(self, column, output=None) -> str: ...

    # Visualization
    def plot_regression(self, output, x=None, y=None) -> str: ...
    def plot_residuals(self, output) -> str: ...
    def plot_distribution(self, column, output) -> str: ...
    def plot_boxplot(self, groups, value_col, output) -> str: ...

    # Reporting
    def generate_report(self, output, format='pdf') -> str: ...
    def summary(self) -> str: ...
```
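The listing gives signatures only. As an illustration of what a method shaped like `proportion_test(successes, total, expected_prop=0.5)` might compute, here is a one-sample z-test for a proportion using only the standard library. Whether the analyzer uses this normal approximation or an exact binomial test is not stated, so treat the function name `proportion_z_test` and the `observed_prop` key as assumptions.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, total, expected_prop=0.5):
    """Two-sided one-sample z-test for a proportion (normal approximation)."""
    if total <= 0 or not 0 < expected_prop < 1:
        raise ValueError("need total > 0 and 0 < expected_prop < 1")
    observed = successes / total
    # Standard error under the null hypothesis that the true proportion is expected_prop.
    std_error = math.sqrt(expected_prop * (1 - expected_prop) / total)
    z = (observed - expected_prop) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"statistic": z, "p_value": p_value, "observed_prop": observed}


result = proportion_z_test(successes=60, total=100)
print(f"z = {result['statistic']:.2f}, p = {result['p_value']:.4f}")
```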
Tests
T-Test
Compare means between two groups:
```python
analyzer.load_csv('data.csv')

# Independent samples
results = analyzer.t_test(
    group1='control',
    group2='treatment',
    paired=False
)

# Results
print(results)
# {
#     'statistic': -2.45,
#     'p_value': 0.018,
#     'mean_diff': -5.2,
#     'ci': (-9.5, -0.9),
#     'interpretation': 'The difference is statistically significant (p=0.018)...'
# }

# Paired samples (before/after)
results = analyzer.t_test(
    group1='before',
    group2='after',
    paired=True
)
```
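To show why the `paired` flag matters, the following standard-library sketch computes both t statistics on the same made-up numbers. The independent version uses Welch's formula, which is a choice made here and not necessarily what the analyzer does, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def welch_t(a, b):
    """t statistic for independent samples (Welch: unequal variances allowed)."""
    std_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / std_error


def paired_t(before, after):
    """t statistic for paired samples, computed on per-subject differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Illustrative data only: four subjects measured before and after.
before = [10.0, 12.0, 14.0, 16.0]
after = [11.0, 14.0, 15.0, 18.0]

print(f"independent: t = {welch_t(before, after):.3f}")
print(f"paired:      t = {paired_t(before, after):.3f}")
```

The paired statistic is much larger in magnitude because working on per-subject differences removes the variation between subjects. Turning either statistic into a p-value requires a t distribution, for example `scipy.stats.t`.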
ANOVA
Compare means across multiple groups:
```python
results = analyzer.anova(
    groups=['control', 'treatment_a', 'treatment_b'],
    value_col='score'
)

# Results include post-hoc tests
print(results['interpretation'])
# "There is a statistically significant difference between groups (p<0.001).
# Post-hoc tests show treatment_a differs from control (p=0.003)..."
```
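For readers who want to see what a one-way ANOVA compares, here is a small standard-library sketch of the F statistic: variation between group means divided by variation within groups. The function name and the toy data are illustrative assumptions, and post-hoc testing is not covered.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative data only.
scores = {
    "control": [1.0, 2.0, 3.0],
    "treatment_a": [2.0, 3.0, 4.0],
    "treatment_b": [5.0, 6.0, 7.0],
}
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```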
Regression Analysis
```python
# Simple linear regression
results = analyzer.linear_regression(x='hours_studied', y='exam_score')
print(f"R² = {results['r_squared']:.3f}")
print(f"Equation: y = {results['slope']:.2f}x + {results['intercept']:.2f}")
print(f"p-value: {results['p_value']:.4f}")

# Polynomial regression
results = analyzer.polynomial_regression(x='age', y='salary', degree=2)

# Multiple regression
results = analyzer.multiple_regression(
    predictors=['age', 'experience', 'education'],
    target='salary'
)
```
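The `slope`, `intercept`, and `r_squared` values printed for simple linear regression can be reproduced by hand with ordinary least squares. The sketch below uses only the standard library; `fit_line` and the data are illustrative assumptions, and it does not compute a p-value.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # For simple linear regression, R² equals the squared Pearson correlation.
    r_squared = (s_xy ** 2) / (s_xx * s_yy)
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared}


# Illustrative data only.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_score = [52.0, 55.0, 61.0, 64.0, 68.0]

fit = fit_line(hours_studied, exam_score)
print(f"R² = {fit['r_squared']:.3f}")
print(f"Equation: y = {fit['slope']:.2f}x + {fit['intercept']:.2f}")
```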
Correlation Analysis
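The two methods this tool names, Pearson and Spearman, measure different things: Pearson captures linear association, while Spearman is the Pearson correlation of the ranks and so captures any monotonic association. A minimal standard-library sketch, with hypothetical helper names and made-up data:

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]  # y = x**3: monotonic but not linear
print(f"pearson  = {pearson(x, y):.3f}")
print(f"spearman = {spearman(x, y):.3f}")
```

On this data Spearman is 1 because the ordering is perfectly preserved, while Pearson falls below 1 because the relationship is curved.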
```python
# Full correlation matrix
corr_matrix = analyzer.correlation(method='pearson')
print(corr_matrix)

# Test specific correlation
results = analyzer.correlation_test('height', 'weight', method='pearson')
print(results['interpretation'])
# "There is a strong positive correlation (r=0.82, p<0.001)"
```
Distribution Tests
```python
# Test normality
results = analyzer.normality_test('scores', method='shapiro')
# Returns: {'statistic': 0.98, 'p_value': 0.35,
#           'interpretation': 'Data appears normally distributed (p=0.35)'}

# Q-Q plot
analyzer.qq_plot('scores', output='qq_plot.png')
```
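A Q-Q plot pairs each sorted observation with the quantile a normal distribution would put at the same position; points close to a straight line suggest normality. The sketch below computes those pairs with the standard library. The `qq_points` helper, the plotting-position formula, and the sample values are assumptions made for illustration, not the analyzer's internals.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each sorted observation with its theoretical normal quantile."""
    n = len(sample)
    reference = NormalDist(mean(sample), stdev(sample))
    ordered = sorted(sample)
    # (i + 0.5) / n is one common choice of plotting position.
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Illustrative data only.
scores = [61.0, 70.0, 72.0, 75.0, 78.0, 80.0, 89.0]
points = qq_points(scores)
for expected, observed in points:
    print(f"{expected:7.2f}  {observed:7.2f}")
```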
Interpretation Guide
The analyzer provides plain-English interpretations:
Significance Levels
- p < 0.001: "Highly significant"
- p < 0.01: "Very significant"
- p < 0.05: "Statistically significant"
- p ≥ 0.05: "Not statistically significant"
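The thresholds above translate directly into a lookup function. This is a sketch of the mapping, not the analyzer's own code; the name `significance_label` is hypothetical.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the wording listed above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    # Check the strictest threshold first so each p-value gets one label.
    if p_value < 0.001:
        return "Highly significant"
    if p_value < 0.01:
        return "Very significant"
    if p_value < 0.05:
        return "Statistically significant"
    return "Not statistically significant"


for p in (0.0004, 0.003, 0.018, 0.35):
    print(f"p = {p}: {significance_label(p)}")
```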
Effect Sizes
- Cohen's d: Small (0.2), Medium (0.5), Large (0.8)
- R²: Weak (<0.3), Moderate (0.3-0.7), Strong (>0.7)
- Correlation: Weak (<0.3), Moderate (0.3-0.7), Strong (>0.7)
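As a sketch of how these bands might be applied, the code below computes Cohen's d with a pooled standard deviation and labels it, and labels a correlation or R² value. The function names are hypothetical, the handling of values exactly on a boundary is a choice made here, and the "Negligible" label for |d| below 0.2 is an assumption, since the list above does not name that range.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two samples."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def label_cohens_d(d):
    """Label |d| by the largest threshold it reaches; 'Negligible' is an assumption."""
    size = abs(d)
    if size >= 0.8:
        return "Large"
    if size >= 0.5:
        return "Medium"
    if size >= 0.2:
        return "Small"
    return "Negligible"


def label_strength(value):
    """Label |r| or R² with the Weak / Moderate / Strong bands."""
    size = abs(value)
    if size > 0.7:
        return "Strong"
    if size >= 0.3:
        return "Moderate"
    return "Weak"


# Illustrative data only.
control = [10.0, 12.0, 14.0, 16.0]
treatment = [11.0, 14.0, 15.0, 18.0]
d = cohens_d(control, treatment)
print(f"d = {d:.2f} ({label_cohens_d(d)})")
print(f"r = 0.82 ({label_strength(0.82)})")
```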
Visualizations
Regression Plot
```python
analyzer.linear_regression(x='age', y='income')
analyzer.plot_regression('regression.png')
# Creates scatter plot with regression line and confidence interval
```
Residual Plot
```python
analyzer.plot_residuals('residuals.png')
# Checks regression assumptions (homoscedasticity)
```
Box Plot
```python
analyzer.plot_boxplot(
    groups=['control', 'treatment_a', 'treatment_b'],
    value_col='score',
    output='boxplot.png'
)
```
Distribution Plot
```python
analyzer.plot_distribution('scores', 'distribution.png')
# Histogram with normal curve overlay
```
Reports
Generate comprehensive reports:
```python
analyzer.load_csv('data.csv')
analyzer.t_test(group1='control', group2='treatment')
analyzer.linear_regression(x='hours', y='score')

# PDF report with all analyses
analyzer.generate_report('analysis_report.pdf', format='pdf')

# HTML report
analyzer.generate_report('analysis_report.html', format='html')
```
Reports include:
- Summary statistics
- Test results with interpretations
- Visualizations
- Assumptions checks
- Recommendations
Assumptions Checking
Assumptions are validated automatically:
```python
# T-test checks:
# - Normality (Shapiro-Wilk)
# - Equal variances (Levene's test)
# Warnings if assumptions violated

# ANOVA checks:
# - Normality per group
# - Homogeneity of variances
# Suggests non-parametric alternatives

# Regression checks:
# - Linearity
# - Homoscedasticity
# - Normality of residuals
# - Independence (Durbin-Watson)
```
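The independence check listed for regression refers to the Durbin-Watson statistic, which compares successive residuals. A minimal standard-library sketch follows; the function name and the residual values are illustrative assumptions, and how the analyzer turns the statistic into a warning is not stated.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Illustrative residuals only.
trending = [-2.0, -1.0, 1.0, 2.0]      # neighbours are similar: positive autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbours flip sign: negative autocorrelation

print(f"trending:    DW = {durbin_watson(trending):.2f}")
print(f"alternating: DW = {durbin_watson(alternating):.2f}")
```

The statistic ranges from 0 to 4: values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.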
Dependencies
- scipy>=1.10.0
- statsmodels>=0.14.0
- pandas>=2.0.0
- numpy>=1.24.0
- matplotlib>=3.7.0
- seaborn>=0.12.0
- reportlab>=4.0.0