ab-testing-analyzer
AB Testing Analyzer
A full-featured, intelligent AB testing analysis tool built on the AB testing module of "数据分析咖哥十话" ("Data Analyst Brother's Ten Talks").
🎯 Skill Overview
This skill provides a complete AB testing solution, from experiment design to result analysis. It supports multiple statistical tests, user segmentation analysis, and visual report generation.
✨ Core Features
- 🧪 Complete AB Testing Workflow
  - Experiment design and sample size calculation
  - Random assignment verification
  - Conversion rate and retention rate analysis
  - Statistical significance testing
- 📊 Comprehensive Statistical Methods
  - t-test (independent samples, paired samples)
  - Chi-square test (goodness of fit, independence)
  - Confidence interval estimation
  - Effect size calculation
  - Multiple comparison correction
- 👥 Intelligent User Segmentation
  - Value segmentation (high/low value customers)
  - Demographic segmentation
  - Behavioral pattern segmentation
  - Custom segmentation strategies
  - Interaction effect analysis
- 📈 Rich Visualization Features
  - Conversion rate comparison chart
  - Retention rate curve chart
  - User segmentation heatmap
  - Interaction effect visualization
  - Statistical test result chart
- 🔧 Advanced Analysis Features
  - Multivariate AB testing
  - Bayesian AB testing
  - Time series analysis
  - Robustness checks
  - Causal inference support
🚀 Quick Start
1. Environment Requirements
```bash
# Dependencies
pip install pandas numpy scipy matplotlib seaborn statsmodels
```
2. Basic Usage
```python
from scripts.ab_test_analyzer import ABTestAnalyzer
from scripts.statistical_tests import StatisticalTests
from scripts.visualizer import ABTestVisualizer

# Initialize analyzer
analyzer = ABTestAnalyzer()
stats_tests = StatisticalTests()
visualizer = ABTestVisualizer()

# Load AB test data
data = analyzer.load_data('ab_test_data.csv')

# Basic conversion rate analysis
conversion_results = analyzer.analyze_conversion(
    data,
    group_col='Page Version',
    conversion_col='Purchased'
)

# Statistical significance testing
t_test_result = stats_tests.t_test(
    data,
    group_col='Page Version',
    metric_col='Purchased'
)

# Generate visualization report
fig = visualizer.plot_conversion_comparison(conversion_results)
```
3. Run Examples
```bash
# Quick test
python quick_test.py

# Basic AB test example
python examples/basic_ab_test_example.py

# Advanced segmentation analysis example
python examples/advanced_segmentation_example.py

# Comprehensive analysis example
python examples/comprehensive_analysis_example.py
```
📁 Project Structure
```
ab-testing-analyzer/
├── SKILL.md                               # Detailed skill documentation
├── README.md                              # User guide (this file)
├── quick_test.py                          # Quick functional test
├── test_skill.py                          # Complete test suite
│
├── scripts/                               # Core modules
│   ├── __init__.py
│   ├── ab_test_analyzer.py                # Core AB test analysis
│   ├── statistical_tests.py               # Statistical testing module
│   ├── segment_analyzer.py                # User segmentation analysis
│   └── visualizer.py                      # Visualization generator
│
└── examples/                              # Examples and data
    ├── sample_data/                       # Sample data
    │   ├── sample_ab_test_data.csv
    │   └── sample_user_segments.csv
    ├── basic_ab_test_example.py           # Basic AB test example
    ├── advanced_segmentation_example.py   # Advanced segmentation analysis example
    └── comprehensive_analysis_example.py  # Comprehensive analysis example
```
💡 Key Features
1. Basic AB Testing Analysis
Conversion Rate Analysis
```python
# Calculate conversion rates for each group
conversion_rates = analyzer.calculate_conversion_rates(
    data,
    group_col='Test Group',
    conversion_col='Conversion Status'
)

# Calculate lift and confidence interval
lift_analysis = analyzer.calculate_lift(
    conversion_rates,
    control_group='Control Group',
    test_group='Test Group'
)
```
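The internals of `calculate_lift` are not shown in this README, so the following is only a minimal standard-library sketch of one common way to compute a relative lift together with a Wald confidence interval for the absolute difference in conversion rates. The helper name `lift_with_ci` and the counts are hypothetical.

```python
from statistics import NormalDist


def lift_with_ci(control_conv, control_n, test_conv, test_n, confidence=0.95):
    """Relative lift and a Wald CI for the absolute difference in rates.

    Hypothetical helper for illustration; not part of the skill's API.
    """
    p_c = control_conv / control_n
    p_t = test_conv / test_n
    diff = p_t - p_c
    # Unpooled standard error of the difference between two proportions
    se = (p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / test_n) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "control_rate": p_c,
        "test_rate": p_t,
        "relative_lift": diff / p_c,
        "diff_ci": (diff - z * se, diff + z * se),
    }


# Made-up counts: 100/1000 conversions in control, 130/1000 in test
result = lift_with_ci(control_conv=100, control_n=1000, test_conv=130, test_n=1000)
print(result)
```

If the interval for the difference excludes zero, the observed lift is distinguishable from no effect at the chosen confidence level.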
Retention Rate Analysis
```python
# Calculate retention rates
retention_rates = analyzer.calculate_retention_rates(
    data,
    group_col='Test Group',
    retention_col='retention_7'
)

# Visualize retention rate curves
fig = visualizer.plot_retention_curves(retention_rates)
```
2. Statistical Significance Testing
t-test
```python
# Independent samples t-test
t_result = stats_tests.t_test(
    data,
    group_col='Page Version',
    metric_col='Purchase Amount',
    test_type='independent'
)

# Paired samples t-test
paired_t_result = stats_tests.t_test(
    before_after_data,
    group_col='User ID',
    metric_col='Behavioral Metric',
    test_type='paired'
)
```
Chi-square Test
```python
# Goodness-of-fit test
chi2_goodness = stats_tests.chi_square_test(
    observed_data,
    expected_data,
    test_type='goodness_of_fit'
)

# Independence test
chi2_independence = stats_tests.chi_square_test(
    data,
    group_col='Test Group',
    outcome_col='Conversion Status',
    test_type='independence'
)
```
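To make the independence test concrete, here is a small standard-library sketch of Pearson's chi-square test on a 2x2 table (group by outcome), without continuity correction. It is not the skill's implementation; the function name `chi_square_2x2` and the counts are assumptions made for illustration.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]]: rows are groups, columns are outcomes.
    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 is the square of a standard normal,
    # so the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: control, test; columns: converted, not converted (made-up counts)
chi2, p_value = chi_square_2x2([[100, 900], [130, 870]])
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

The closed-form p-value only holds for one degree of freedom; larger tables need the general chi-square distribution, for example from SciPy.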
Effect Size Calculation
```python
# Calculate Cohen's d
cohens_d = stats_tests.cohens_d(
    data,
    group_col='Test Group',
    metric_col='Conversion Status'
)

# Calculate Cramer's V
cramers_v = stats_tests.cramers_v(data, group_col, outcome_col)
```
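For readers who want to see what these two effect sizes measure, the sketch below computes them from raw inputs with the standard library only. The function names `cohens_d_from_samples` and `cramers_v_from_table`, and the sample values, are hypothetical and independent of the skill's own methods.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_from_samples(sample_a, sample_b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_b) - mean(sample_a)) / sqrt(pooled_var)


def cramers_v_from_table(table):
    """Cramer's V for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    min_dim = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (total * min_dim))


# Made-up purchase amounts and conversion counts
d = cohens_d_from_samples([10, 12, 14, 16, 18], [12, 14, 16, 18, 20])
v = cramers_v_from_table([[100, 900], [130, 870]])
print(f"Cohen's d = {d:.3f}, Cramer's V = {v:.3f}")
```

Cohen's d suits continuous metrics such as purchase amount, while Cramer's V suits categorical outcomes such as converted versus not converted.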
3. User Segmentation Analysis
Value-based Segmentation
```python
from scripts.segment_analyzer import SegmentAnalyzer
segment_analyzer = SegmentAnalyzer()

# Value-based user segmentation
value_segments = segment_analyzer.value_based_segmentation(
    data,
    value_col='Total Consumption Amount',
    n_tiers=3
)

# Conversion analysis by segment
segment_conversion = segment_analyzer.segment_conversion_analysis(
    data,
    segment_col='Value Tier',
    group_col='Test Group',
    conversion_col='Conversion Status'
)
```
Interaction Effect Analysis
```python
# Interaction effect between page version and user characteristics
interaction_analysis = segment_analyzer.interaction_analysis(
    data,
    group_col='Page Version',
    segment_col='Value Tier',
    outcome_col='Conversion Status'
)

# Visualize interaction effects
fig = visualizer.plot_interaction_effects(interaction_analysis)
```
4. Advanced Statistical Analysis
Bayesian AB Testing
```python
# Bayesian AB test analysis
bayesian_result = analyzer.bayesian_ab_test(
    data,
    group_col='Test Group',
    conversion_col='Conversion Status',
    prior='jeffreys'
)

# Calculate win probability
win_probability = analyzer.calculate_win_probability(bayesian_result)
```
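How `bayesian_ab_test` works internally is not documented here. As a rough illustration of the idea, the sketch below uses a Beta-Binomial model with the Jeffreys prior Beta(0.5, 0.5) and estimates the win probability by Monte Carlo sampling. The function names, counts, and number of draws are assumptions of this sketch.

```python
import random


def beta_posterior(conversions, n, prior=(0.5, 0.5)):
    """Beta posterior parameters for a conversion rate.

    The default prior Beta(0.5, 0.5) is the Jeffreys prior for a binomial rate.
    """
    alpha0, beta0 = prior
    return alpha0 + conversions, beta0 + (n - conversions)


def estimate_win_probability(post_control, post_test, draws=50_000, seed=0):
    """Monte Carlo estimate of P(test rate > control rate)."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_test) > rng.betavariate(*post_control)
        for _ in range(draws)
    )
    return wins / draws


# Made-up counts: 100/1000 conversions in control, 130/1000 in test
post_control = beta_posterior(conversions=100, n=1000)
post_test = beta_posterior(conversions=130, n=1000)
prob = estimate_win_probability(post_control, post_test)
print(f"P(test > control) is approximately {prob:.3f}")
```

Fixing the seed keeps the estimate reproducible; more draws reduce the Monte Carlo error at the cost of runtime.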
Multivariate Testing
```python
# Multivariate AB testing (MVT)
mvt_result = analyzer.multivariate_test(
    data,
    group_cols=['Page Version', 'Button Color', 'Title Copy'],
    conversion_col='Conversion Status'
)
```
5. Visualization Reports
Basic Charts
```python
# Conversion rate comparison chart
fig = visualizer.plot_conversion_comparison(
    conversion_data,
    title='AB Test Conversion Rate Comparison'
)

# Confidence interval chart
fig = visualizer.plot_confidence_intervals(
    statistical_results,
    metric='Conversion Rate'
)

# User segmentation heatmap
fig = visualizer.plot_segment_heatmap(
    segment_data,
    title='User Segmentation Conversion Rate Heatmap'
)
```
Interactive Dashboard
```python
# Generate interactive dashboard
dashboard = visualizer.create_interactive_dashboard(
    analysis_results,
    output_file='ab_test_dashboard.html'
)
```
📊 Data Format
AB Testing Data Format (ab_test_data.csv)
```csv
User ID,Test Group,Conversion Status,Retention Status,Total Consumption Amount,Gender,Age,Value Tier,Device Type
U001,Test Group,Yes,TRUE,299.99,Male,25,High Value,Mobile
U002,Control Group,No,FALSE,59.99,Female,32,Low Value,PC
U003,Test Group,Yes,TRUE,599.99,Male,28,High Value,Mobile
U004,Control Group,No,FALSE,199.99,Female,35,Medium Value,PC
```
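As a quick check of this format, the following standard-library sketch parses the four sample rows and computes a conversion rate per group. It assumes that `Yes` in `Conversion Status` marks a conversion; the helper `conversion_rates_by_group` is hypothetical and unrelated to the skill's `load_data`.

```python
import csv
import io
from collections import defaultdict

# The sample rows from the data format, inlined so the sketch runs without a file
SAMPLE_CSV = """User ID,Test Group,Conversion Status,Retention Status,Total Consumption Amount,Gender,Age,Value Tier,Device Type
U001,Test Group,Yes,TRUE,299.99,Male,25,High Value,Mobile
U002,Control Group,No,FALSE,59.99,Female,32,Low Value,PC
U003,Test Group,Yes,TRUE,599.99,Male,28,High Value,Mobile
U004,Control Group,No,FALSE,199.99,Female,35,Medium Value,PC
"""


def conversion_rates_by_group(rows, group_col="Test Group",
                              conversion_col="Conversion Status"):
    """Conversion rate per group, treating 'Yes' as a conversion (assumption)."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += 1
        converted[row[group_col]] += row[conversion_col] == "Yes"
    return {group: converted[group] / totals[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = conversion_rates_by_group(rows)
print(rates)
```

Four rows are far too few for any inference; the point is only to show how the columns map to groups and outcomes.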
User Segmentation Data Format (user_segments.csv)
```csv
User ID,RFM Segment,Behavioral Segment,Demographic Segment,Comprehensive Segment
U001,High Value,Active User,Young Male,High Value Young User
U002,Low Value,Churned User,Mature Female,Need Re-engagement User
```
🎯 Application Scenarios
Product Optimization
- Website redesign evaluation
- Feature launch impact analysis
- UI optimization testing
- Performance improvement verification
Marketing Campaigns
- Ad creative testing
- Promotion strategy evaluation
- Email marketing optimization
- Social media campaign analysis
Operational Strategies
- Pricing strategy testing
- Recommendation algorithm optimization
- User registration flow improvement
- Customer service strategy evaluation
⚙️ Advanced Configuration
Statistical Parameter Settings
```python
# Set significance level
analyzer.set_significance_level(alpha=0.05)

# Set statistical power
analyzer.set_statistical_power(power=0.8)

# Set multiple comparison correction method
analyzer.set_multiple_comparison_correction(method='bonferroni')
```
Custom Segmentation Strategies
```python
# Define custom segmentation rules
custom_segments = {
    'high_value': {'Total Consumption Amount': (500, float('inf'))},
    'medium_value': {'Total Consumption Amount': (100, 500)},
    'low_value': {'Total Consumption Amount': (0, 100)}
}

# Apply custom segmentation
segmented_data = segment_analyzer.apply_custom_segments(
    data,
    segment_rules=custom_segments
)
```
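The README does not say how `apply_custom_segments` treats values that fall exactly on a boundary such as 100 or 500. The sketch below shows one possible interpretation of the rule format, treating each range as half-open `[low, high)`. That boundary choice and the helper `assign_segment` are assumptions of this sketch, not documented behavior.

```python
def assign_segment(record, segment_rules, default="unassigned"):
    """Return the first segment whose ranges all contain the record's values.

    Ranges are treated as half-open [low, high); this is an assumption.
    """
    for name, conditions in segment_rules.items():
        if all(low <= record[col] < high for col, (low, high) in conditions.items()):
            return name
    return default


rules = {
    "high_value": {"Total Consumption Amount": (500, float("inf"))},
    "medium_value": {"Total Consumption Amount": (100, 500)},
    "low_value": {"Total Consumption Amount": (0, 100)},
}
users = [
    {"User ID": "U001", "Total Consumption Amount": 299.99},
    {"User ID": "U002", "Total Consumption Amount": 59.99},
    {"User ID": "U003", "Total Consumption Amount": 599.99},
]
segments = {u["User ID"]: assign_segment(u, rules) for u in users}
print(segments)
```

With half-open ranges every non-negative amount lands in exactly one tier, so adjacent rules never overlap.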
Advanced Visualization Configuration
```python
# Set chart style
visualizer.set_style(style='seaborn', palette='viridis')

# Custom chart configuration
chart_config = {
    'figsize': (12, 8),
    'dpi': 300,
    'format': 'png',
    'style': 'professional'
}
fig = visualizer.plot_with_config(data, config=chart_config)
```
🐛 Frequently Asked Questions
Q: How do I determine an appropriate sample size?
A: Use the sample size calculation function. The minimum sample size depends on the expected effect size, the significance level, and the statistical power.
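The skill's own sample size function is not shown in this README. As a rough guide to how the three inputs interact, here is a sketch of the standard normal-approximation formula for comparing two conversion rates with a two-sided test. The function name `sample_size_per_group` and the example rates are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Minimum users per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (baseline_rate + expected_rate) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(
            baseline_rate * (1 - baseline_rate)
            + expected_rate * (1 - expected_rate)
        )
    ) ** 2
    return ceil(numerator / (expected_rate - baseline_rate) ** 2)


# Detecting a change from a 10% to a 12% conversion rate (made-up rates)
n = sample_size_per_group(baseline_rate=0.10, expected_rate=0.12)
print(n)
```

Because the required size scales with the inverse square of the rate difference, halving the detectable effect roughly quadruples the sample needed.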
Q: Does a p-value below 0.05 mean the result is significant?
A: A p-value below 0.05 means that, if the null hypothesis were true, the probability of observing a result at least as extreme as the current one would be below 5%. The result is statistically significant at the 5% level, but it should be interpreted together with the effect size and its practical significance.
Q: How do I handle the multiple comparison problem?
A: Adjust the p-values with a method such as Bonferroni correction or FDR correction to keep false positives under control.
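The two corrections named in this answer can be sketched in a few lines of plain Python. This is an illustration of the textbook procedures (Bonferroni, and Benjamini-Hochberg for FDR), not the skill's implementation; the p-values are made up.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print(bonf)
print(bh)
```

Bonferroni controls the chance of any false positive and is the more conservative of the two; Benjamini-Hochberg controls the expected share of false positives among the rejected hypotheses.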
Q: When should I use Bayesian AB testing?
A: Consider Bayesian methods when you have prior information to incorporate, when the sample size is small, or when you want probabilistic conclusions.
📈 Performance Optimization
- Use vectorized operations to speed up calculations
- Implement incremental statistical updates
- Use parallel computing for large datasets
- Cache calculation results to avoid redundant computation
- Optimize memory usage patterns
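As an illustration of the incremental update item in the list above, the sketch below uses Welford's algorithm to maintain a running mean and sample variance without storing past observations. The class name `RunningStats` is hypothetical, and whether the skill uses this particular algorithm is not stated in the README.

```python
class RunningStats:
    """Incrementally updated mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # Sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance is undefined for fewer than two observations
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for amount in [299.99, 59.99, 599.99, 199.99]:
    stats.update(amount)
print(stats.mean, stats.variance)
```

Each update costs constant time and memory, so the statistics can be refreshed as new experiment data arrives instead of being recomputed from scratch.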
📚 Technical Principles
Basics of Statistical Testing
The analysis is based on hypothesis testing theory: the statistical significance of an experimental result is judged by computing a test statistic and its p-value.
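A two-proportion z-test is one of the simplest instances of this idea. The sketch below computes the test statistic and a two-sided p-value with the standard library; the function name and the counts are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for equal conversion rates, pooling the rate under H0."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 100/1000 conversions in group A, 130/1000 in group B
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
alpha = 0.05
print(f"z = {z:.3f}, p = {p_value:.4f}, significant at alpha={alpha}: {p_value < alpha}")
```

The null hypothesis here is that both groups share one conversion rate, which is why the standard error uses the pooled rate.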
Central Limit Theorem
For large samples, the distribution of the sample mean approaches a normal distribution, which provides the theoretical basis for many statistical methods.
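A small simulation can make this tangible. The sketch below draws many samples of Bernoulli conversions and checks that the sample means cluster around the true rate with roughly the spread the normal approximation predicts. The rate, sample size, and number of trials are arbitrary choices for illustration.

```python
import random
from statistics import stdev


def sample_means(n, trials, rate=0.1, seed=0):
    """Means of `trials` samples, each made of `n` Bernoulli(rate) conversions."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < rate for _ in range(n)) / n
        for _ in range(trials)
    ]


rate, n = 0.1, 500
means = sample_means(n=n, trials=2000, rate=rate)

# Standard error predicted by the normal approximation
theoretical_se = (rate * (1 - rate) / n) ** 0.5
observed_se = stdev(means)
# Share of sample means within one standard error of the true rate
within_1se = sum(abs(m - rate) <= theoretical_se for m in means) / len(means)
print(f"theoretical SE = {theoretical_se:.4f}, observed SE = {observed_se:.4f}")
print(f"share within 1 SE = {within_1se:.3f}")
```

For a normal distribution about 68% of values fall within one standard deviation of the mean, so the printed share should land near that figure, with some deviation because the simulated means are discrete.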
Bayesian Inference
Bayesian inference combines prior information with observed data and performs parameter estimation and hypothesis testing through the posterior distribution.
Multiple Comparison Correction
When several hypothesis tests are run at the same time, the overall error rate must be controlled so that false positives do not accumulate.
🤝 Contribution Guidelines
Issues and pull requests that improve this skill are welcome.
📄 License
MIT License
🎉 Get Started
Now that you know what the AB Testing Analyzer skill can do, you can start using it:
```bash
# Quick functional check
python quick_test.py

# Run an example
python examples/basic_ab_test_example.py
```
Enjoy your AB testing analysis journey! 🚀