mkt-ab-testing
Marketing A/B Testing
Framework
IRON LAW: One Variable at a Time
If you change the headline AND the image AND the CTA simultaneously,
you cannot know which change caused the result. Test ONE variable per
experiment. If you need to test multiple changes, use sequential tests
or multivariate testing (MVT) with sufficient traffic.
What to Test (by Impact)
| Element | Expected Lift | Traffic Needed | Priority |
|---|---|---|---|
| Offer/Pricing | 10-50% | Medium | Highest |
| Headline/Subject line | 5-30% | Low | High |
| CTA (text, color, placement) | 5-20% | Low | High |
| Page layout | 5-15% | Medium | Medium |
| Image/Video | 3-15% | Medium | Medium |
| Form fields | 5-25% (reduction = higher CVR) | Low | Medium |
| Social proof placement | 3-10% | Medium | Lower |
Test Design
- Hypothesis: "Changing [variable] from [A] to [B] will increase [metric] by [X%] because [reasoning]"
- Primary metric: ONE metric that determines winner (conversion rate, revenue per visitor, signup rate)
- Guardrail metrics: Metrics that must NOT degrade (bounce rate, page load time, revenue per user)
- Traffic split: 50/50 between control and variant (standard)
- Sample size: Calculate before starting (see stat-ab-testing for formula)
- Duration: Minimum 1-2 full business weeks (capture day-of-week effects)
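The sample-size and duration bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the formula from the stat-ab-testing skill (which is not reproduced here): it assumes the standard normal-approximation formula for a two-sided two-proportion z-test, and the function names, the 80% power default, and the `daily_visitors` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Visitors needed in EACH arm for a two-sided two-proportion z-test.

    Assumption: normal approximation; alpha and power defaults are
    conventional choices, not values taken from the source document.
    """
    if baseline_rate == expected_rate:
        raise ValueError("baseline_rate and expected_rate must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (baseline_rate + expected_rate) / 2
    # Spread of the difference under "no effect" and under the expected effect.
    term_null = z_alpha * math.sqrt(2 * pooled * (1 - pooled))
    term_alt = z_beta * math.sqrt(
        baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    )
    effect = expected_rate - baseline_rate
    return math.ceil((term_null + term_alt) ** 2 / effect ** 2)


def duration_in_weeks(n_per_variant, daily_visitors, min_weeks=1):
    """Round the run time up to whole weeks so each weekday is covered equally."""
    days = math.ceil(2 * n_per_variant / daily_visitors)  # two arms share the traffic
    return max(min_weeks, math.ceil(days / 7))


# Hypothetical scenario: 10% baseline conversion, hoping for 12%, 800 visitors/day.
n = sample_size_per_variant(0.10, 0.12)
print(n, "visitors per variant,", duration_in_weeks(n, daily_visitors=800), "week(s)")
```

Halving the expected lift roughly quadruples the required sample, which is why low-lift elements in the priority table tend to need more traffic.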
Common Marketing Tests
| Test | Control (A) | Variant (B) | Metric |
|---|---|---|---|
| Email subject | "Your weekly update" | "3 trends you missed this week" | Open rate |
| Landing page CTA | "Sign Up" | "Start Free Trial" | Click rate |
| Pricing page | Show 3 plans | Show 2 plans + "most popular" badge | Conversion rate |
| Ad creative | Product photo | Lifestyle photo with product | CTR → conversion |
| Form length | 8 fields | 4 fields | Form completion rate |
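Every test in the table above needs each visitor to see the same version on every visit, split 50/50 between A and B. One common way to do that is to hash a stable identifier. The sketch below is only an illustration under that assumption: `assign_variant`, the `experiment` key, and the `user_id` format are hypothetical names, not part of any tool referenced in this document.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Return "A" (control) or "B" (variant), stable per user and experiment.

    Including the experiment name in the hash keeps assignments in one
    test independent of assignments in another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits mapped to [0, 1)
    return "B" if bucket < variant_share else "A"


# Hypothetical usage: the same visitor always lands in the same arm.
print(assign_variant("user-42", "landing-page-cta"))
```

Hashing avoids storing an assignment table, at the cost of needing an identifier that stays stable across sessions (for example a login ID rather than a short-lived cookie).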
Analysis & Decision
| Result | Decision | Action |
|---|---|---|
| B wins, p < 0.05, meaningful lift | Ship B | Deploy variant, start next test |
| B wins, p < 0.05, tiny lift (<1%) | Don't ship | Lift not worth the change risk |
| No significant difference | Keep A | A is the known quantity; test something else |
| B wins on primary but loses on guardrail | Investigate | May need to redesign variant |
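The decision table above can be expressed as a small function on top of a two-proportion z-test. This is a hedged sketch, not the contents of scripts/ab_test.py: the function names are hypothetical, the "<1%" threshold is assumed to mean relative lift, and a significant loss for B is assumed to map to "Keep A" because the table does not list that case.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided z-test for B vs. A, plus a CI for the absolute difference."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0 or p_a == 0:
        raise ValueError("need conversions in control and a non-degenerate rate")
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses the unpooled standard error of the difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(0.5 + confidence / 2) * se_diff
    diff = p_b - p_a
    return {
        "relative_lift": diff / p_a,
        "p_value": p_value,
        "ci": (diff - margin, diff + margin),
    }


def decide(result, guardrail_ok, alpha=0.05, min_relative_lift=0.01):
    """Map a test result onto the rows of the decision table."""
    b_wins = result["p_value"] < alpha and result["relative_lift"] > 0
    if not b_wins:
        return "Keep A"  # assumption: also covers a significant loss for B
    if not guardrail_ok:
        return "Investigate"
    if result["relative_lift"] < min_relative_lift:
        return "Don't ship"
    return "Ship B"


# Hypothetical data: 400/4000 conversions for A, 480/4000 for B.
result = two_proportion_test(400, 4000, 480, 4000)
print(decide(result, guardrail_ok=True), result)
```

Checking the guardrail before the lift threshold is a design choice: a variant that harms a guardrail metric is worth investigating regardless of how large its primary lift is.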
Output Format
A/B Test Plan: {Test Name}
Hypothesis
Changing {variable} from {A} to {B} will increase {metric} by {X%} because {reasoning}.
Design
- Primary metric: {metric}
- Guardrail: {metric(s)}
- Split: 50/50
- Sample size: {N per variant}
- Duration: {days/weeks}
Results
| Metric | Control | Variant | Diff | CI (95%) | Significant? |
|---|---|---|---|---|---|
| {primary} | {value} | {value} | {±%} | [{lower}, {upper}] | Y/N |
Decision
{Ship / Don't ship / Extend} - {rationale}
Gotchas
- Don't stop early because it "looks good": Peeking at results and stopping as soon as you see significance can inflate the false positive rate from the nominal 5% to 30% or more when you check often. Run to the planned sample size.
- Day-of-week effects: Monday visitors behave differently from Saturday visitors. Always run tests for at least 1-2 complete weeks.
- Novelty effect: A new design may get a temporary lift from curiosity. Wait 2+ weeks to see if the effect sustains.
- Winner's curse: The estimated lift from a test is often larger than the true lift due to statistical noise. Expect the actual impact after deployment to be smaller.
- Don't test everything — test what matters: Running 20 small tests on button colors while ignoring the pricing page is misallocating effort. Test high-impact elements first.
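The early-stopping gotcha is easy to check with a simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares stopping at the first significant peek with evaluating only once at the planned sample size. All parameter values are illustrative assumptions, and the exact rates depend on how many peeks are taken.

```python
import math
import random
from statistics import NormalDist


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def false_positive_rates(rate=0.10, batch=100, peeks=20, simulations=400,
                         alpha=0.05, seed=7):
    """Return (rate when stopping at any significant peek, rate at final look only)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    with_peeking = final_only = 0
    for _ in range(simulations):
        conv_a = conv_b = 0
        crossed = False
        z = 0.0
        for peek in range(1, peeks + 1):
            # Each peek adds one batch of visitors to both arms.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            z = z_stat(conv_a, conv_b, peek * batch)
            crossed = crossed or abs(z) > z_crit
        with_peeking += crossed           # would have stopped at some peek
        final_only += abs(z) > z_crit     # only the planned final look counts
    return with_peeking / simulations, final_only / simulations


peeking, fixed = false_positive_rates()
print(f"stop at first significant peek: {peeking:.1%}, fixed horizon: {fixed:.1%}")
```

Increasing `peeks` pushes the first number higher while the second stays near `alpha`, which is the reason to commit to a sample size before the test starts.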
Scripts
| Script | Description | Usage |
|---|---|---|
| scripts/ab_test.py | Two-proportion z-test with effect size and sample-size planning | |
Run the following command to execute the built-in sanity tests:
python scripts/ab_test.py --verify
References
- For statistical methodology (sample size, p-values), see the stat-ab-testing skill
- For multivariate testing design, see references/mvt-design.md