mkt-ab-testing

Marketing A/B Testing

Framework

IRON LAW: One Variable at a Time

If you change the headline AND the image AND the CTA simultaneously,
you cannot know which change caused the result. Test ONE variable per
experiment. If you need to test multiple changes, use sequential tests
or multivariate testing (MVT) with sufficient traffic.
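
To make "sufficient traffic" concrete, here is a minimal sketch of how quickly a full-factorial MVT grows. The element names, variants, and the 5,000 visitors per combination are illustrative assumptions, not figures from this skill.

```python
from itertools import product

# Hypothetical variants for each element under test (illustrative only).
elements = {
    "headline": ["current", "benefit-led"],
    "image": ["product", "lifestyle"],
    "cta": ["Sign Up", "Start Free Trial"],
}

def mvt_cells(elements):
    """Return every combination a full-factorial MVT would have to serve."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]

def mvt_total_traffic(elements, n_per_cell):
    """Total visitors needed if every combination receives n_per_cell visitors."""
    return len(mvt_cells(elements)) * n_per_cell

cells = mvt_cells(elements)
print(len(cells))                         # 8 combinations, versus 2 arms in a simple A/B test
print(mvt_total_traffic(elements, 5000))  # 40000 visitors under the assumed 5,000 per cell
```

Three elements with two variants each already need four times the traffic of a single A/B test at the same per-arm sample size, which is why sequential tests are often the practical choice on lower-traffic pages.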

What to Test (by Impact)

Element	Expected Lift	Traffic Needed	Priority
Offer/Pricing	10-50%	Medium	Highest
Headline/Subject line	5-30%	Low	High
CTA (text, color, placement)	5-20%	Low	High
Page layout	5-15%	Medium	Medium
Image/Video	3-15%	Medium	Medium
Form fields	5-25% (reduction = higher CVR)	Low	Medium
Social proof placement	3-10%	Medium	Lower

Test Design

Hypothesis: "Changing [variable] from [A] to [B] will increase [metric] by [X%] because [reasoning]"
Primary metric: ONE metric that determines winner (conversion rate, revenue per visitor, signup rate)
Guardrail metrics: Metrics that must NOT degrade (bounce rate, page load time, revenue per user)
Traffic split: 50/50 between control and variant (standard)
Sample size: Calculate before starting (see stat-ab-testing for formula)
Duration: Minimum 1-2 full business weeks (capture day-of-week effects)
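
The sample-size and duration steps can be sketched as follows. This uses the common normal-approximation formula for comparing two proportions, which may differ from the formula in stat-ab-testing. The function names, the 80% power default, and the example traffic figures are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)       # rate expected under the hypothesis
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def duration_in_days(n_per_variant, daily_visitors, min_weeks=1):
    """Days needed for a 50/50 split, rounded up to whole weeks."""
    days = math.ceil(2 * n_per_variant / daily_visitors)
    weeks = max(min_weeks, math.ceil(days / 7))
    return weeks * 7

# Hypothetical inputs: 5% baseline conversion, hoping for a 20% relative lift,
# 2,000 eligible visitors per day across both arms.
n = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.20)
print(n, duration_in_days(n, daily_visitors=2000))
```

Rounding the duration up to whole weeks keeps every weekday equally represented in both arms, which is the reason the design asks for full business weeks rather than a raw day count.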

Common Marketing Tests

Test	Control (A)	Variant (B)	Metric
Email subject	"Your weekly update"	"3 trends you missed this week"	Open rate
Landing page CTA	"Sign Up"	"Start Free Trial"	Click rate
Pricing page	Show 3 plans	Show 2 plans + "most popular" badge	Conversion rate
Ad creative	Product photo	Lifestyle photo with product	CTR → conversion
Form length	8 fields	4 fields	Form completion rate

Analysis & Decision

Result	Decision	Action
B wins, p < 0.05, meaningful lift	Ship B	Deploy variant, start next test
B wins, p < 0.05, tiny lift (<1%)	Don't ship	Lift not worth the change risk
No significant difference	Keep A	A is the known quantity; test something else
B wins on primary but loses on guardrail	Investigate	May need to redesign variant
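
The decision table above can be expressed as a small function. This is a sketch, not the logic of `scripts/ab_test.py`: it assumes a pooled two-proportion z-test, reads the "<1%" threshold as a relative lift, and takes the guardrail check as a boolean the caller supplies. All names are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(conv_a, n_a, conv_b, n_b, guardrail_ok, alpha=0.05, min_lift=0.01):
    """Map a test result onto the decision table (assumes non-zero conversions in A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    relative_lift = (p_b - p_a) / p_a
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    if p_value >= alpha or relative_lift <= 0:
        return "Keep A"        # no significant win for B
    if not guardrail_ok:
        return "Investigate"   # primary metric wins, guardrail degrades
    if relative_lift < min_lift:
        return "Don't ship"    # significant, but too small to justify the change
    return "Ship B"

print(decide(500, 10_000, 600, 10_000, guardrail_ok=True))   # Ship B
print(decide(500, 10_000, 510, 10_000, guardrail_ok=True))   # Keep A
```

The guardrail check runs before the minimum-lift check so that a degraded guardrail is always surfaced, even when the lift is also small.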

Output Format

A/B Test Plan: {Test Name}

Hypothesis

Changing {variable} from {A} to {B} will increase {metric} by {X%} because {reasoning}.

Design

Primary metric: {metric}
Guardrail: {metric(s)}
Split: 50/50
Sample size: {N per variant}
Duration: {days/weeks}

Results

Metric	Control	Variant	Diff	CI (95%)	Significant?
{primary}	{value}	{value}	{±%}	[{lower}, {upper}]	Y/N
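
One way to fill in the Results row is sketched below. It assumes the metric is a conversion rate, reports the difference in absolute terms (percentage points when multiplied by 100), and uses an unpooled normal-approximation interval. The function name and return shape are hypothetical.

```python
import math
from statistics import NormalDist

def results_row(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Compute control rate, variant rate, difference, and its confidence interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: the interval does not assume the two rates are equal.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower, upper = diff - z * se, diff + z * se
    return {
        "control": p_a,
        "variant": p_b,
        "diff": diff,
        "ci": (lower, upper),
        "significant": lower > 0 or upper < 0,  # interval excludes zero
    }

row = results_row(500, 10_000, 600, 10_000)
print(f"{row['control']:.2%}\t{row['variant']:.2%}\t{row['diff']:+.2%}\t"
      f"[{row['ci'][0]:+.2%}, {row['ci'][1]:+.2%}]\t{'Y' if row['significant'] else 'N'}")
```

Reporting the interval alongside the Y/N verdict shows how large or small the true effect could plausibly be, which matters for the "tiny lift" decision.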

Decision

{Ship / Don't ship / Extend} — {rationale}

Gotchas

Don't stop early because it "looks good": Checking results repeatedly and stopping at the first significant reading inflates the false positive rate well above the nominal 5%, to 30% or more with frequent peeking. Run to the planned sample size.
Day-of-week effects: Monday visitors behave differently from Saturday visitors. Always run tests for at least 1-2 complete weeks.
Novelty effect: A new design may get a temporary lift from curiosity. Wait 2+ weeks to see if the effect sustains.
Winner's curse: The estimated lift from a test is often larger than the true lift due to statistical noise. Expect the actual impact after deployment to be smaller.
Don't test everything — test what matters: Running 20 small tests on button colors while ignoring the pricing page is misallocating effort. Test high-impact elements first.
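
The peeking problem can be checked with a small A/A simulation, where both arms share the same true rate and every "significant" result is a false positive. This is a sketch under assumed parameters (ten peeks, 100 visitors per arm between peeks, a 10% conversion rate); the exact inflation depends on how often you look.

```python
import math
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0, 1):
        return 1.0  # no variation observed, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(n_sims=500, peeks=10, batch=100, rate=0.10, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, false positive rate at planned end)."""
    rng = random.Random(seed)
    hits_with_peeking = hits_at_end = 0
    for _sim in range(n_sims):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _peek in range(peeks):
            conv_a += sum(rng.random() < rate for _v in range(batch))
            conv_b += sum(rng.random() < rate for _v in range(batch))
            n += batch
            p = p_value(conv_a, n, conv_b, n)
            ever_significant = ever_significant or p < alpha
        hits_with_peeking += ever_significant  # would have stopped at some peek
        hits_at_end += p < alpha               # significant at the planned sample size
    return hits_with_peeking / n_sims, hits_at_end / n_sims

peeking_rate, fixed_rate = simulate_aa_tests()
print(f"stop at first significant peek: {peeking_rate:.1%}")
print(f"evaluate once at planned end:   {fixed_rate:.1%}")
```

Evaluating once at the planned sample size should stay near the nominal 5%, while stopping at the first significant peek produces a noticeably higher rate, and more peeks push it higher still.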

Scripts

Script	Description	Usage
`scripts/ab_test.py`	Two-proportion z-test with effect size and sample-size planning	`python scripts/ab_test.py --help`

Run `python scripts/ab_test.py --verify` to execute built-in sanity tests.

References

For statistical methodology (sample size, p-values), see the stat-ab-testing skill
For multivariate testing design, see `references/mvt-design.md`
