product-analytics

Product Analytics

Frameworks for turning raw product data into ship/extend/kill decisions. Covers A/B testing, cohort retention, funnel analysis, and the statistical foundations needed to make those decisions with confidence.

Quick Reference

| Category | Rules | Impact | When to Use |
|----------|-------|--------|-------------|
| A/B Test Evaluation | 1 | HIGH | Comparing variants, measuring significance, shipping decisions |
| Cohort Retention | 1 | HIGH | Feature adoption curves, day-N retention, engagement scoring |
| Funnel Analysis | 1 | HIGH | Drop-off diagnosis, conversion optimization, stage mapping |
| Statistical Foundations | 1 | HIGH | p-value interpretation, sample sizing, confidence intervals |

Total: 4 rules across 4 categories

A/B Test Evaluation

Load `rules/ab-test-evaluation.md` for the full framework. Quick pattern:

Experiment: [Name]

Hypothesis: If we [change], then [primary metric] will [direction] by [amount] because [evidence or reasoning].

Sample size: [N per variant], calculated for MDE=[X%], power=80%, alpha=0.05
Duration: [Minimum weeks]; never stop early (peeking bias)

Results:
  Control: [metric value] n=[count]
  Treatment: [metric value] n=[count]
  Lift: [+/- X%] p=[value] 95% CI: [lower, upper]

Decision: SHIP / EXTEND / KILL
Rationale: [One sentence grounded in numbers, not gut feel]

**Decision rules:**
- **SHIP** — p < 0.05, CI excludes zero, no guardrail regressions
- **EXTEND** — trending positive but underpowered (add runtime, not reanalysis)
- **KILL** — null result or guardrail degradation

See `rules/ab-test-evaluation.md` for sample size formulas, SRM checks, and pitfall list.
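To make the Results line of the template concrete, here is a minimal sketch of how the lift, p-value, and 95% CI could be computed for a conversion-rate metric. It assumes a two-sided two-proportion z-test with a normal-approximation (Wald) interval, which is one common choice and not necessarily the method in `rules/ab-test-evaluation.md`. The function name `evaluate_ab` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def evaluate_ab(control_conv, control_n, treat_conv, treat_n, alpha=0.05):
    """Two-sided two-proportion z-test plus a Wald CI on the absolute difference.

    Assumption: the metric is a binary conversion and samples are large enough
    for the normal approximation to hold.
    """
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    diff = p_t - p_c

    # Pooled standard error: used for the test, which assumes no true difference.
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval.
    se = sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treat_n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "relative_lift": diff / p_c,
        "p_value": p_value,
        "ci_abs_diff": (diff - z_crit * se, diff + z_crit * se),
    }


# Hypothetical counts: 10.0% vs 11.0% conversion, 10,000 users per variant.
result = evaluate_ab(control_conv=1000, control_n=10000,
                     treat_conv=1100, treat_n=10000)
print(result)
```

With these made-up counts the p-value falls below 0.05 and the interval on the absolute difference excludes zero, so the SHIP conditions would hold provided guardrails are green.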

Cohort Retention

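Load `rules/cohort-retention.md` for full methodology. Quick pattern:

```sql
-- Day-7 retention by weekly cohort.
-- Assumes user_activity has one row per user per active day and
-- carries each user's first_seen date on every row.
SELECT
  DATE_TRUNC('week', first_seen)  AS cohort_week,
  COUNT(DISTINCT user_id)         AS cohort_size,
  -- Users active exactly 7 days after first_seen, as a % of the cohort
  COUNT(DISTINCT CASE
    WHEN activity_date = first_seen + INTERVAL '7 days'
    THEN user_id END) * 100.0
    / COUNT(DISTINCT user_id)     AS day_7_retention
FROM user_activity
-- Skip users who have not yet had a chance to reach day 7
WHERE first_seen <= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1;
```
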
Retention benchmarks (SaaS):
  • Day 1: 40–60% is healthy
  • Day 7: 20–35% is healthy
  • Day 30: 10–20% is healthy
  • Flat curve after day 30 = product-market fit signal
See `rules/cohort-retention.md` for behavior-based cohorts, feature adoption curves, and engagement scoring.

Funnel Analysis

Load `rules/funnel-analysis.md` for full methodology. Quick pattern:

Funnel: [Name] — [Date Range]

Stage 1: [Aware / Land] → [N] users (entry)
Stage 2: [Activate / Sign] → [N] users ([X]% from stage 1)
Stage 3: [Engage / Use] → [N] users ([X]% from stage 2) ← biggest drop
Stage 4: [Convert / Pay] → [N] users ([X]% from stage 3)

Overall conversion: [X]%
Biggest drop-off: Stage 2→3 ([X]% loss), investigate first

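**Optimization order:** Fix the largest drop-off first. A 5-point improvement at a high-volume step can be worth more than a 20-point improvement at a low-volume step, so compare the extra users each fix would send through to the final stage rather than the percentage points alone.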

See `rules/funnel-analysis.md` for segmented funnels, micro-conversion tracking, and prioritization patterns.
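The template's percentages and "biggest drop" marker can be derived mechanically from stage counts. The sketch below is one simple way to do that; `summarize_funnel`, the stage names, and the user counts are all hypothetical and only illustrate the arithmetic.

```python
def summarize_funnel(stages):
    """Compute step conversion, overall conversion, and the largest drop-off.

    stages: list of (name, user_count) tuples in funnel order.
    """
    steps = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        rate = n / prev_n
        steps.append({
            "step": f"{prev_name} -> {name}",
            "conversion": rate,   # share of the previous stage that continued
            "loss": 1 - rate,     # share of the previous stage that dropped off
        })

    biggest = max(steps, key=lambda s: s["loss"])
    overall = stages[-1][1] / stages[0][1]
    return {
        "steps": steps,
        "overall_conversion": overall,
        "biggest_drop": biggest["step"],
    }


# Made-up counts for illustration only.
funnel = [("Land", 10000), ("Sign", 4000), ("Use", 1000), ("Pay", 300)]
summary = summarize_funnel(funnel)

for s in summary["steps"]:
    print(f'{s["step"]}: {s["conversion"]:.0%} continue, {s["loss"]:.0%} drop')
print(f'Overall: {summary["overall_conversion"]:.1%}')
print(f'Investigate first: {summary["biggest_drop"]}')
```

Here the Sign -> Use step loses the largest share of users, so it would be the first place to look.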

Statistical Foundations

Plain-English explanations of the stats every PM needs. Load `references/stats-cheat-sheet.md` for formulas and quick lookups.

p-value in plain English: The probability that you would see a result this extreme (or more extreme) if the change had zero effect. p=0.03 means that if the change truly did nothing, random noise alone would produce a result at least this extreme about 3% of the time. It does NOT mean "3% chance the result is noise" or "97% probability the change works."

Confidence interval in plain English: The range of effect sizes that are consistent with the data. "Lift = +8%, 95% CI [+2%, +14%]" means you can be fairly confident the real lift is somewhere between 2% and 14%. If the CI includes zero, you cannot claim a win.

Minimum Detectable Effect (MDE): The smallest lift you care about detecting. Setting MDE too small forces impractically large sample sizes. Anchor MDE to business value: if a 2% lift is not worth shipping but a 5% lift is, set MDE = 5%.

Statistical vs practical significance: A result can be statistically significant (p < 0.05) but practically meaningless (lift = 0.01%). Always check both. A 0.01% lift that costs 6 weeks of eng time is not a win.
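The MDE trade-off is easier to see with numbers. The sketch below uses the standard normal-approximation formula for comparing two proportions, n per variant = (z_{1-α/2} + z_{power})² · (p₁(1-p₁) + p₂(1-p₂)) / (p₂ - p₁)², and treats MDE as a relative lift over the baseline rate. Both of those are assumptions made for illustration; the cheat sheet may use a different formula. The 10% baseline is a made-up value.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion test.

    Assumption: relative_mde is a relative lift (0.05 means baseline * 1.05).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)           # about 0.84 for power=0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 10% baseline conversion rate.
n_small_mde = sample_size_per_variant(baseline=0.10, relative_mde=0.02)
n_large_mde = sample_size_per_variant(baseline=0.10, relative_mde=0.05)
print(f"MDE 2%: {n_small_mde:,} users per variant")
print(f"MDE 5%: {n_large_mde:,} users per variant")
```

Under these assumptions, moving the MDE from 5% to 2% multiplies the required sample by roughly six, which is why an MDE smaller than the business cares about is expensive.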

Common Pitfalls

  1. Peeking: stopping an experiment early because results look good inflates the false-positive rate. Commit to a runtime before launch.
  2. Multiple comparisons: testing 10 independent metrics at p < 0.05 gives roughly a 40% chance of at least one false positive (1 − 0.95^10 ≈ 0.40), even when nothing changed. Apply a Bonferroni correction or pre-register your primary metric.
  3. Sample Ratio Mismatch (SRM): if variant group sizes differ from the expected split by > 1%, treat the experiment as broken. Fix the assignment problem before analyzing results.
  4. Novelty effect: new features get inflated engagement in week 1. Run experiments long enough to see settled behavior (minimum 2 full business cycles).
  5. Simpson's paradox: aggregate results can reverse when segmented. Always check results by key segments (device, plan tier, geography).
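Two of the pitfalls above reduce to short calculations. The sketch below shows a chi-square goodness-of-fit check for SRM (one degree of freedom, two variants) and the family-wise error rate behind the multiple-comparisons warning. The function names and the p < 0.001 alert threshold in the comments are assumptions: that threshold is a commonly used convention and is not taken from this document, which states a 1% size-gap rule instead.

```python
from math import erfc, sqrt


def srm_p_value(control_n, treat_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant split."""
    total = control_n + treat_n
    exp_control = total * expected_control_share
    exp_treat = total - exp_control
    chi2 = ((control_n - exp_control) ** 2 / exp_control
            + (treat_n - exp_treat) ** 2 / exp_treat)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def family_wise_error_rate(num_metrics, alpha=0.05):
    """Chance of at least one false positive across independent metrics."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics, alpha=0.05):
    """Per-metric threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_metrics


# Hypothetical 50/50 experiment that ended up 50,000 vs 51,500.
p_srm = srm_p_value(50000, 51500)
print(f"SRM p-value: {p_srm:.2e}")  # a very small value suggests broken assignment

print(f"FWER for 10 metrics: {family_wise_error_rate(10):.1%}")
print(f"Bonferroni threshold for 10 metrics: {bonferroni_alpha(10):.3f}")
```

A test-based check scales with sample size, so it can complement a fixed percentage rule: the same gap that is unremarkable at a few hundred users may be strong evidence of a bug at a hundred thousand.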

Ship / Extend / Kill Framework

| Signal | Decision | Action |
|--------|----------|--------|
| p < 0.05, CI excludes zero, guardrails green | SHIP | Full rollout, update success metrics |
| Positive trend, underpowered (p = 0.10–0.15) | EXTEND | Add runtime, do not peek again |
| p > 0.15, flat or negative | KILL | Revert, document learnings, re-hypothesize |
| Guardrail regression, any p-value | KILL | Immediate revert regardless of primary metric |
| SRM detected | INVALID | Fix assignment bug, restart experiment |
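The framework above can be read as an ordered set of checks. The sketch below encodes it as a function, with validity checked first, then guardrails, then the primary metric. The function `decide` and its parameters are hypothetical. The table does not say what to do when p falls between 0.05 and 0.10; this sketch assumes a positive lift with p up to 0.15 is treated as EXTEND, which is an interpretation and not something the framework states.

```python
def decide(lift, p_value, ci_low, ci_high, guardrails_ok=True, srm_detected=False):
    """Map experiment results to SHIP / EXTEND / KILL / INVALID."""
    if srm_detected:
        return "INVALID"   # results cannot be trusted; fix assignment and restart
    if not guardrails_ok:
        return "KILL"      # guardrail regression overrides the primary metric
    if p_value < 0.05 and ci_low > 0:
        return "SHIP"      # significant, and the CI sits entirely above zero
    if lift > 0 and p_value <= 0.15:
        # Assumption: also covers the 0.05-0.10 range the table leaves open.
        return "EXTEND"
    return "KILL"          # flat, negative, or p > 0.15


# Hypothetical results for illustration.
print(decide(lift=0.08, p_value=0.03, ci_low=0.02, ci_high=0.14))   # SHIP
print(decide(lift=0.04, p_value=0.12, ci_low=-0.01, ci_high=0.09))  # EXTEND
print(decide(lift=0.00, p_value=0.60, ci_low=-0.05, ci_high=0.05))  # KILL
```

Checking SRM and guardrails before the p-value mirrors the table's "any p-value" and "regardless of primary metric" wording: a strong primary result should not rescue an invalid or harmful experiment.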

Related Skills

  • `ork:product-frameworks`: OKRs, KPI trees, RICE prioritization, PRD templates
  • `ork:metrics-instrumentation`: Event naming, metric definition, alerting setup
  • `ork:brainstorm`: Generate hypotheses and experiment ideas
  • `ork:assess`: Evaluate product quality and risks

References

  • `rules/ab-test-evaluation.md`: Hypothesis, sample size, significance, decision matrix
  • `rules/cohort-retention.md`: Cohort types, retention curves, SQL patterns
  • `rules/funnel-analysis.md`: Stage mapping, drop-off identification, optimization
  • `references/stats-cheat-sheet.md`: Formulas, test selection, power analysis

Version: 1.0.0 (March 2026)