algo-rank-bayesian
Bayesian Average Rating
Overview
Bayesian average combines an item's observed average rating with a prior (global average), weighted by review count. Formula: BR = (C × m + Σrᵢ) / (C + n) where m=global mean, C=confidence parameter, n=item reviews, Σrᵢ=sum of item ratings. Items with few reviews are pulled toward the global mean.
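The formula above can be rewritten as a weighted blend of the item's own average and the global mean, which makes the pull toward m explicit. In this sketch, \bar{r} = Σrᵢ / n is the item's raw average and w is shorthand introduced only for this note; neither symbol appears in the original formula.

```latex
\mathrm{BR}
  = \frac{C\,m + \sum_i r_i}{C + n}
  = \frac{n}{C + n}\,\bar{r} + \frac{C}{C + n}\,m
  = w\,\bar{r} + (1 - w)\,m,
  \qquad w = \frac{n}{C + n}
```

Read this way, an item with n = C reviews sits exactly halfway between its raw average and the global mean.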
When to Use
Trigger conditions:
- Ranking items by numeric ratings (e.g. 1-5 stars) with varying review counts
- IMDB-style "Top 250" lists that balance quality and popularity
- Any rating aggregation where new items shouldn't dominate with few high ratings
When NOT to use:
- For binary (upvote/downvote) data (use Wilson Score instead)
- When all items have similar review counts (simple average is sufficient)
Algorithm
IRON LAW: The Prior Protects Against Small-Sample Extremes
Without a prior, a single 5-star review makes an item "the best."
The Bayesian average adds C "phantom votes" at the global mean m,
shrinking small-sample items toward average. C controls shrinkage
strength: higher C = more conservative (more phantom votes).
Typical C = median review count across all items.
Phase 1: Input Validation
Compute: global mean rating (m) across all items, choose C (phantom vote count). Collect per item: review count (n), average rating, or sum of ratings.
Gate: m computed, C selected, item data available.
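As a rough illustration of this phase, the sketch below derives m and C from raw ratings. The input layout (`ratings_by_item`) and the function name `compute_prior` are hypothetical, and using the median review count for C is only one of the common choices mentioned in this document.

```python
from statistics import median

# Hypothetical input: item name -> list of individual ratings.
ratings_by_item = {
    "A": [5, 5],
    "B": [4, 3, 5, 4],
    "C": [2, 3, 3, 4, 3, 2],
}

def compute_prior(ratings_by_item):
    """Return (m, C): the global mean rating and the median review count."""
    counts = [len(ratings) for ratings in ratings_by_item.values()]
    total_reviews = sum(counts)
    if total_reviews == 0:
        # Gate: without any ratings there is no global mean to use as a prior.
        raise ValueError("no ratings available; cannot compute the global mean")
    total_rating = sum(sum(ratings) for ratings in ratings_by_item.values())
    m = total_rating / total_reviews  # mean over all ratings, not a mean of item means
    C = median(counts)                # assumption: C = median review count
    return m, C

m, C = compute_prior(ratings_by_item)
print(f"m={m:.3f}, C={C}")
```

Note that m is computed over every individual rating, so heavily reviewed items influence it more than sparsely reviewed ones.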
Phase 2: Core Algorithm
- Global mean: m = Σ(all ratings) / Σ(all review counts)
- Bayesian average per item: BR = (C × m + n × avg_rating) / (C + n)
- Rank items by BR descending
- For items with n >> C, BR ≈ avg_rating (data dominates). For n << C, BR ≈ m (prior dominates).
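The steps above can be sketched in a few lines of Python. The function names and the sample items are illustrative assumptions, not part of the referenced script.

```python
def bayesian_average(n, avg_rating, m, C):
    """BR = (C*m + n*avg_rating) / (C + n)."""
    if C + n <= 0:
        raise ValueError("C + n must be positive")
    return (C * m + n * avg_rating) / (C + n)

def rank_items(items, m, C):
    """items: iterable of (name, n, avg_rating). Returns (name, BR) sorted by BR, best first."""
    scored = [(name, bayesian_average(n, avg, m, C)) for name, n, avg in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical data on a 5-point scale.
items = [
    ("few_reviews", 3, 5.0),     # perfect raw average, but only 3 reviews
    ("many_reviews", 400, 4.5),  # slightly lower raw average, well supported
    ("average", 50, 3.6),
]
ranked = rank_items(items, m=3.5, C=50)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these inputs the well-reviewed item ranks first, because the 3-review item is pulled most of the way back to m.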
Phase 3: Verification
Check: items with very few reviews should score near the global mean; items with many reviews should score near their raw average; the resulting order should look sensible on manual inspection.
Gate: Shrinkage behavior confirmed, top items have both high ratings AND sufficient reviews.
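One way to turn these checks into something executable is sketched below. The probe sizes (C/100 and 100×C), the 5% tolerance, and both function names are assumptions chosen for illustration; the document does not prescribe thresholds.

```python
def bayesian_average(n, avg_rating, m, C):
    """BR = (C*m + n*avg_rating) / (C + n), as in Phase 2."""
    return (C * m + n * avg_rating) / (C + n)

def shrinkage_ok(m, C, probe_avg, rel_tol=0.05):
    """Check both limits of BR for a probe item whose raw average is probe_avg."""
    gap = abs(probe_avg - m)
    few, many = C / 100, C * 100  # hypothetical stand-ins for n << C and n >> C
    near_prior = abs(bayesian_average(few, probe_avg, m, C) - m) <= rel_tol * gap
    near_data = abs(bayesian_average(many, probe_avg, m, C) - probe_avg) <= rel_tol * gap
    return near_prior and near_data

def top_k_has_enough_reviews(ranked, k, min_reviews):
    """ranked: list of (name, n, BR) sorted by BR descending."""
    return all(n >= min_reviews for _, n, _ in ranked[:k])

print(shrinkage_ok(m=6.8, C=500, probe_avg=9.1))
```

The second helper covers the gate's "sufficient reviews" condition; what counts as sufficient is a product decision, so `min_reviews` is left as a parameter.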
Phase 4: Output
Return ranked items with Bayesian scores.
Output Format
json
{
"rankings": [{"item": "Movie_A", "bayesian_avg": 8.89, "raw_avg": 9.1, "reviews": 5000, "shrinkage": 0.09}],
"metadata": {"global_mean": 6.8, "confidence_C": 500, "items_ranked": 10000}
}
Examples
Sample I/O
Input: m=7.0, C=100. Item A: avg=9.5, n=5. Item B: avg=8.5, n=500.
Expected: BR_A = (100×7 + 5×9.5)/(105) = 7.12. BR_B = (100×7 + 500×8.5)/(600) = 8.25. B ranks higher.
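The sample numbers can be reproduced with a short script. This is only a check of the arithmetic above; the variable names are chosen here for illustration.

```python
def bayesian_average(n, avg_rating, m, C):
    """BR = (C*m + n*avg_rating) / (C + n)."""
    return (C * m + n * avg_rating) / (C + n)

m, C = 7.0, 100
br_a = bayesian_average(5, 9.5, m, C)    # Item A: high raw average, 5 reviews
br_b = bayesian_average(500, 8.5, m, C)  # Item B: lower raw average, 500 reviews

print(round(br_a, 2), round(br_b, 2))
print("B ranks higher" if br_b > br_a else "A ranks higher")
```

Ranking by raw average alone would put A first (9.5 vs 8.5); the prior reverses that order because A's five reviews carry little weight against 100 phantom votes.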
Edge Cases
| Input | Expected | Why |
|---|---|---|
| n=0 | BR = m (global mean) | No data, fully prior-driven |
| n=100000 | BR ≈ raw average | Massive sample overwhelms prior |
| All items same n | Equivalent to simple average ranking | Uniform shrinkage, ordering preserved |
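The three rows of the table can be exercised directly. The specific values below (m = 7.0, C = 100, the raw averages, n = 50) are arbitrary choices for this sketch.

```python
def bayesian_average(n, avg_rating, m, C):
    """BR = (C*m + n*avg_rating) / (C + n)."""
    return (C * m + n * avg_rating) / (C + n)

m, C = 7.0, 100

# n = 0: the item's average is irrelevant, so BR collapses to m.
no_reviews = bayesian_average(0, 0.0, m, C)

# n = 100000: the prior's 100 phantom votes barely move the result.
huge_sample = bayesian_average(100_000, 9.0, m, C)

# Same n for every item: shrinkage is uniform, so ordering matches the raw averages.
raw_avgs = [9.5, 6.0, 8.5, 7.2]
same_n_scores = [bayesian_average(50, avg, m, C) for avg in raw_avgs]
order_by_raw = sorted(range(len(raw_avgs)), key=lambda i: raw_avgs[i], reverse=True)
order_by_br = sorted(range(len(raw_avgs)), key=lambda i: same_n_scores[i], reverse=True)

print(no_reviews, round(huge_sample, 3), order_by_raw == order_by_br)
```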
Gotchas
- C selection is subjective: Common choices include the median review count or the minimum number of reviews needed for a "reliable" rating (IMDB's Top 250 formula has used a minimum-votes threshold of 25,000). No universally correct value exists.
- Rating scale matters: A 4.0 on a 5-point scale means something different than 4.0 on a 10-point scale. Normalize or use the same scale.
- Category-specific priors: A 4.0 average in "horror movies" might be exceptional, while 4.0 in "Studio Ghibli" might be below average. Consider category-level priors.
- Temporal bias: Old items accumulate reviews. Unless you weight recent reviews more, established items permanently dominate "top" lists.
- Review gaming: Bayesian average doesn't prevent review manipulation — it only mitigates small-sample extremes. Pair with fraud detection.
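For the category-specific prior mentioned above, one possible approach is to replace the single global mean with a per-category mean. The sketch below assumes that reading; the tuple layout, function names, and sample data are all hypothetical, and it assumes every category has at least one review.

```python
from collections import defaultdict

def category_means(items):
    """items: iterable of (name, category, n, avg_rating). Returns category -> mean rating."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of ratings, review count]
    for _, category, n, avg in items:
        totals[category][0] += n * avg
        totals[category][1] += n
    return {cat: rating_sum / count for cat, (rating_sum, count) in totals.items() if count > 0}

def bayesian_average_by_category(items, C):
    """Shrink each item toward its own category's mean instead of the global mean."""
    means = category_means(items)
    return {name: (C * means[cat] + n * avg) / (C + n) for name, cat, n, avg in items}

# Hypothetical data: two items share raw average 4.0 and n = 40 in different categories.
items = [
    ("h1", "horror", 40, 4.0),
    ("h2", "horror", 200, 3.0),
    ("g1", "ghibli", 40, 4.0),
    ("g2", "ghibli", 200, 4.6),
]
means = category_means(items)
scores = bayesian_average_by_category(items, C=50)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Here the horror item ends up above its category mean and the Ghibli item below its own, even though their raw numbers are identical. Scores shrunk toward different priors are not directly comparable across categories, so this fits per-category lists better than one combined ranking.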
Scripts
| Script | Description | Usage |
|---|---|---|
| scripts/bayesian_avg.py | Rank items using Bayesian average to handle small-sample extremes | |
Run the script with --verify to execute built-in sanity tests.
python scripts/bayesian_avg.py --verify
References
- For IMDB weighted rating formula, see references/imdb-formula.md
- For multi-dimensional Bayesian rating, see references/multi-dimensional.md