algo-rank-bayesian


Bayesian Average Rating

Overview

Bayesian average combines an item's observed ratings with a prior (the global average); the prior's influence shrinks as the item's review count grows. Formula: BR = (C × m + Σrᵢ) / (C + n), where m = global mean rating, C = confidence parameter (the weight of the prior, expressed in votes), n = number of reviews for the item, and Σrᵢ = sum of the item's ratings. Items with few reviews are pulled toward the global mean.

When to Use

Trigger conditions:
  • Ranking items by continuous ratings (1-5 stars) with varying review counts
  • IMDB-style "Top 250" lists that balance quality and popularity
  • Any rating aggregation where new items shouldn't dominate with few high ratings
When NOT to use:
  • For binary (upvote/downvote) data (use Wilson Score instead)
  • When all items have similar review counts (simple average is sufficient)

Algorithm

IRON LAW: The Prior Protects Against Small-Sample Extremes
Without a prior, a single 5-star review makes an item "the best."
The Bayesian average adds C "phantom votes" at the global mean m,
shrinking small-sample items toward average. C controls shrinkage
strength: higher C = more conservative (more phantom votes).
Typical C = median review count across all items.
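
To make the effect of C concrete, here is a minimal sketch that scores a single 5-star review under three values of C. The function name `bayesian_average` and the global mean of 3.5 on a 1-5 scale are assumptions chosen for illustration, not values from the script this document describes.

```python
def bayesian_average(rating_sum, n, m, C):
    """Bayesian average: C phantom votes at the global mean m plus n real votes."""
    return (C * m + rating_sum) / (C + n)

# Hypothetical item: one 5-star review on a 1-5 scale, assumed global mean 3.5.
m = 3.5
scores = {C: round(bayesian_average(5.0, 1, m, C), 2) for C in (0, 5, 50)}
print(scores)  # {0: 5.0, 5: 3.75, 50: 3.53}
```

With C = 0 there is no prior and the item scores a perfect 5.0; with C = 50 the same review barely moves the score off the global mean.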

Phase 1: Input Validation

Compute the global mean rating (m) across all items and choose C (the phantom vote count). Collect for each item its review count (n) plus either its average rating or its sum of ratings; given n, one determines the other. Gate: m computed, C selected, item data available.
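
One way this phase might look in code is sketched below. The helper name `prepare_inputs`, the `(review_count, rating_sum)` tuple layout, and the specific consistency checks are assumptions for illustration; C is taken as the median review count, the "typical" choice mentioned earlier.

```python
from statistics import median

def prepare_inputs(items):
    """items maps item name -> (review_count, rating_sum). Returns (m, C)."""
    for name, (n, rating_sum) in items.items():
        # Reject negative counts and ratings attached to zero reviews.
        if n < 0 or (n == 0 and rating_sum != 0):
            raise ValueError(f"inconsistent data for {name!r}")
    total_reviews = sum(n for n, _ in items.values())
    if total_reviews == 0:
        raise ValueError("no ratings available; global mean is undefined")
    m = sum(s for _, s in items.values()) / total_reviews  # global mean
    C = median(n for n, _ in items.values())                # phantom vote count
    return m, C

# Hypothetical data: name -> (review_count, rating_sum)
items = {"A": (5, 47.5), "B": (500, 4250.0), "C": (20, 120.0)}
m, C = prepare_inputs(items)
print(round(m, 3), C)
```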

Phase 2: Core Algorithm

  1. Global mean: m = Σ(all ratings) / Σ(all review counts)
  2. Bayesian average per item: BR = (C × m + n × avg_rating) / (C + n)
  3. Rank items by BR descending
  4. For items with n >> C, BR ≈ avg_rating (data dominates). For n << C, BR ≈ m (prior dominates).
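
The four steps above can be sketched as follows. The `Item` dataclass, the function name, and the sample data are hypothetical; the arithmetic follows the formulas in steps 1 and 2.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    n: int             # review count
    avg_rating: float  # raw average rating

def rank_by_bayesian_average(items, C):
    # Step 1: global mean = total rating mass / total review count.
    total_reviews = sum(it.n for it in items)
    m = sum(it.n * it.avg_rating for it in items) / total_reviews
    # Step 2: Bayesian average per item.
    scored = [(it.name, (C * m + it.n * it.avg_rating) / (C + it.n)) for it in items]
    # Step 3: rank by score, highest first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return m, scored

items = [Item("new_hit", 3, 5.0), Item("classic", 400, 4.6), Item("average", 150, 3.9)]
m, ranking = rank_by_bayesian_average(items, C=50)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this made-up data set, "new_hit" has the best raw average (5.0 from 3 reviews) but ranks below "classic", because its three reviews are outweighed by 50 phantom votes at the global mean.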

Phase 3: Verification

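Check: items with very few reviews (n << C) should score near the global mean; items with many reviews (n >> C) should score near their raw average; every score should lie between m and the item's raw average; and the resulting order should survive a manual spot check. Gate: Shrinkage behavior confirmed, top items have both high ratings AND sufficient reviews.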
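
A rough sketch of how these checks could be automated is shown below. The function `verify_shrinkage`, the `top_k` parameter, and the default of treating "sufficient reviews" as n ≥ C are all assumptions for illustration; the source does not define a numeric threshold.

```python
def verify_shrinkage(items, m, C, top_k=3, min_reviews=None):
    """items: list of (name, n, avg_rating). Returns a list of problems found."""
    # Assumption: "sufficient reviews" defaults to at least C real reviews.
    min_reviews = C if min_reviews is None else min_reviews
    problems, scored = [], []
    for name, n, avg in items:
        br = (C * m + n * avg) / (C + n)
        lo, hi = min(m, avg), max(m, avg)
        # A shrunken score must lie between the prior and the raw average.
        if not (lo - 1e-9 <= br <= hi + 1e-9):
            problems.append(f"{name}: score {br:.3f} outside [{lo}, {hi}]")
        scored.append((br, name, n))
    scored.sort(reverse=True)
    for br, name, n in scored[:top_k]:
        if n < min_reviews:
            problems.append(f"{name}: in top {top_k} with only {n} reviews")
    return problems

items = [("A", 5, 9.5), ("B", 500, 8.5), ("C", 50, 6.0)]
print(verify_shrinkage(items, m=7.0, C=100, top_k=1))  # []
print(verify_shrinkage(items, m=7.0, C=100, top_k=2))
```

With `top_k=2` the check flags item A, which reaches second place with only 5 reviews; whether that counts as a failure depends on the threshold you choose.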

Phase 4: Output

Return ranked items with Bayesian scores.

Output Format

```json
{
  "rankings": [{"item": "Movie_A", "bayesian_avg": 8.7, "raw_avg": 9.1, "reviews": 5000, "shrinkage": 0.04}],
  "metadata": {"global_mean": 6.8, "confidence_C": 500, "items_ranked": 10000}
}
```

Examples

Sample I/O

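Input: m = 7.0, C = 100. Item A: avg = 9.5, n = 5. Item B: avg = 8.5, n = 500.
Expected:
  BR_A = (100 × 7.0 + 5 × 9.5) / (100 + 5) = 747.5 / 105 ≈ 7.12
  BR_B = (100 × 7.0 + 500 × 8.5) / (100 + 500) = 4950 / 600 = 8.25
B ranks higher despite its lower raw average, because A's 5 reviews carry little weight against 100 phantom votes.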
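
The sample can be reproduced in a few lines, and the same function answers a natural follow-up question: how many reviews would Item A need, holding its 9.5 average, before it outranks B? The function name and the loop are illustrative assumptions; the inputs are the ones given above.

```python
def bayesian_average(avg, n, m, C):
    return (C * m + n * avg) / (C + n)

m, C = 7.0, 100
br_a = bayesian_average(9.5, 5, m, C)
br_b = bayesian_average(8.5, 500, m, C)
print(round(br_a, 2), round(br_b, 2))  # 7.12 8.25

# Smallest review count at which A (still averaging 9.5) strictly outranks B.
n_needed = 5
while bayesian_average(9.5, n_needed, m, C) <= br_b:
    n_needed += 1
print(n_needed)  # 101
```

At exactly n = 100, A's score equals 8.25 and ties with B, so A needs 101 reviews to move ahead.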

Edge Cases

| Input | Expected | Why |
|---|---|---|
| n=0 | BR = m (global mean) | No data, fully prior-driven |
| n=100000 | BR ≈ raw average | Massive sample overwhelms prior |
| All items same n | Equivalent to simple average ranking | Uniform shrinkage, ordering preserved |
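
These three cases translate directly into assertions. The sketch below assumes C > 0 (with C = 0 and n = 0 the formula divides by zero) and uses made-up ratings; `bayesian_average` is a hypothetical helper, not necessarily the function in scripts/bayesian_avg.py.

```python
def bayesian_average(rating_sum, n, m, C):
    return (C * m + rating_sum) / (C + n)

m, C = 7.0, 100  # assumes C > 0, otherwise n = 0 would divide by zero

# n = 0: the score is exactly the global mean.
assert bayesian_average(0.0, 0, m, C) == m

# n = 100000 with raw average 9.0: the score is within 0.01 of the raw average.
big = bayesian_average(9.0 * 100_000, 100_000, m, C)
assert abs(big - 9.0) < 0.01

# Same n for every item: ordering matches the raw-average ordering.
raw_avgs = [8.1, 6.4, 9.0, 7.7]
n = 40
scores = [bayesian_average(avg * n, n, m, C) for avg in raw_avgs]
order_raw = sorted(range(len(raw_avgs)), key=lambda i: raw_avgs[i])
order_bayes = sorted(range(len(scores)), key=lambda i: scores[i])
assert order_raw == order_bayes
```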

Gotchas

  • C selection is subjective: Common choices are the median review count or the minimum number of reviews needed for a "reliable" rating (IMDB's Top 250 sets a minimum-vote threshold, reportedly 25,000 votes). No universally correct value.
  • Rating scale matters: A 4.0 on a 5-point scale means something different than 4.0 on a 10-point scale. Normalize or use the same scale.
  • Category-specific priors: A 4.0 average in "horror movies" might be exceptional, while 4.0 in "Studio Ghibli" might be below average. Consider category-level priors.
  • Temporal bias: Old items accumulate reviews. Unless you weight recent reviews more, established items permanently dominate "top" lists.
  • Review gaming: Bayesian average doesn't prevent review manipulation — it only mitigates small-sample extremes. Pair with fraud detection.
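
As one possible reading of the category-specific priors point, the sketch below replaces the single global mean with a per-category mean, so each item shrinks toward its own category's average. The function names, the tuple layout, and the data are hypothetical, and the sketch assumes every category has at least one review.

```python
from collections import defaultdict

def category_priors(items):
    """items: list of (name, category, n, avg_rating).
    Returns {category: review-weighted mean rating of that category}."""
    rating_sums = defaultdict(float)
    review_counts = defaultdict(int)
    for _, category, n, avg in items:
        rating_sums[category] += n * avg
        review_counts[category] += n
    return {c: rating_sums[c] / review_counts[c]
            for c in review_counts if review_counts[c] > 0}

def score_within_category(items, C):
    # Each item is pulled toward its own category's mean, not the global mean.
    priors = category_priors(items)
    return {name: (C * priors[category] + n * avg) / (C + n)
            for name, category, n, avg in items}

# Hypothetical data: two items with identical raw stats in different categories.
items = [
    ("h1", "horror", 10, 4.0), ("h2", "horror", 990, 3.0),
    ("g1", "ghibli", 10, 4.0), ("g2", "ghibli", 990, 4.5),
]
scores = score_within_category(items, C=20)
print({k: round(v, 2) for k, v in scores.items()})
```

Here "h1" and "g1" both have 10 reviews averaging 4.0, yet they receive different scores because their categories' priors differ. Scores computed this way are most meaningful for ranking within a category rather than across categories.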

Scripts

| Script | Description | Usage |
|---|---|---|
| scripts/bayesian_avg.py | Rank items using Bayesian average to handle small-sample extremes | python scripts/bayesian_avg.py --help |

Run python scripts/bayesian_avg.py --verify to execute built-in sanity tests.

References

  • For IMDB weighted rating formula, see references/imdb-formula.md
  • For multi-dimensional Bayesian rating, see references/multi-dimensional.md