algo-rank-bayesian

Bayesian Average Rating

Overview

Bayesian average combines an item's observed average rating with a prior (global average), weighted by review count. Formula: BR = (C × m + Σrᵢ) / (C + n) where m=global mean, C=confidence parameter, n=item reviews, Σrᵢ=sum of item ratings. Items with few reviews are pulled toward the global mean.
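
Splitting the numerator makes the shrinkage explicit. As a sketch, for n > 0 and writing r̄ = Σrᵢ / n for the item's raw average (a symbol introduced here for illustration, not used elsewhere in this document), the same formula is a weighted average of m and r̄:

```latex
\mathrm{BR}
  = \frac{C\,m + \sum_i r_i}{C + n}
  = \frac{C}{C + n}\,m \;+\; \frac{n}{C + n}\,\bar{r},
  \qquad \bar{r} = \frac{1}{n}\sum_i r_i
```

The weight on the prior, C / (C + n), equals 1 at n = 0 and falls toward 0 as n grows; at n = C the prior and the data are weighted equally.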

When to Use

Trigger conditions:

Ranking items by continuous ratings (1-5 stars) with varying review counts
IMDB-style "Top 250" lists that balance quality and popularity
Any rating aggregation where new items shouldn't dominate with few high ratings

When NOT to use:

For binary (upvote/downvote) data (use Wilson Score instead)
When all items have similar review counts (simple average is sufficient)

Algorithm

IRON LAW: The Prior Protects Against Small-Sample Extremes
Without a prior, a single 5-star review makes an item "the best."
The Bayesian average adds C "phantom votes" at the global mean m,
shrinking small-sample items toward average. C controls shrinkage
strength: higher C = more conservative (more phantom votes).
Typical C = median review count across all items.
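
To make the effect of C concrete, here is a minimal sketch that scores a single 5-star review under several values of C. The function name `bayesian_average` and the global mean of 3.5 are assumptions chosen for illustration.

```python
def bayesian_average(rating_sum: float, n: int, m: float, C: float) -> float:
    """Add C phantom votes at the global mean m to the item's real ratings."""
    return (C * m + rating_sum) / (C + n)


# One 5-star review on a 1-5 scale; m = 3.5 is a hypothetical global mean.
m = 3.5
scores = {C: bayesian_average(rating_sum=5.0, n=1, m=m, C=C) for C in (0, 10, 100)}

for C, score in scores.items():
    # C = 0 means no prior at all, so the lone review fully determines the score.
    print(f"C={C:>3}: BR={score:.3f}")
```

With C = 0 the item keeps its raw 5.0; larger values of C pull the score progressively closer to m.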

Phase 1: Input Validation

Compute: the global mean rating (m) across all items, and choose C (the phantom vote count). Collect per item: the review count (n) and either the average rating or the sum of ratings. Gate: m computed, C selected, item data available.
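
A minimal sketch of this phase, assuming the raw data arrives as a mapping from item name to a list of individual ratings. The name `compute_prior`, the sample data, and the choice of the median review count for C are illustrative assumptions.

```python
from statistics import median

# Hypothetical input: item -> list of individual ratings.
ratings_by_item = {
    "A": [5, 5, 4],
    "B": [3, 4, 4, 5, 2, 4],
    "C": [4],
}


def compute_prior(ratings_by_item):
    """Return (m, C): global mean over all ratings and median review count."""
    counts = [len(ratings) for ratings in ratings_by_item.values()]
    total_reviews = sum(counts)
    if total_reviews == 0:
        # Gate: without any ratings the global mean m is undefined.
        raise ValueError("no ratings available; cannot compute the global mean")
    m = sum(sum(ratings) for ratings in ratings_by_item.values()) / total_reviews
    C = median(counts)  # one common heuristic; other choices are equally valid
    return m, C


m, C = compute_prior(ratings_by_item)
```

Note that m is computed over all individual ratings, not as a mean of per-item averages, so heavily reviewed items contribute proportionally more.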

Phase 2: Core Algorithm

Global mean: m = Σ(all ratings) / Σ(all review counts)
Bayesian average per item: BR = (C × m + n × avg_rating) / (C + n)
Rank items by BR descending
For items with n >> C, BR ≈ avg_rating (data dominates). For n << C, BR ≈ m (prior dominates).
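
The steps above can be sketched as follows. The `Item` type and `rank_items` function are hypothetical names; the input values are the ones used in the Sample I/O section.

```python
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    avg_rating: float
    n: int  # review count


def rank_items(items, m, C):
    """Score each item with BR = (C*m + n*avg) / (C + n) and sort descending."""
    scored = [
        (item.name, (C * m + item.n * item.avg_rating) / (C + item.n))
        for item in items
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_items([Item("A", 9.5, 5), Item("B", 8.5, 500)], m=7.0, C=100)
for name, br in ranked:
    print(f"{name}: {br:.2f}")
```

Item A has the higher raw average, but with only 5 reviews against C = 100 its score stays close to m, so B ranks first.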

Phase 3: Verification

Check: items with very few reviews should score near the global mean, items with many reviews should score near their actual average, and the resulting ranking should look intuitive. Gate: shrinkage behavior confirmed; top items have both high ratings AND sufficient reviews.
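
One way to automate part of this check is sketched below. The helper `check_shrinkage` is a hypothetical name, and the rule it applies (a score sits closer to m when n < C and closer to the raw average when n > C) follows from the formula rather than from any script in this repository.

```python
def check_shrinkage(raw_avg, n, br, m, C, tol=1e-9):
    """Return True if br lies between m and raw_avg, on the side implied by n vs C."""
    low, high = min(m, raw_avg), max(m, raw_avg)
    between = low - tol <= br <= high + tol
    if n < C:
        # Prior dominates: the score should be at least as close to m as to the data.
        right_side = abs(br - m) <= abs(br - raw_avg) + tol
    elif n > C:
        # Data dominates: the score should be at least as close to the raw average.
        right_side = abs(br - raw_avg) <= abs(br - m) + tol
    else:
        right_side = True  # n == C: exactly halfway, nothing more to check
    return between and right_side


m, C = 7.0, 100
ok_small = check_shrinkage(raw_avg=9.5, n=5, br=(C * m + 5 * 9.5) / (C + 5), m=m, C=C)
ok_large = check_shrinkage(raw_avg=8.5, n=500, br=(C * m + 500 * 8.5) / (C + 500), m=m, C=C)
unshrunk = check_shrinkage(raw_avg=9.5, n=5, br=9.5, m=m, C=C)  # no shrinkage applied
```

A check like this catches scores that were never shrunk, but it cannot judge whether the ranking is intuitive; that part still needs a human look at the top of the list.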

Phase 4: Output

Return ranked items with Bayesian scores.

Output Format

json

{
  "rankings": [{"item": "Movie_A", "bayesian_avg": 8.89, "raw_avg": 9.1, "reviews": 5000, "shrinkage": 0.02}],
  "metadata": {"global_mean": 6.8, "confidence_C": 500, "items_ranked": 10000}
}
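
A payload in this shape could be assembled roughly as follows. This is a sketch under assumptions: `build_output` is a hypothetical helper, and because the format does not define `shrinkage`, it is taken here to mean the relative drop from the raw average, (raw_avg - bayesian_avg) / raw_avg.

```python
import json


def build_output(items, m, C):
    """items: iterable of (name, raw_avg, n). Returns a dict in the format above."""
    scored = []
    for name, raw_avg, n in items:
        br = (C * m + n * raw_avg) / (C + n)
        scored.append((br, name, raw_avg, n))
    scored.sort(key=lambda row: row[0], reverse=True)  # sort on unrounded scores

    rankings = []
    for br, name, raw_avg, n in scored:
        # Assumption: shrinkage = relative drop from the raw average.
        shrinkage = (raw_avg - br) / raw_avg if raw_avg else 0.0
        rankings.append({
            "item": name,
            "bayesian_avg": round(br, 2),
            "raw_avg": raw_avg,
            "reviews": n,
            "shrinkage": round(shrinkage, 2),
        })
    metadata = {"global_mean": m, "confidence_C": C, "items_ranked": len(rankings)}
    return {"rankings": rankings, "metadata": metadata}


payload = build_output([("Movie_A", 9.1, 5000)], m=6.8, C=500)
print(json.dumps(payload, indent=2))
```

Sorting happens before rounding so that two items whose scores round to the same value still keep a deterministic, score-based order.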

Examples

Sample I/O

Input: m=7.0, C=100. Item A: avg=9.5, n=5. Item B: avg=8.5, n=500. Expected: BR_A = (100×7 + 5×9.5)/(105) = 7.12. BR_B = (100×7 + 500×8.5)/(600) = 8.25. B ranks higher.

Edge Cases

Input	Expected	Why
n=0	BR = m (global mean)	No data, fully prior-driven
n=100000	BR ≈ raw average	Massive sample overwhelms prior
All items same n	Equivalent to simple average ranking	Uniform shrinkage, ordering preserved
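
The three rows above can be exercised with a short sketch. The function name and the sample values (m = 7.0, C = 100, and the three items X, Y, Z) are assumptions for illustration.

```python
def bayesian_average(raw_avg, n, m, C):
    """BR = (C*m + n*raw_avg) / (C + n); requires C + n > 0."""
    if C + n <= 0:
        raise ValueError("C + n must be positive")
    return (C * m + n * raw_avg) / (C + n)


m, C = 7.0, 100

# n = 0: the raw average is irrelevant and the score equals the prior.
no_reviews = bayesian_average(raw_avg=0.0, n=0, m=m, C=C)

# n = 100000: the prior barely moves the score away from the raw average.
many_reviews = bayesian_average(raw_avg=9.0, n=100_000, m=m, C=C)

# Same n for every item: the ordering matches the raw-average ordering.
raw = {"X": 8.0, "Y": 6.5, "Z": 9.2}
order_by_raw = sorted(raw, key=raw.get, reverse=True)
order_by_br = sorted(raw, key=lambda k: bayesian_average(raw[k], 50, m, C), reverse=True)
```

The guard on C + n matters only when C = 0 and n = 0 together; with any positive C the n = 0 case is well defined.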

Gotchas

C selection is subjective: Common choices are the median review count or the minimum number of reviews needed for a "reliable" rating (IMDB's Top 250 has used a minimum-vote threshold of 25,000 in its weighted rating formula). There is no universally correct value.
Rating scale matters: A 4.0 on a 5-point scale means something different than 4.0 on a 10-point scale. Normalize or use the same scale.
Category-specific priors: A 4.0 average in "horror movies" might be exceptional, while 4.0 in "Studio Ghibli" might be below average. Consider category-level priors.
Temporal bias: Old items accumulate reviews. Unless you weight recent reviews more, established items permanently dominate "top" lists.
Review gaming: Bayesian average doesn't prevent review manipulation; it only mitigates small-sample extremes. Pair it with fraud detection.
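
As one illustration of the category-level prior mentioned above, the sketch below replaces the single global mean with a review-weighted mean per category. The function names, the sample data, and the choice to share one C across categories are all assumptions; it also assumes every category has at least one review.

```python
from collections import defaultdict


def category_means(items):
    """items: iterable of (category, raw_avg, n). Review-weighted mean per category."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [rating sum, review count]
    for category, raw_avg, n in items:
        totals[category][0] += raw_avg * n
        totals[category][1] += n
    return {cat: s / count for cat, (s, count) in totals.items() if count > 0}


def score_with_category_prior(items, C):
    """Shrink each item toward its own category mean instead of a global mean."""
    means = category_means(items)
    return [
        (category, raw_avg, n, (C * means[category] + n * raw_avg) / (C + n))
        for category, raw_avg, n in items
    ]


# Hypothetical data on a 1-5 scale.
items = [
    ("horror", 4.0, 10), ("horror", 3.0, 90),
    ("ghibli", 4.0, 10), ("ghibli", 4.6, 90),
]
means = category_means(items)
scores = score_with_category_prior(items, C=20)
```

In this toy data the two 4.0-rated items end up on opposite sides of their category means: the horror title scores above its category mean, the Ghibli title below.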

Scripts

Script	Description	Usage
`scripts/bayesian_avg.py`	Rank items using Bayesian average to handle small-sample extremes	`python scripts/bayesian_avg.py --help`

Run `python scripts/bayesian_avg.py --verify` to execute built-in sanity tests.

References

For the IMDB weighted rating formula, see `references/imdb-formula.md`.
For multi-dimensional Bayesian rating, see `references/multi-dimensional.md`.
