algo-rec-session


Session-Based Recommendation

Overview

Session-based recommendation predicts the next item a user will interact with from the click/view sequence of their current session, without relying on a long-term user profile. Typical methods are Markov chains, association rules, and neural models such as GRU4Rec. Inference runs in real time at O(sequence_length) cost.

When to Use

Trigger conditions:
  • Anonymous users (no login, no long-term profile)
  • Short browsing sessions where recency matters most
  • Real-time "next item" prediction during active sessions
When NOT to use:
  • When rich user history is available (use CF or content-based for better personalization)
  • When sessions are extremely short (1-2 clicks) — insufficient signal

Algorithm

IRON LAW: First Few Clicks Are Disproportionately Important
Session-based methods operate WITHOUT long-term profiles. Intent must
be inferred from SHORT sequences. The first 2-3 clicks establish the
session's intent — misreading early signals derails the entire session.

Phase 1: Input Validation

Parse the clickstream into sessions, either by session ID or by timeout-based splitting (typically 30 minutes of inactivity). Filter out sessions shorter than the minimum length (3 events). Gate: sessions are parsed and the minimum-length threshold is applied.
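
The timeout-based splitting described above might look roughly like the sketch below. It assumes events arrive as `(visitor_key, timestamp_seconds, item_id)` tuples; that layout and the names `split_sessions`, `SESSION_TIMEOUT_S`, and `MIN_SESSION_LEN` are illustrative assumptions, not part of the original skill.

```python
from collections import defaultdict

SESSION_TIMEOUT_S = 30 * 60  # assumed: 30 minutes of inactivity starts a new session
MIN_SESSION_LEN = 3          # assumed: sessions need at least 3 events


def split_sessions(events, timeout_s=SESSION_TIMEOUT_S, min_len=MIN_SESSION_LEN):
    """Split (visitor_key, timestamp_s, item_id) events into lists of item ids."""
    by_visitor = defaultdict(list)
    for visitor_key, ts, item in events:
        by_visitor[visitor_key].append((ts, item))

    sessions = []
    for clicks in by_visitor.values():
        clicks.sort(key=lambda c: c[0])  # order by timestamp only
        current, last_ts = [], None
        for ts, item in clicks:
            # A gap longer than the timeout closes the current session.
            if last_ts is not None and ts - last_ts > timeout_s:
                if len(current) >= min_len:
                    sessions.append(current)
                current = []
            current.append(item)
            last_ts = ts
        if len(current) >= min_len:
            sessions.append(current)
    return sessions
```

When the clickstream already carries a session ID, grouping by that ID replaces the timeout logic and only the minimum-length filter is needed.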

Phase 2: Core Algorithm

Markov Chain approach:
  1. Build transition matrix from item-to-item sequences across all sessions
  2. For current session [A, B, C], predict next item from P(next | C) or higher-order P(next | B, C)
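
A minimal first-order sketch of the two Markov steps, assuming sessions are plain lists of item ids. The function names `build_transitions` and `predict_next` are hypothetical, and only P(next | last item) is modeled here; a higher-order variant would key the table on the last two items instead.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count item -> next-item transitions, then normalise each row to probabilities."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def predict_next(transitions, session, k=3):
    """Rank candidates by P(next | last item); empty list if the last item is unseen."""
    row = transitions.get(session[-1], {}) if session else {}
    # Sort by descending probability, breaking ties by item id for determinism.
    return sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

For example, with training sessions `[A, B, C, D]`, `[B, C, D]`, and `[A, C, E]`, the current session `[A, B, C]` ranks `D` ahead of `E`, because `C` is followed by `D` in two of its three observed transitions.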
Association Rules approach:
  1. Mine frequent item sequences (sequential pattern mining)
  2. Match current session suffix against known patterns
  3. Recommend items that frequently follow the matched pattern
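
The three steps can be approximated with a deliberately simplified sketch: it counts only contiguous patterns of up to `max_len` items and the item that follows them, rather than running a full sequential pattern miner. The names `mine_rules` and `recommend` and the `min_support` default are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def mine_rules(sessions, max_len=3, min_support=2):
    """Map each contiguous pattern (tuple of 1..max_len items) to its frequent followers."""
    followers = defaultdict(Counter)
    for session in sessions:
        for end in range(1, len(session)):
            for n in range(1, max_len + 1):
                if end - n < 0:
                    break
                followers[tuple(session[end - n:end])][session[end]] += 1
    # Keep only (pattern, follower) pairs seen at least min_support times.
    rules = {}
    for pattern, counter in followers.items():
        kept = {item: c for item, c in counter.items() if c >= min_support}
        if kept:
            rules[pattern] = kept
    return rules


def recommend(rules, session, max_len=3, k=3):
    """Match the longest session suffix that has a mined pattern, then rank its followers."""
    for n in range(min(max_len, len(session)), 0, -1):
        matched = rules.get(tuple(session[-n:]))
        if matched:
            ranked = sorted(matched.items(), key=lambda kv: (-kv[1], kv[0]))
            return [item for item, _ in ranked[:k]]
    return []
```

Trying the longest suffix first prefers the most specific context and falls back to shorter ones only when no longer pattern has enough support.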

Phase 3: Verification

Evaluate with leave-one-out: hide the last item in each session, predict it from the preceding items, and measure hit rate and MRR (Mean Reciprocal Rank). Gate: Hit@20 is significantly above a random baseline.
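
One way to compute both metrics is sketched below. It assumes `recommend_fn` is any callable that takes a session prefix and returns a ranked list of item ids; that interface and the name `evaluate_leave_one_out` are illustrative assumptions.

```python
def evaluate_leave_one_out(sessions, recommend_fn, k=20):
    """Return (hit_rate@k, mrr@k) with the last item of each session held out."""
    hits, reciprocal_ranks, n = 0, 0.0, 0
    for session in sessions:
        if len(session) < 2:
            continue  # nothing left to predict from
        prefix, target = session[:-1], session[-1]
        ranked = list(recommend_fn(prefix))[:k]
        n += 1
        if target in ranked:
            hits += 1
            # Reciprocal of the 1-based rank of the hidden item.
            reciprocal_ranks += 1.0 / (ranked.index(target) + 1)
    if n == 0:
        return 0.0, 0.0
    return hits / n, reciprocal_ranks / n
```

The evaluated sessions should be held out from training; otherwise the hidden item has already been seen by the model and both metrics are inflated.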

Phase 4: Output

Return ranked next-item predictions with confidence scores.

Output Format

```json
{
  "predictions": [{"item_id": "789", "score": 0.65, "based_on": "last_3_clicks"}],
  "session": {"length": 5, "items_viewed": ["a", "b", "c", "d", "e"]},
  "metadata": {"method": "markov_order2", "hit_rate_at_20": 0.35}
}
```

Examples

Sample I/O

Input: Session: [shoes_page, running_shoes, nike_air_max]
Expected: Recommend: nike_air_zoom (0.72), adidas_ultraboost (0.58), shoe_size_guide (0.41)

Edge Cases

| Input | Expected | Why |
| --- | --- | --- |
| Session length = 1 | Popularity fallback | Single click insufficient for sequence pattern |
| Repeated item views | Weight recency, not count | User may be comparing, not broadening |
| Session intent shift | Adapt to latest clicks | User changed their goal mid-session |
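
The first two edge cases could be handled along the lines of the sketch below. The helper names, the rule that a context of fewer than two distinct items triggers the fallback, and the choice to exclude already-viewed items from the fallback are all assumptions for illustration; `sequence_model` stands for any callable that returns a ranked list of item ids.

```python
from collections import Counter


def popularity_ranking(sessions):
    """Rank items by total view count across all sessions (ties broken by item id)."""
    counts = Counter(item for s in sessions for item in s)
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def collapse_repeats(session):
    """Keep each item once, at the position of its most recent view."""
    seen, out = set(), []
    for item in reversed(session):
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out[::-1]


def recommend_with_fallback(session, sequence_model, popular, k=3):
    """Use the sequence model when there is enough context, else popular items."""
    context = collapse_repeats(session)
    if len(context) >= 2:
        ranked = list(sequence_model(context))
        if ranked:
            return ranked[:k]
    # Fallback: most popular items the user has not already viewed.
    return [item for item in popular if item not in context][:k]
```

Collapsing repeats to the latest view keeps the ordering driven by recency rather than by how many times an item was clicked, which also lets the most recent clicks dominate after an intent shift.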

Gotchas

  • Session definition matters: 30-minute timeout is conventional but arbitrary. E-commerce may need shorter (15min); research browsing may need longer (60min).
  • Position bias: Users click top results more. Session data reflects UI position, not just preference. Correct for position bias.
  • Repeat recommendations: Users often revisit items. Distinguish "recommend something new" from "remind of previously viewed."
  • Cold start for new items: Items with zero prior session appearances can't be predicted by transition matrices. Mix in feature-based candidates.
  • Computational efficiency: For real-time inference, pre-compute transition probabilities. Recomputing per-request at scale is too slow.
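
As a rough illustration of the pre-computation point, the sketch below reduces a transition table to a top-k lookup offline so that serving a request is a single dictionary access. It assumes `transitions` is a nested dict of the form `{item: {next_item: probability}}`; that layout and the function names are hypothetical.

```python
import heapq


def precompute_top_k(transitions, k=20):
    """Offline step: keep only the k most probable next items per source item."""
    return {
        prev: heapq.nlargest(k, row.items(), key=lambda kv: kv[1])
        for prev, row in transitions.items()
    }


def lookup(top_k_table, last_item):
    """Online step: one dictionary lookup per request; empty list for unseen items."""
    return top_k_table.get(last_item, [])
```

In production the table would typically live in a key-value store and be rebuilt on a schedule. The empty result for unseen items is where a cold-start fallback would plug in.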

References

  • For the GRU4Rec neural session model, see references/gru4rec.md
  • For session splitting heuristics, see references/session-splitting.md