algo-hr-turnover

Employee Turnover Prediction

Overview

Turnover prediction uses classification models (logistic regression, random forest, XGBoost) to estimate the probability an employee will leave within a defined period (typically 6-12 months). Features include tenure, compensation, performance, promotion history, and engagement signals.

When to Use

Trigger conditions:
  • Identifying employees at high risk of voluntary departure
  • Quantifying which factors drive turnover for targeted interventions
  • Prioritizing retention budgets toward highest-impact employees
When NOT to use:
  • For involuntary termination planning (different process and ethics)
  • When headcount is < 200 (insufficient data for reliable modeling; the Phase 1 gate requires at least 200 turnover events)

Algorithm

IRON LAW: Turnover Models Predict RISK, Not Certainty
A predicted 80% turnover probability means "employees with similar
profiles historically left 80% of the time." It does NOT mean this
specific employee WILL leave. Never use model outputs as the sole basis
for employment decisions; doing so creates legal and ethical liability.

Phase 1: Input Validation

Collect: employee demographics, tenure, compensation (relative to market), last promotion date, performance ratings, manager change history, engagement survey scores, commute distance.
Outcome: voluntary departure within N months.
Gate: at least 200 turnover events, and every feature must be observable before the departure date.
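
The gate can be checked mechanically before any training starts. The following is a minimal sketch, not a definitive implementation: the record layout and the key names (`left_voluntarily`, `departure_date`, `feature_snapshot_date`) are assumptions made for illustration, and only the 200-event minimum comes from the text above.

```python
from datetime import date

MIN_TURNOVER_EVENTS = 200  # gate value stated in Phase 1


def validate_inputs(records, min_events=MIN_TURNOVER_EVENTS):
    """Return a list of gate violations; an empty list means the gate passes.

    Each record is a dict with hypothetical keys:
      'employee_id', 'left_voluntarily' (bool),
      'departure_date' (date or None),
      'feature_snapshot_date' (date the features were captured).
    """
    problems = []

    # Gate 1: enough observed voluntary departures to learn from.
    events = sum(1 for r in records if r["left_voluntarily"])
    if events < min_events:
        problems.append(
            f"only {events} turnover events; need at least {min_events}"
        )

    # Gate 2: features must predate the departure, otherwise they leak
    # information that would not be available at prediction time.
    for r in records:
        departed = r["departure_date"]
        if departed is not None and r["feature_snapshot_date"] >= departed:
            problems.append(
                f"{r['employee_id']}: features captured on or after departure"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets an analyst see every violation in a single pass.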

Phase 2: Core Algorithm

  1. Feature engineering: tenure buckets, comp ratio (salary/market median), time since last promotion, manager tenure, engagement trend
  2. Handle class imbalance: turnover rate is typically 10-20%. Use class weights or SMOTE (resampling the training split only, never the evaluation data).
  3. Train: logistic regression (interpretable, HR-preferred) or GBDT such as XGBoost (often higher accuracy)
  4. Output: probability of departure + top risk factors per employee
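
Steps 1 and 2 can be sketched in plain Python. This is an illustration under stated assumptions: the tenure cut points, the raw field names (`hire_date`, `salary`, `market_median`, `last_promotion_date`, `manager_start_date`, `engagement_scores`), and the choice to measure engagement trend as last score minus first score are all hypothetical. The class-weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), which is also what scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter
from datetime import date

DAYS_PER_YEAR = 365.25


def tenure_bucket(years):
    """Bucket tenure in years; the cut points are illustrative assumptions."""
    if years < 1:
        return "<1y"
    if years < 3:
        return "1-3y"
    if years < 5:
        return "3-5y"
    return "5y+"


def years_between(start, end):
    """Elapsed time between two dates, in fractional years."""
    return (end - start).days / DAYS_PER_YEAR


def engineer_features(emp, as_of):
    """Derive the step 1 features from a raw employee dict (hypothetical keys)."""
    scores = emp["engagement_scores"]  # oldest survey first
    return {
        "tenure_bucket": tenure_bucket(years_between(emp["hire_date"], as_of)),
        "comp_ratio": emp["salary"] / emp["market_median"],
        "years_since_promotion": years_between(emp["last_promotion_date"], as_of),
        "manager_tenure_years": years_between(emp["manager_start_date"], as_of),
        "engagement_trend": scores[-1] - scores[0],  # negative means declining
    }


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}
```

Class weights leave the data untouched and only change the loss, which makes them a simpler starting point than synthetic oversampling when interpretability matters.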

Phase 3: Verification

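Evaluate: AUC and precision-recall at actionable thresholds.
Backtest: score employees as of a past cutoff date using only data available then, and check whether the model flagged those who left in the following 6 months.
Gate: AUC > 0.70 and precision > 50% in the top decile of risk scores.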
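
Both gate metrics are simple enough to compute without a library, which helps when explaining them to HR stakeholders. The sketch below assumes binary labels (1 = left, 0 = stayed) and uses the pairwise definition of AUC; the function names are illustrative, and the pairwise loop is quadratic, so a library routine is the better choice on large data.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random leaver outscores a random stayer.

    Ties between a leaver and a stayer count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one leaver and one stayer")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_top_decile(y_true, scores):
    """Share of actual leavers among the 10% of employees with the highest scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)
    return sum(y for _, y in ranked[:k]) / k


def passes_gate(y_true, scores, min_auc=0.70, min_precision=0.50):
    """Apply the Phase 3 gate: AUC > 0.70 and top-decile precision > 50%."""
    return (
        roc_auc(y_true, scores) > min_auc
        and precision_at_top_decile(y_true, scores) > min_precision
    )
```

Top-decile precision is the more actionable number of the two, because it describes the list a retention team would actually work from.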

Phase 4: Output

Return risk scores with driver analysis.
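
For a logistic regression model, one simple way to produce per-employee drivers is to compare each feature's contribution to the log-odds against a baseline employee. The sketch below assumes numeric features and takes the coefficients, intercept, and baseline values as inputs; none of these come from the source, and tree-based models such as GBDT would need a different attribution method.

```python
import math


def score_with_drivers(features, coefs, intercept, baseline, top_n=2):
    """Return (probability, top drivers) for one employee under a logistic model.

    contribution_j = coef_j * (x_j - baseline_j), i.e. how far feature j moves
    the log-odds away from what a baseline (for example, average) employee gets.
    Only features that push risk upward are reported as drivers.
    """
    logit = intercept + sum(coefs[name] * features[name] for name in coefs)
    prob = 1.0 / (1.0 + math.exp(-logit))

    contributions = {
        name: coefs[name] * (features[name] - baseline[name]) for name in coefs
    }
    risk_raising = [name for name, c in contributions.items() if c > 0]
    drivers = sorted(risk_raising, key=lambda n: contributions[n], reverse=True)
    return prob, drivers[:top_n]
```

Reporting only risk-raising features keeps the driver list aligned with what a retention conversation can address.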

Output Format

json
{
  "risk_scores": [{"employee_id": "E123", "turnover_prob": 0.72, "risk_tier": "high", "top_drivers": ["low_comp_ratio", "no_promotion_3yr"]}],
  "metadata": {"model": "xgboost", "auc": 0.78, "prediction_window_months": 12}
}
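
A payload in this shape can be assembled with a few lines of Python. This is a sketch under assumptions: the "high" cut point of 0.6 follows the Sample I/O in this document, while the "medium" cut point of 0.3 and the function names are invented for illustration.

```python
import json


def risk_tier(prob, high=0.6, medium=0.3):
    """Map a probability to a tier; the medium cut point is an assumption."""
    if prob > high:
        return "high"
    if prob > medium:
        return "medium"
    return "low"


def build_output(scored, model_name, auc, window_months):
    """Build the output dict from (employee_id, probability, top_drivers) tuples."""
    return {
        "risk_scores": [
            {
                "employee_id": emp_id,
                "turnover_prob": round(prob, 2),
                "risk_tier": risk_tier(prob),
                "top_drivers": list(drivers),
            }
            for emp_id, prob, drivers in scored
        ],
        "metadata": {
            "model": model_name,
            "auc": auc,
            "prediction_window_months": window_months,
        },
    }


def to_json(payload):
    """Serialize the payload for downstream consumers."""
    return json.dumps(payload, indent=2)
```

Keeping tier assignment in one function means the cut points can be revisited without touching the serialization code.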

Examples

Sample I/O

Input: Employee with 4yr tenure, comp ratio 0.85, no promotion in 3yr, engagement score declining.
Expected: High risk (>0.6). Top drivers: below-market compensation, stalled career progression.

Edge Cases

| Input | Expected | Why |
| --- | --- | --- |
| New hire (< 6 months) | Unreliable prediction | Insufficient behavioral data |
| Top performer, high comp | Still could leave | Non-financial factors (manager, culture) matter |
| Post-reorg period | Model drift likely | Unusual conditions distort patterns |

Gotchas

  • Survivorship bias: Training data only includes people who were hired and stayed long enough to observe. Early-stage leavers may be underrepresented.
  • Feature leakage: "Started job searching" or "updated LinkedIn" are strong predictors but ethically and legally problematic to use. Stick to internal HR data.
  • Self-fulfilling prophecy: If managers treat "high risk" employees differently (less investment, fewer projects), the model prediction becomes self-fulfilling.
  • Legal constraints: Using protected attributes (age, gender, ethnicity) directly or via proxies may violate employment law. Audit for disparate impact.
  • Retention intervention timing: Identifying risk is only useful if HR acts. Build the model into a retention workflow with specific intervention triggers.
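
The disparate impact audit mentioned under legal constraints can start with a comparison of how often each group is flagged as high risk. The sketch below is one possible first check, not a legal test: the group labels and function names are hypothetical, and comparing the resulting ratio against 0.8 borrows the commonly cited four-fifths rule of thumb, which is an assumption here and not a threshold given in this document.

```python
from collections import defaultdict


def flag_rates_by_group(groups, flagged):
    """Fraction of employees flagged as high risk within each group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, is_flagged in zip(groups, flagged):
        totals[group] += 1
        hits[group] += int(is_flagged)
    return {group: hits[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest group flag rate divided by the highest; 1.0 means parity."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody is flagged in any group
    return min(rates.values()) / highest
```

A low ratio does not prove unlawful bias, and a high ratio does not rule it out; it only indicates where a closer review with legal counsel is warranted.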

References

  • For feature engineering from HR data, see
    references/hr-features.md
  • For ethical AI in HR applications, see
    references/ethical-hr-ai.md