algo-hr-turnover

Employee Turnover Prediction

Overview

Turnover prediction uses classification models (logistic regression, random forest, XGBoost) to estimate the probability an employee will leave within a defined period (typically 6-12 months). Features include tenure, compensation, performance, promotion history, and engagement signals.

When to Use

Trigger conditions:

Identifying employees at high risk of voluntary departure
Quantifying which factors drive turnover for targeted interventions
Prioritizing retention budgets toward highest-impact employees

When NOT to use:

For involuntary termination planning (different process and ethics)
When headcount is < 200 (insufficient data for reliable modeling)

Algorithm

IRON LAW: Turnover Models Predict RISK, Not Certainty
A predicted 80% turnover probability means "employees with similar
profiles historically left 80% of the time." It does NOT mean this
specific employee WILL leave. Never use model outputs as the sole basis
for employment decisions; doing so creates legal and ethical liability.
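
One way to check whether a predicted probability really behaves like a historical departure rate is a calibration table. The sketch below is illustrative only: the function name `calibration_table`, the equal-width binning, and the default of five bins are assumptions, not part of the original method.

```python
from collections import defaultdict


def calibration_table(probs, left, n_bins=5):
    """Compare mean predicted probability with observed departure rate per bin.

    probs: predicted turnover probabilities in [0, 1]
    left:  1 if the employee actually left within the window, else 0
    """
    bins = defaultdict(list)
    for p, y in zip(probs, left):
        # Equal-width bins; a probability of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        rows.append({
            "bin": idx,
            "n": len(pairs),
            "mean_pred": round(sum(p for p, _ in pairs) / len(pairs), 3),
            "observed_rate": round(sum(y for _, y in pairs) / len(pairs), 3),
        })
    return rows
```

If the model is well calibrated, `mean_pred` and `observed_rate` should be close in every bin. Even then, the rate describes the group, not any individual in it.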

Phase 1: Input Validation

Collect: employee demographics, tenure, compensation (relative to market), last promotion date, performance ratings, manager change history, engagement survey scores, commute distance.
Outcome: voluntary departure within N months.
Gate: at least 200 turnover events, and every feature must have been available before the departure date.
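
The Phase 1 gate can be expressed as a small pre-flight check. This is a minimal sketch under assumed names: the record keys `employee_id`, `feature_snapshot_date`, and `departure_date` are hypothetical and would need to match the actual HR extract.

```python
from datetime import date

MIN_TURNOVER_EVENTS = 200  # gate value stated in Phase 1


def check_input_gate(records, min_events=MIN_TURNOVER_EVENTS):
    """Check the two Phase 1 gate conditions on a list of employee records.

    Each record is a dict with hypothetical keys:
      employee_id, feature_snapshot_date (date), departure_date (date or None).
    """
    leavers = [r for r in records if r["departure_date"] is not None]
    # Features captured on or after the departure date cannot be used to predict it.
    late = [r["employee_id"] for r in leavers
            if r["feature_snapshot_date"] >= r["departure_date"]]

    problems = []
    if len(leavers) < min_events:
        problems.append(
            f"only {len(leavers)} turnover events; need at least {min_events}")
    if late:
        problems.append(
            f"{len(late)} leavers have features dated on or after departure")
    return {"passed": not problems, "problems": problems}
```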

Phase 2: Core Algorithm

Feature engineering: tenure buckets, comp ratio (salary/market median), time since last promotion, manager tenure, engagement trend
Handle class imbalance: turnover rate typically 10-20%. Use SMOTE or class weights.
Train: logistic regression (interpretable, HR-preferred) or gradient-boosted decision trees (GBDT) such as XGBoost (often higher accuracy, but harder to explain)
Output: probability of departure + top risk factors per employee
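
The first two steps above can be sketched in plain Python. The bucket boundaries, the input keys (`tenure_years`, `salary`, `market_median`, `last_promotion`, `engagement_scores`), and the function names are all assumptions made for illustration. The class-weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), shown here as one possible alternative to SMOTE.

```python
from datetime import date


def tenure_bucket(years):
    """Map tenure in years to a coarse bucket (boundaries are assumptions)."""
    if years < 1:
        return "<1y"
    if years < 3:
        return "1-3y"
    if years < 5:
        return "3-5y"
    return "5y+"


def engineer_features(emp, as_of):
    """Derive model features from one raw employee record (hypothetical keys)."""
    promo = emp["last_promotion"]
    months_since_promo = (as_of.year - promo.year) * 12 + (as_of.month - promo.month)
    scores = emp["engagement_scores"]  # ordered oldest to newest
    return {
        "tenure_bucket": tenure_bucket(emp["tenure_years"]),
        "comp_ratio": round(emp["salary"] / emp["market_median"], 2),
        "months_since_promotion": months_since_promo,
        # Negative trend means engagement has declined over the observed surveys.
        "engagement_trend": scores[-1] - scores[0],
    }


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}
```

With a 15% turnover rate, the leaver class receives a weight of roughly 3.3 and the stayer class roughly 0.59, so errors on leavers cost more during training.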

Phase 3: Verification

Evaluate: AUC and precision-recall at actionable thresholds.
Backtest: did the model correctly flag the employees who left in the past 6 months?
Gate: AUC > 0.70 and precision > 50% in the top decile of risk scores.
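
The two gate metrics can be computed without any library, which makes the definitions explicit. This is a sketch under assumptions: the function names are illustrative, and the pairwise AUC is quadratic in the number of employees, so a library implementation would be preferable on large datasets.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random leaver outscores a random stayer.

    Ties count as half a win. O(n_pos * n_neg), fine for small backtests.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one leaver and one stayer")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_top_decile(scores, labels):
    """Share of actual leavers among the 10% of employees with the highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, len(ranked) // 10)
    return sum(y for _, y in ranked[:k]) / k


def passes_gate(scores, labels, min_auc=0.70, min_precision=0.50):
    """Apply the Phase 3 gate: AUC > 0.70 and top-decile precision > 50%."""
    return (auc(scores, labels) > min_auc
            and precision_at_top_decile(scores, labels) > min_precision)
```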

Phase 4: Output

Return risk scores with driver analysis.

Output Format

```json
{
  "risk_scores": [{"employee_id": "E123", "turnover_prob": 0.72, "risk_tier": "high", "top_drivers": ["low_comp_ratio", "no_promotion_3yr"]}],
  "metadata": {"model": "xgboost", "auc": 0.78, "prediction_window_months": 12}
}
```
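
A payload in this shape could be assembled as follows. This is a hedged sketch: `risk_tier` and `build_output` are hypothetical helper names, the "high" cutoff of 0.6 follows the sample in this document, and the 0.3 cutoff for "medium" is purely an assumption.

```python
import json


def risk_tier(prob, high=0.6, medium=0.3):
    """Map a probability to a tier. The 0.3 'medium' cutoff is an assumption."""
    if prob > high:
        return "high"
    if prob > medium:
        return "medium"
    return "low"


def build_output(predictions, model_name, auc, window_months):
    """predictions: iterable of (employee_id, probability, top_drivers) tuples."""
    risk_scores = [
        {
            "employee_id": emp_id,
            "turnover_prob": round(prob, 2),
            "risk_tier": risk_tier(prob),
            "top_drivers": list(drivers),
        }
        for emp_id, prob, drivers in predictions
    ]
    return {
        "risk_scores": risk_scores,
        "metadata": {
            "model": model_name,
            "auc": auc,
            "prediction_window_months": window_months,
        },
    }


payload = build_output(
    [("E123", 0.72, ["low_comp_ratio", "no_promotion_3yr"])],
    model_name="xgboost", auc=0.78, window_months=12,
)
print(json.dumps(payload, indent=2))
```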

Examples

Sample I/O

Input: Employee with 4yr tenure, comp ratio 0.85, no promotion in 3yr, engagement score declining
Expected: High risk (>0.6). Top drivers: below-market compensation, stalled career progression.

Edge Cases

Input	Expected	Why
New hire (< 6 months)	Unreliable prediction	Insufficient behavioral data
Top performer, high comp	Still could leave	Non-financial factors (manager, culture) matter
Post-reorg period	Model drift likely	Unusual conditions distort patterns
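
Two of these edge cases can be surfaced automatically next to each score. The sketch below is one possible guard: the function name, the flag strings, and the 6-month post-reorg cooldown are assumptions. Only the 6-month new-hire cutoff comes from the table.

```python
def reliability_flags(tenure_months, months_since_reorg=None,
                      min_tenure_months=6, reorg_cooldown_months=6):
    """Return warnings to attach to a risk score; an empty list means no caveat.

    months_since_reorg is None when the employee's unit has had no recent reorg.
    """
    flags = []
    if tenure_months < min_tenure_months:
        # New hires have too little behavioral history for a reliable score.
        flags.append("new_hire_insufficient_history")
    if months_since_reorg is not None and months_since_reorg < reorg_cooldown_months:
        # Patterns learned before a reorg may not hold right after it.
        flags.append("post_reorg_possible_drift")
    return flags
```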

Gotchas

Survivorship bias: Training data only includes people who were hired and stayed long enough to observe. Early-stage leavers may be underrepresented.
Feature leakage: "Started job searching" or "updated LinkedIn" are strong predictors but ethically and legally problematic to use. Stick to internal HR data.
Self-fulfilling prophecy: If managers treat "high risk" employees differently (less investment, fewer projects), the model prediction becomes self-fulfilling.
Legal constraints: Using protected attributes (age, gender, ethnicity) directly or via proxies may violate employment law. Audit for disparate impact.
Retention intervention timing: Identifying risk is only useful if HR acts. Build the model into a retention workflow with specific intervention triggers.
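
A first-pass disparate impact audit might compare how often each group is flagged as high risk. This is a sketch only: the function name is hypothetical, and treating a ratio below 0.8 as a warning sign borrows the four-fifths screening heuristic as an assumption. It is not legal guidance and does not replace a proper review.

```python
from collections import Counter


def flag_rate_ratios(groups, flagged):
    """Ratio of each group's high-risk flag rate to the highest group's rate.

    groups:  group label per employee (used for auditing only, not as a feature)
    flagged: True if the employee was placed in the high-risk tier
    """
    totals = Counter(groups)
    hits = Counter(g for g, f in zip(groups, flagged) if f)
    rates = {g: hits[g] / totals[g] for g in totals}
    top = max(rates.values())
    if top == 0:
        # Nobody was flagged, so all groups are treated identically.
        return {g: 1.0 for g in rates}
    return {g: r / top for g, r in rates.items()}
```

A large gap between groups does not prove discrimination, but it is a signal to look for proxy features such as commute distance or tenure that may correlate with a protected attribute.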

References

For feature engineering from HR data, see `references/hr-features.md`
For ethical AI in HR applications, see `references/ethical-hr-ai.md`
