<!-- Adapted from: claude-scientific-skills/scientific-skills/shap -->

# SHAP Model Explainability

Explain ML predictions using Shapley values - feature importance and attribution.

## When to Use

- Explain why a model made specific predictions
- Calculate feature importance with attribution
- Debug model behavior and validate predictions
- Create interpretability plots (waterfall, beeswarm, bar)
- Analyze model fairness and bias

Quick Start

快速入门

python
import shap
import xgboost as xgb
python
import shap
import xgboost as xgb

Train model

Train model

model = xgb.XGBClassifier().fit(X_train, y_train)
model = xgb.XGBClassifier().fit(X_train, y_train)

Create explainer

Create explainer

explainer = shap.TreeExplainer(model)
explainer = shap.TreeExplainer(model)

Compute SHAP values

Compute SHAP values

shap_values = explainer(X_test)
shap_values = explainer(X_test)

Visualize

Visualize

shap.plots.beeswarm(shap_values)
undefined
shap.plots.beeswarm(shap_values)
undefined

## Choose Explainer

```python
# Tree-based models (XGBoost, LightGBM, RF) - FAST
explainer = shap.TreeExplainer(model)

# Deep learning (TensorFlow, PyTorch)
explainer = shap.DeepExplainer(model, background_data)

# Linear models
explainer = shap.LinearExplainer(model, X_train)

# Any model (slower but universal)
explainer = shap.KernelExplainer(model.predict, X_train[:100])

# Auto-select best explainer
explainer = shap.Explainer(model)
```

## Compute SHAP Values

```python
# Compute for test set
shap_values = explainer(X_test)

# Access components
shap_values.values       # SHAP values (feature attributions)
shap_values.base_values  # Expected model output (baseline)
shap_values.data         # Original feature values
```
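To make concrete what these values mean, here is a minimal brute-force sketch of a Shapley value: the feature's marginal contribution averaged over all feature orderings, with absent features replaced by a baseline. The toy additive model, weights, and instance below are made up for illustration; the SHAP library uses much faster algorithms for real models.

```python
from itertools import permutations
from math import factorial

# Hypothetical additive model f(x) = sum(w_k * x_k), one instance, zero baseline
weights = {"Age": 2.0, "Income": 0.5, "Education": -1.0}
x = {"Age": 1.0, "Income": 4.0, "Education": 2.0}
baseline = {k: 0.0 for k in weights}

def f(present):
    # Model output with missing features replaced by their baseline value
    return sum(w * present.get(k, baseline[k]) for k, w in weights.items())

def shapley(feature):
    # Average the feature's marginal contribution over all orderings
    names = list(weights)
    total = 0.0
    for order in permutations(names):
        present = {}
        for name in order:
            if name == feature:
                total += f({**present, name: x[name]}) - f(present)
                break
            present[name] = x[name]
    return total / factorial(len(names))

print(shapley("Age"))  # 2.0: for an additive model this is w_i * (x_i - baseline_i)
```

Note the additivity property falls out directly: the three Shapley values sum to `f(x) - f(baseline)`.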

## Visualizations

### Global Feature Importance

```python
# Beeswarm - shows distribution and importance
shap.plots.beeswarm(shap_values)

# Bar - clean summary
shap.plots.bar(shap_values)
```

### Individual Predictions

```python
# Waterfall - breakdown of single prediction
shap.plots.waterfall(shap_values[0])

# Force - additive visualization
shap.plots.force(shap_values[0])
```

### Feature Relationships

```python
# Scatter - feature vs SHAP value
shap.plots.scatter(shap_values[:, "feature_name"])

# With interaction coloring
shap.plots.scatter(shap_values[:, "Age"], color=shap_values[:, "Income"])
```

### Heatmap (Multiple Samples)

```python
shap.plots.heatmap(shap_values[:100])
```

## Common Patterns

### Complete Analysis

```python
import shap

# 1. Create explainer and compute
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

# 2. Global importance
shap.plots.beeswarm(shap_values)

# 3. Top feature relationships
shap.plots.scatter(shap_values[:, "top_feature"])

# 4. Individual explanation
shap.plots.waterfall(shap_values[0])
```

### Compare Groups

```python
# Compare feature importance across groups
group_a = X_test['category'] == 'A'
group_b = X_test['category'] == 'B'

shap.plots.bar({
    "Group A": shap_values[group_a],
    "Group B": shap_values[group_b],
})
```

### Debug Errors

```python
import numpy as np

# Find misclassified samples
errors = model.predict(X_test) != y_test
error_idx = np.where(errors)[0]

# Explain why they failed
for idx in error_idx[:5]:
    shap.plots.waterfall(shap_values[idx])
```
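The error-finding step above can be sketched on toy arrays; the hypothetical `y_pred` below stands in for `model.predict(X_test)` output:

```python
import numpy as np

y_test = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])  # stand-in for model.predict(X_test)

# Boolean mask of misclassified samples, then their integer indices
errors = y_pred != y_test
error_idx = np.where(errors)[0]
print(error_idx)  # [2 4]
```

Those indices can then be fed to `shap.plots.waterfall` one at a time to see which features drove each wrong prediction.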

## Interpret Values

- Positive SHAP → feature pushes the prediction higher
- Negative SHAP → feature pushes the prediction lower
- Magnitude → strength of the impact
- Sum of SHAP values = Prediction - Baseline

```
Baseline:    0.30
Age:        +0.15
Income:     +0.10
Education:  -0.05
Prediction:  0.30 + 0.15 + 0.10 - 0.05 = 0.50
```
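The additivity rule is easy to check in code. This tiny sketch reproduces the worked example above (the per-feature numbers are the illustrative ones from the example, not output of a real model):

```python
# Prediction = baseline + sum of per-feature SHAP values
baseline = 0.30
shap_contributions = {"Age": 0.15, "Income": 0.10, "Education": -0.05}

prediction = baseline + sum(shap_contributions.values())
print(round(prediction, 2))  # 0.5
```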

## Best Practices

1. Use TreeExplainer for tree models (fast, exact)
2. Use 100-1000 background samples for KernelExplainer
3. Start global (beeswarm), then go local (waterfall)
4. Check the model output type (probability vs log-odds)
5. Validate with domain knowledge
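Practice 2 can be sketched as follows: subsample a modest background set from the training data before building a KernelExplainer. The random matrix here is a synthetic stand-in for a real `X_train`:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 8))  # synthetic stand-in for real training data

# Draw 100 distinct rows to serve as the KernelExplainer background dataset
background = X_train[rng.choice(len(X_train), size=100, replace=False)]
print(background.shape)  # (100, 8)
```

A smaller background set makes KernelExplainer's sampling tractable at the cost of a coarser baseline estimate; 100-1000 rows is the usual compromise.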

## vs Alternatives

| Tool | Best For |
|------|----------|
| SHAP | Theoretically grounded, all model types |
| LIME | Quick local explanations |
| Feature Importance | Simple tree-based importance |

## Resources