<!-- Adapted from: claude-scientific-skills/scientific-skills/shap -->
# SHAP Model Explainability
Explain ML predictions using Shapley values - feature importance and attribution.
## When to Use

- Explain why a model made specific predictions
- Calculate feature importance with attribution
- Debug model behavior and validate predictions
- Create interpretability plots (waterfall, beeswarm, bar)
- Analyze model fairness and bias
## Quick Start

```python
import shap
import xgboost as xgb

# Train model
model = xgb.XGBClassifier().fit(X_train, y_train)

# Create explainer
explainer = shap.TreeExplainer(model)

# Compute SHAP values
shap_values = explainer(X_test)

# Visualize
shap.plots.beeswarm(shap_values)
```
## Choose Explainer

```python
# Tree-based models (XGBoost, LightGBM, RF) - fast, exact
explainer = shap.TreeExplainer(model)

# Deep learning (TensorFlow, PyTorch)
explainer = shap.DeepExplainer(model, background_data)

# Linear models
explainer = shap.LinearExplainer(model, X_train)

# Any model (slower but universal)
explainer = shap.KernelExplainer(model.predict, X_train[:100])

# Auto-select best explainer
explainer = shap.Explainer(model)
```
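KernelExplainer estimates Shapley values by sampling feature coalitions; with only a few features, the underlying definition can be evaluated exactly. A minimal sketch (pure Python, independent of the shap package) for a hypothetical 3-feature linear model, where an "absent" feature is held at its background value:

```python
from itertools import permutations

# Toy model: a weighted sum of three features (hypothetical example)
def model(x):
    return 3.0 * x[0] + 2.0 * x[1] - 1.0 * x[2]

background = [0.0, 0.0, 0.0]   # baseline: features "absent" at background values
x = [1.0, 1.0, 1.0]            # instance to explain

def shapley_values(model, x, background):
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        z = list(background)
        prev = model(z)
        for i in order:
            z[i] = x[i]            # add feature i to the coalition
            cur = model(z)
            phi[i] += cur - prev   # marginal contribution of feature i
            prev = cur
    return [p / len(perms) for p in phi]

phi = shapley_values(model, x, background)
# Efficiency property: attributions sum to prediction minus baseline
assert abs(sum(phi) - (model(x) - model(background))) < 1e-9
print(phi)
```

For a linear model the result reduces to weight × (value − background), i.e. `[3.0, 2.0, -1.0]` here; KernelExplainer approximates this average over coalitions by sampling instead of enumerating.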
## Compute SHAP Values

```python
# Compute for test set
shap_values = explainer(X_test)

# Access components
shap_values.values       # SHAP values (feature attributions)
shap_values.base_values  # Expected model output (baseline)
shap_values.data         # Original feature values
```
## Visualizations

### Global Feature Importance

```python
# Beeswarm - shows distribution and importance
shap.plots.beeswarm(shap_values)

# Bar - clean summary
shap.plots.bar(shap_values)
```
### Individual Predictions

```python
# Waterfall - breakdown of a single prediction
shap.plots.waterfall(shap_values[0])

# Force - additive visualization
shap.plots.force(shap_values[0])
```
### Feature Relationships

```python
# Scatter - feature value vs. SHAP value
shap.plots.scatter(shap_values[:, "feature_name"])

# With interaction coloring
shap.plots.scatter(shap_values[:, "Age"], color=shap_values[:, "Income"])
```
### Heatmap (Multiple Samples)

```python
shap.plots.heatmap(shap_values[:100])
```

## Common Patterns
### Complete Analysis

```python
import shap

# 1. Create explainer and compute
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

# 2. Global importance
shap.plots.beeswarm(shap_values)

# 3. Top feature relationships
shap.plots.scatter(shap_values[:, "top_feature"])

# 4. Individual explanation
shap.plots.waterfall(shap_values[0])
```
### Compare Groups

```python
# Compare feature importance across groups
group_a = X_test['category'] == 'A'
group_b = X_test['category'] == 'B'
shap.plots.bar({
    "Group A": shap_values[group_a],
    "Group B": shap_values[group_b]
})
```
### Debug Errors

```python
import numpy as np

# Find misclassified samples
errors = model.predict(X_test) != y_test
error_idx = np.where(errors)[0]

# Explain why they failed
for idx in error_idx[:5]:
    shap.plots.waterfall(shap_values[idx])
```
## Interpret Values

- Positive SHAP → feature pushes the prediction higher
- Negative SHAP → feature pushes the prediction lower
- Magnitude → strength of the impact
- Sum of SHAP values = prediction - baseline

```
Baseline:    0.30
Age:        +0.15
Income:     +0.10
Education:  -0.05
Prediction:  0.30 + 0.15 + 0.10 - 0.05 = 0.50
```

## Best Practices
- Use TreeExplainer for tree models (fast, exact)
- Use 100-1000 background samples for KernelExplainer
- Start global (beeswarm), then go local (waterfall)
- Check the model output type (probability vs. log-odds)
- Validate explanations against domain knowledge
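On the output-type point: for many tree classifiers, TreeExplainer explains the margin (log-odds) rather than the probability, so SHAP values will not sum to a probability difference. A sketch of mapping a log-odds explanation back to a probability via the sigmoid, using hypothetical numbers:

```python
import math

# Hypothetical log-odds explanation of one prediction
base_log_odds = -1.0   # expected margin over the background data
shap_sum = 1.5         # sum of this sample's SHAP values (log-odds units)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The model's predicted probability for this sample
prob = sigmoid(base_log_odds + shap_sum)
print(round(prob, 4))  # sigmoid(0.5) ≈ 0.6225
```

Note that the sigmoid is nonlinear, so individual feature attributions cannot simply be converted one by one; only the total maps cleanly to a probability.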
## vs. Alternatives

| Tool | Best For |
|---|---|
| SHAP | Theoretically grounded, all model types |
| LIME | Quick local explanations |
| Feature importance | Simple tree-based importance |
## Resources

- Docs: https://shap.readthedocs.io/
- Paper: Lundberg & Lee (2017), "A Unified Approach to Interpreting Model Predictions"