sklearn-explainability
scikit-learn - Explainability & Interpretability
In scientific research, a model's "why" is as important as its "what". This guide focuses on tools that reveal the decision-making process of machine learning models, ensuring they are scientifically valid and not just overfitting on artifacts.
When to Use
- Validating that a model uses physically meaningful features (e.g., in drug discovery).
- Identifying biases or "shortcuts" the model has learned from the training data.
- Explaining individual predictions to non-experts (Local explanations).
- Ranking the global impact of variables on a complex system (Global explanations).
- Scientific auditing and regulatory compliance.
Core Principles
1. Model-Specific vs. Model-Agnostic
- Model-Specific: Tools like `feature_importances_` in Random Forests. Fast, but tied to one architecture.
- Model-Agnostic: Tools like SHAP or Permutation Importance. Work on any model (SVM, MLP, etc.) but are more compute-intensive.
2. Global vs. Local Explanations
- Global: How does the feature "Temperature" affect the model overall?
- Local: Why did the model predict "Reaction Failed" for this specific sample?
3. Feature Importance vs. Feature Contribution
Importance tells you if a feature is used; Contribution tells you how it changed the output (positive or negative).
Quick Reference: Built-in Inspection
```python
from sklearn.inspection import permutation_importance, PartialDependenceDisplay

# 1. Permutation Importance (better than default tree importance)
result = permutation_importance(model, X_test, y_test, n_repeats=10)
print(result.importances_mean)

# 2. Partial Dependence Plots (how one feature affects prediction)
PartialDependenceDisplay.from_estimator(model, X, features=['temp', 'pressure'])
```
Critical Rules
✅ DO
- Prefer Permutation Importance over the default `RandomForest.feature_importances_` - Default importance is biased toward high-cardinality features (like unique IDs).
- Use `PartialDependenceDisplay` - To visualize the relationship between a feature and the target (linear, exponential, or sigmoid).
- Scale Features before Interpretability - Many models (like Logistic Regression) require scaling for their coefficients (β) to be comparable.
- Check Feature Correlations - If two features are highly correlated, importance will be split between them, making both look "less important" than they are.
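The scaling rule above can be sketched with a pipeline so that coefficients are read in standardized units; the synthetic data from `make_classification` and the artificially rescaled first column are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X[:, 0] *= 1000  # one feature on a much larger raw scale than the others

# Scaling inside the pipeline makes the fitted coefficients comparable:
# each beta now measures the effect of a one-standard-deviation change
pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = pipe.named_steps["logisticregression"].coef_[0]
print(coefs)
```

Without the `StandardScaler` step, the coefficient on the rescaled column would shrink to compensate for its units, making raw magnitudes misleading.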
❌ DON'T
- Don't trust coefficients (β) of unregularized models - High variance in coefficients can lead to false conclusions about feature importance.
- Don't use Feature Importance on Training Data - Always calculate it on the Test Set to see what features actually help with generalization.
- Don't confuse Correlation with Causation - ML models show which features are predictive, not necessarily which ones are causative.
Interpretation Patterns
1. SHAP Integration (The Gold Standard)
```python
import shap

# Works for any scikit-learn model
explainer = shap.Explainer(model.predict, X_test)
shap_values = explainer(X_test)

# Visualize global importance
shap.plots.bar(shap_values)

# Visualize local explanation for the first sample
shap.plots.waterfall(shap_values[0])
```
2. Partial Dependence (PDP) for Science
```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Check if the model learned the correct physical law
# (e.g., does the reaction rate increase with temperature?)
fig, ax = plt.subplots(figsize=(8, 4))
# [0] is a 1D plot, [(0, 1)] is a 2D interaction plot
PartialDependenceDisplay.from_estimator(model, X, [0, (0, 1)], ax=ax)
```
Advanced: Feature Contribution (ELI5 style)
For a single prediction, see which features pushed it towards which class.
```python
import numpy as np

def explain_prediction(model, sample, feature_names):
    # For linear models, the decision value is: intercept + sum(coef * value)
    prediction = model.predict_proba(sample)
    # Map each coef * value term back to its feature name (binary classifier)
    contributions = dict(zip(feature_names, model.coef_[0] * np.ravel(sample)))
    return prediction, contributions
```
Practical Workflows: Validating a Scientific Model
Step 1: Detect "Leakage" Features
If a feature has 99% importance and wasn't expected to, it's likely a data leak (e.g., a sample timestamp or ID).
若某一特征的重要性达到99%,但这不符合预期,那么它很可能是数据泄露特征(如样本时间戳或ID)。
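A minimal sketch of such a check; the `flag_leakage` helper, the 0.9 threshold, and the feature names are all hypothetical:

```python
import numpy as np

# Hypothetical helper: flag any feature that hogs almost all of the importance
def flag_leakage(importances, feature_names, threshold=0.9):
    share = importances / importances.sum()
    return [name for name, s in zip(feature_names, share) if s > threshold]

# e.g. permutation importances where an ID-like column dominates suspiciously
print(flag_leakage(np.array([0.02, 0.95, 0.03]),
                   ["temp", "sample_id", "pressure"]))  # → ['sample_id']
```

A flagged feature is not automatically a leak, but it deserves a manual audit before the model's conclusions are trusted.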
Step 2: Stability Analysis
Run permutation importance with different random seeds. If the top features change significantly, the model is unstable and unreliable.
使用不同的随机种子运行置换重要性计算。若顶部特征发生显著变化,说明模型不稳定且不可靠。
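The check above might look like this; the synthetic regression data and the random forest are placeholders for your own model and test set:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Rank the top-3 features under several permutation seeds;
# a stable ranking suggests the importances are trustworthy
rankings = []
for seed in [0, 1, 2]:
    res = permutation_importance(model, X_te, y_te, n_repeats=10,
                                 random_state=seed)
    rankings.append(np.argsort(res.importances_mean)[::-1][:3])
print(rankings)
```

If the top features swap order or disappear between seeds, treat any scientific conclusion drawn from them as provisional.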
Step 3: Interaction Check
Use 2D PDP to see if the model captured the interaction between features (e.g., Pressure only matters if Temperature > 100°C).
Common Pitfalls
The "Default Importance" Bias
In RandomForest, features with many categories (like Serial_Number) look very important because the tree can split on them many times.
✅ Solution: Use Permutation Importance on the test set instead.
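A sketch of the bias, assuming a synthetic dataset where a shuffled ID-like column carries no signal at all:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
noise_id = np.arange(n, dtype=float)  # unique, Serial_Number-like feature
rng.shuffle(noise_id)
y = (signal > 0).astype(int)          # y depends only on `signal`
X = np.column_stack([signal, noise_id])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Impurity-based importance still gives the useless ID feature credit,
# because its many unique values offer many candidate split points
print(model.feature_importances_)

# Permutation importance on the held-out set exposes the difference
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
print(result.importances_mean)
```

The ID column's permutation importance collapses toward zero on the test set, while its impurity importance stays misleadingly positive.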
Multicollinearity Ghosting
If Feature_A and Feature_B are 100% correlated, the model might only use one.
✅ Solution: Use hierarchical clustering on features or check VIF before interpreting importance.
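One way to sketch the clustering step, assuming synthetic data where the last feature is a near-copy of the first; Spearman correlation, Ward linkage, and the 0.1 distance threshold are illustrative choices:

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
# Feature 3 is a near-duplicate of feature 0, so importance would be split
X = np.column_stack([base, base[:, 0] + rng.normal(scale=0.01, size=200)])

corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2            # enforce exact symmetry
np.fill_diagonal(corr, 1.0)
dist = squareform(1 - np.abs(corr))   # condensed distance matrix
linkage = hierarchy.ward(dist)

# Group features whose rank-correlation distance is below the threshold;
# keep one representative per cluster before computing importance
clusters = hierarchy.fcluster(linkage, t=0.1, criterion="distance")
print(clusters)  # features 0 and 3 share a cluster id
```

After dropping all but one feature per cluster, permutation importance no longer splits credit between near-duplicates.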
Explainability turns Machine Learning into a true scientific tool. It allows researchers to move beyond the "Black Box" and extract new hypotheses directly from trained models.