.. _permutation_importance:

Permutation feature importance
==============================

.. currentmodule:: sklearn.inspection

Permutation feature importance is a model inspection technique that can be used
for any :term:`fitted` :term:`estimator` when the data is tabular. This is
especially useful for non-linear or opaque :term:`estimators`. The permutation
feature importance is defined to be the decrease in a model score when a single
feature value is randomly shuffled [1]_. This procedure breaks the relationship
between the feature and the target, thus the drop in the model score is
indicative of how much the model depends on the feature. This technique
benefits from being model agnostic and can be calculated many times with
different permutations of the feature.

.. warning::

    Features that are deemed of **low importance for a bad model** (low
    cross-validation score) could be **very important for a good model**.
    Therefore it is always important to evaluate the predictive power of a model
    using a held-out set (or better with cross-validation) prior to computing
    importances. Permutation importance does not reflect the intrinsic
    predictive value of a feature by itself but **how important this feature is
    for a particular model**.

The :func:`permutation_importance` function calculates the feature importance
of :term:`estimators` for a given dataset. The ``n_repeats`` parameter sets the
number of times a feature is randomly shuffled; the function returns a sample of
feature importances, one value per repetition.

Let's consider the following trained regression model::

    >>> from sklearn.datasets import load_diabetes
    >>> from sklearn.model_selection import train_test_split
    >>> from sklearn.linear_model import Ridge
    >>> diabetes = load_diabetes()
    >>> X_train, X_val, y_train, y_val = train_test_split(
    ...     diabetes.data, diabetes.target, random_state=0)
    ...
    >>> model = Ridge(alpha=1e-2).fit(X_train, y_train)
    >>> model.score(X_val, y_val)
    0.356...

Its validation performance, measured via the :math:`R^2` score, is
significantly larger than the chance level. This makes it possible to use the
:func:`permutation_importance` function to probe which features are most
predictive::

    >>> from sklearn.inspection import permutation_importance
    >>> r = permutation_importance(model, X_val, y_val,
    ...                            n_repeats=30,
    ...                            random_state=0)
    ...
    >>> for i in r.importances_mean.argsort()[::-1]:
    ...     if r.importances_mean[i] - 2 * r.importances_std[i] > 0:
    ...         print(f"{diabetes.feature_names[i]:<8}"
    ...               f"{r.importances_mean[i]:.3f}"
    ...               f" +/- {r.importances_std[i]:.3f}")
    ...
    s5      0.204 +/- 0.050
    bmi     0.176 +/- 0.048
    bp      0.088 +/- 0.033
    sex     0.056 +/- 0.023

Note that the importance values for the top features represent a large
fraction of the reference score of 0.356.

Permutation importances can be computed either on the training set or on a
held-out testing or validation set. Using a held-out set makes it possible to
highlight which features contribute the most to the generalization power of the
inspected model. Features that are important on the training set but not on the
held-out set might cause the model to overfit.
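
As a minimal sketch of this comparison (reusing the ``model``, ``X_train``,
``X_val``, ``y_train`` and ``y_val`` objects defined above), the importances
obtained on both sets can be computed and printed side by side; features whose
importance is much larger on the training set than on the validation set are
the ones the model relies on mostly to fit the training data::

    r_train = permutation_importance(model, X_train, y_train,
                                     n_repeats=30, random_state=0)
    r_val = permutation_importance(model, X_val, y_val,
                                   n_repeats=30, random_state=0)

    # Compare mean importances on the training and validation sets,
    # sorted by decreasing validation importance.
    for i in r_val.importances_mean.argsort()[::-1]:
        print(f"{diabetes.feature_names[i]:<8}"
              f"train: {r_train.importances_mean[i]:.3f}  "
              f"validation: {r_val.importances_mean[i]:.3f}")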

The permutation feature importance is the decrease in a model score when a single
feature value is randomly shuffled. The score function to be used for the
computation of importances can be specified with the `scoring` argument,
which also accepts multiple scorers. Using multiple scorers is more computationally
efficient than sequentially calling :func:`permutation_importance` several times
with a different scorer, as it reuses model predictions.

An example of using multiple scorers is shown below, employing a list of metrics,
but more input formats are possible, as documented in :ref:`multimetric_scoring`::

    >>> scoring = ['r2', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error']
    >>> r_multi = permutation_importance(
    ...     model, X_val, y_val, n_repeats=30, random_state=0, scoring=scoring)
    ...
    >>> for metric in r_multi:
    ...     print(f"{metric}")
    ...     r = r_multi[metric]
    ...     for i in r.importances_mean.argsort()[::-1]:
    ...         if r.importances_mean[i] - 2 * r.importances_std[i] > 0:
    ...             print(f"    {diabetes.feature_names[i]:<8}"
    ...                   f"{r.importances_mean[i]:.3f}"
    ...                   f" +/- {r.importances_std[i]:.3f}")
    ...
    r2
        s5      0.204 +/- 0.050
        bmi     0.176 +/- 0.048
        bp      0.088 +/- 0.033
        sex     0.056 +/- 0.023
    neg_mean_absolute_percentage_error
        s5      0.081 +/- 0.020
        bmi     0.064 +/- 0.015
        bp      0.029 +/- 0.010
    neg_mean_squared_error
        s5      1013.903 +/- 246.460
        bmi     872.694 +/- 240.296
        bp      438.681 +/- 163.025
        sex     277.382 +/- 115.126

The ranking of the features is approximately the same for different metrics even
if the scales of the importance values are very different. However, this is not
guaranteed and different metrics might lead to significantly different feature
importances, in particular for models trained for imbalanced classification problems,
for which the choice of the classification metric can be critical.

Outline of the permutation importance algorithm
-----------------------------------------------

- Inputs: fitted predictive model :math:`m`, tabular dataset (training or
  validation) :math:`D`.
- Compute the reference score :math:`s` of the model :math:`m` on data
  :math:`D` (for instance the accuracy for a classifier or the :math:`R^2` for
  a regressor).
- For each feature :math:`j` (column of :math:`D`):

  - For each repetition :math:`k` in :math:`\{1, ..., K\}`:

    - Randomly shuffle column :math:`j` of dataset :math:`D` to generate a
      corrupted version of the data named :math:`\tilde{D}_{k,j}`.
    - Compute the score :math:`s_{k,j}` of model :math:`m` on corrupted data
      :math:`\tilde{D}_{k,j}`.

  - Compute importance :math:`i_j` for feature :math:`f_j` defined as:

    .. math:: i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}
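
The following is a minimal NumPy sketch of this procedure, written for
illustration only. It assumes an already fitted estimator exposing a ``score``
method (such as the ``model`` above); in practice :func:`permutation_importance`
should be preferred, as it additionally supports arbitrary scorers and parallel
execution. The helper name ``naive_permutation_importance`` is purely
illustrative::

    import numpy as np

    def naive_permutation_importance(model, X, y, n_repeats=30, random_state=0):
        # Illustrative re-implementation of the outline above; not the actual
        # scikit-learn implementation.
        rng = np.random.RandomState(random_state)
        X = np.asarray(X)
        reference_score = model.score(X, y)        # reference score s
        importances = np.empty((X.shape[1], n_repeats))
        for j in range(X.shape[1]):                # each feature (column) j
            X_permuted = X.copy()
            for k in range(n_repeats):             # each repetition k
                # Shuffle column j to build the corrupted dataset D_{k,j}.
                X_permuted[:, j] = rng.permutation(X[:, j])
                # Averaging these differences over k gives i_j = s - mean(s_{k,j}).
                importances[j, k] = reference_score - model.score(X_permuted, y)
        return importances.mean(axis=1), importances.std(axis=1)

    mean_importances, std_importances = naive_permutation_importance(
        model, X_val, y_val)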

Relation to impurity-based importance in trees
----------------------------------------------

Tree-based models provide an alternative measure of :ref:`feature importances
based on the mean decrease in impurity <random_forest_feature_importance>`
(MDI). Impurity is quantified by the splitting criterion of the decision trees
(Gini, Entropy or Mean Squared Error). However, this method can give high
importance to features that may not be predictive on unseen data when the model
is overfitting. Permutation-based feature importance, on the other hand, avoids
this issue, since it can be computed on unseen data.

Furthermore, impurity-based feature importances for trees are **strongly
biased** and **favor high cardinality features** (typically numerical features)
over low cardinality features such as binary features or categorical variables
with a small number of possible categories.

Permutation-based feature importances do not exhibit such a bias. Additionally,
the permutation feature importance may be computed with any performance metric
on the model predictions and can be used to analyze any model class (not
just tree-based models).

The following example highlights the limitations of impurity-based feature
importance in contrast to permutation-based feature importance:
:ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance.py`.

Misleading values on strongly correlated features
-------------------------------------------------

When two features are correlated and one of the features is permuted, the model
still has access to that feature through its correlated feature. This results in
a lower reported importance value for both features, even though they might
*actually* be important.

One way to handle this is to cluster features that are correlated and only
keep one feature from each cluster. This strategy is explored in the following
example:
:ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance_multicollinear.py`.

.. topic:: Examples:

    * :ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance.py`
    * :ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance_multicollinear.py`

.. topic:: References:

    .. [1] L. Breiman, :doi:`"Random Forests" <10.1023/A:1010933404324>`,
       Machine Learning, 45(1), 5-32, 2001.