This article presents SHAP, an interpretability method that measures the impact of variables on predictions. Illustrated with the Titanic data and an XGBoost algorithm, SHAP offers local explanations but has drawbacks in terms of computation time and random simulation. Alternatives, such as LIME, are also available.
Machine Learning now plays an important role in business decision-making because its accuracy has increased significantly in recent decades.
However, the opacity of Machine Learning (ML) and Deep Learning algorithms now raises ethical and legal questions. The need for trust and transparency is therefore more pressing than ever.
For these reasons, the interpretability and explainability of models are now an integral part of the Data Scientist's job: they must strive to convince customers and users that their model's reasoning is acceptable. Fortunately for them, many methods now exist to ease this work, which can be painstaking even for a specialist.
Before getting into the details, let's start by defining these two terms to make the distinction between them clear:
We will now focus more precisely on one interpretability method, explaining how it works along with its positive and negative points: SHAP.
The SHAP method is based on game theory. Simply put, it measures the impact of adding a variable on the prediction (all other things being equal) by considering all possible coalitions of the remaining variables.
For example, to estimate the effect of a person's gender on a prediction, we compute, for each possible combination of the other variables, the difference in predictions between gender = man and gender = woman. Visually, the impact of a “cat ban” on the price of an apartment can be seen below.
As an output, we get a Shapley value that represents the impact of a variable on the prediction. In the example above, a high Shapley value indicates that the value of the variable tends to increase the price of the apartment.
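To make the idea concrete, here is a toy, brute-force sketch of the Shapley computation. The `value` function is purely hypothetical: it stands in for the model's expected prediction when only the features in a coalition S are known.

```python
# Toy illustration of the Shapley principle: the contribution of one feature is
# averaged over every possible coalition of the remaining features.
from itertools import combinations
from math import factorial

def shapley_value(value, features, target):
    """value(S) is a hypothetical function returning the expected prediction
    when only the features in the coalition S are known."""
    others = [f for f in features if f != target]
    k = len(features)
    phi = 0.0
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            s = set(subset)
            weight = factorial(len(s)) * factorial(k - len(s) - 1) / factorial(k)
            phi += weight * (value(s | {target}) - value(s))
    return phi

# Example with a made-up value function over three features:
features = ["gender", "age", "fare"]
print(shapley_value(lambda s: 10 * ("gender" in s) + 2 * len(s), features, "gender"))
# -> 12.0: adding "gender" changes the prediction by 10 + 2, whatever the coalition
```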
We will now show you how SHAP allows us to concretely interpret the results of an XGBoost algorithm.
This example focuses on the Titanic data (https://www.kaggle.com/c/titanic/data), where the aim is to predict which passengers survived from variables describing them, for example: Sex_female = gender, pClass = travel class, Fare = ticket price, Age = age... The target is the Survived variable, which takes the value 1 when the passenger survived.
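As an illustration, a minimal setup for the rest of this article could look like the sketch below (assuming the Kaggle train.csv has been downloaded locally; the feature subset and column names after one-hot encoding are assumptions and may differ from your own preprocessing):

```python
import pandas as pd
import xgboost as xgb
import shap

# Load the Titanic training data and keep a handful of the features mentioned above
df = pd.read_csv("train.csv")
df = pd.get_dummies(df, columns=["Sex"])                          # creates Sex_female / Sex_male
X = df[["Pclass", "Sex_female", "Age", "Fare"]].astype(float)     # assumed feature subset
y = df["Survived"]

# Fit a simple XGBoost classifier (hyperparameters are illustrative only)
model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
```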
The graph below shows the Shapley values per variable and per observation. The color of each point corresponds to the value of the variable, and its horizontal position corresponds to its Shapley value.
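Continuing the sketch above, this kind of beeswarm summary plot is produced with a single call:

```python
# One dot per passenger and per variable, coloured by the variable's value,
# positioned horizontally by its Shapley value
shap.summary_plot(shap_values, X)
```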
We can see, for example, that for the variable Sex_female, if the value is 0 (blue dot, the passenger is a man), then the Shapley values are negative (located on the left) and therefore count against the Survived prediction (1).
Conversely, if the value is 1 (red dot, the passenger is a woman), the Shapley values are positive (located on the right) and therefore count in favor of the Survived prediction (1).
Conclusion: you are more likely to survive the Titanic shipwreck if you are a woman!
We find the same result in the graph below, where we clearly see the two modalities of Sex_female and their significant impact on the prediction.
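This per-variable view can be reproduced, for example, with a dependence plot restricted to Sex_female (reusing the objects from the sketch above):

```python
# Shapley value of Sex_female for every passenger: the two modalities (0/1)
# form two clearly separated groups
shap.dependence_plot("Sex_female", shap_values, X, interaction_index=None)
```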
On the following graph, we can clearly see that passengers under 10 years of age were the most likely to survive: it is when Age (on the x-axis) is below 10 that the Shapley values are highest and therefore most in favor of survival, since they increase the value of the prediction.
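The age plot is a dependence plot as well; a minimal call, still based on the earlier sketch, could be:

```python
# Shapley value of Age for every passenger, plotted against Age itself
shap.dependence_plot("Age", shap_values, X)
```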
It is also possible to cross variables. Here, for example, we look at the joint impact of the variables pClass and Sex_female on the prediction: the y-axis shows the Shapley values, the x-axis the variable pClass, and the color the variable Sex_female. We can see that first class has the best chance of survival, followed by second and then third class. Moreover, in first and second class, women (in red) have a better chance of survival, since their Shapley values sit above those of men. This is reversed in third class (!)
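This crossed view corresponds to a dependence plot with an interaction colour, for example:

```python
# Shapley values of Pclass on the y-axis, Pclass on the x-axis,
# coloured by Sex_female to reveal the interaction between the two variables
shap.dependence_plot("Pclass", shap_values, X, interaction_index="Sex_female")
```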
It is also possible to explain which drivers led to the final prediction for a particular individual. This is therefore a local explanation rather than a global one.
In the graph below, we can see the impact of each of the chosen individual's characteristics on the prediction. In blue are the characteristics with a negative Shapley value, which count against survival, and in red those with a positive Shapley value, which count in favor of survival. These Shapley values are additive. For the individual below, we can see that the fact that she was a woman increased her survival prediction, further supported by the fact that she was in first class.
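Such a local explanation corresponds to a force plot; a minimal sketch, with an arbitrarily chosen passenger index, could be:

```python
# Local explanation for a single passenger (row index chosen arbitrarily)
i = 1
shap.force_plot(explainer.expected_value, shap_values[i, :], X.iloc[i, :], matplotlib=True)
```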
To conclude, let's take stock of the positive and negative points of the SHAP method:
SHAP benefits: The major advantage of the SHAP method is that it ensures the prediction is fairly distributed among the variables, so it is a robust method (based on solid theory) and a reliable one if you want to audit a model, for example. In addition, the graphics are clean and legible, making them well suited to non-expert customers.
SHAP disadvantages: On the other hand, its main disadvantage is the computation time, which increases exponentially with the number of variables (2^k possible coalitions). This can be mitigated by sampling, but that in turn increases the variance. The explanations derived from SHAP always hold under the condition of “knowing the remaining set of variables”, that is, the method systematically uses all the variables. Since SHAP does not output a model, unlike LIME for example, the explanation has to be recomputed for every new observation. Finally, because some values are simulated at random, unrealistic points can be generated (for example age = 15 and major = Yes), since the simulations are random and do not take the correlations between variables into account.
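To illustrate the sampling workaround mentioned above, here is a sketch using the model-agnostic KernelExplainer on a random background sample (it reuses the objects from the earlier sketch; Age is imputed here because the explainer perturbs raw feature values):

```python
# Sampling-based approximation: random coalitions are drawn instead of
# enumerating all 2^k of them, trading computation time for extra variance
X_imputed = X.fillna(X.median())
background = shap.sample(X_imputed, 100)             # random background sample
kernel_explainer = shap.KernelExplainer(model.predict_proba, background)
approx_values = kernel_explainer.shap_values(X_imputed.iloc[:5, :], nsamples=200)
```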
SHAP is not the only method that exists in this field. As mentioned earlier, LIME also uses the features to explain model results. Unlike SHAP, however, it consists in fitting a linear model on simulated observations around the point of interest. As you can see in the graph below, other methods can also be used depending on the structure of your project.
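For comparison, here is a minimal LIME sketch (it uses the separate lime package and the same hypothetical X, y and model as in the earlier sketches; missing ages are imputed because LIME perturbs raw feature values):

```python
from lime.lime_tabular import LimeTabularExplainer

X_imputed = X.fillna(X.median())
lime_explainer = LimeTabularExplainer(
    X_imputed.values,
    feature_names=X.columns.tolist(),
    class_names=["Died", "Survived"],
    mode="classification",
)
# Fit a local linear surrogate around one passenger and print its weights
exp = lime_explainer.explain_instance(
    X_imputed.iloc[0].values, model.predict_proba, num_features=4
)
print(exp.as_list())
```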