MLOps Blog

ML Model Interpretation Tools: What, Why, and How to Interpret

18th August, 2023

Interpretation is literally defined as explaining or showing your own understanding of something. 

When you create an ML model, which is nothing but an algorithm that can learn patterns, it might feel like a black box to other project stakeholders. Sometimes even to you. 

Which is why we have model interpretation tools.

What is Model Interpretation?

In general, an ML model produces predictions, and those predictions, along with the insights derived from them, are used to solve a range of problems. This immediately raises a couple of follow-up questions:

  • How trustworthy are these predictions? 
  • Are they reliable enough to make big decisions?

Model Interpretation redirects your focus from ‘what was the conclusion?’ to ‘why was this conclusion reached?’. You can get an understanding of the model’s decision-making process, i.e. what exactly drives the model to classify a data point correctly or incorrectly. 

Why is Model Interpretation important?

Consider an example of a husky versus wolf classifier, in which a few huskies were misclassified as wolves. Using interpretable machine learning, you might find that these misclassifications mainly happened because of snow in the image, which the classifier was using as a feature to predict wolves.

It’s a simple example, but it already shows why Model Interpretation is important. It helps with at least a few aspects of a model:

  • Fairness – An interpretable model used by a company to decide raises and promotions can tell you exactly why any particular person was, or wasn’t, offered a promotion.
  • Reliability – You can check that small changes in input won’t set off a domino effect and alter the output drastically.
  • Causality – You can check whether the relationships the model relies on are causal, which matters because only causal relationships are useful for decision making.
  • Trust – It’s easier for all project stakeholders, especially on the non-technical side, to trust a model that can be explained in layman’s terms.

How to interpret an ML model?

Machine Learning models vary in degrees of complexity and performance. One size doesn’t fit them all. As a result, there are different ways to interpret them. Primarily, these methods can be categorized as: 

  1. Model-specific / Model-agnostic
    • Model-specific methods are specific to certain models: they depend on the inner machinery of a model to draw conclusions. These methods may include interpreting the coefficient weights in Generalized Linear Models (GLMs), or the weights and biases of Neural Networks.
    • Model-agnostic methods can be used on any model. They’re generally applied post-training. They usually work by analyzing the relationship between input features and outputs, and don’t have access to the model’s internal mechanics such as weights or assumptions.
  2. Local / Global scope
    • The local scope covers only an individual prediction, capturing the reasons behind only the specified prediction.
    • The global scope extends beyond an individual data point and covers the model’s general behavior.
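To make the two categorizations above concrete, here is a minimal sketch on synthetic scikit-learn data (not the article's dataset). Reading a logistic regression's coefficients is model-specific; scikit-learn's permutation_importance is model-agnostic because it only needs predictions; and splitting one row's log-odds into per-feature terms is a local explanation. The data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Toy data: 4 features, only 2 of which carry signal
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

# Model-specific, global: read the fitted coefficients of the linear model directly
coefs = model.coef_[0]

# Model-agnostic, global: shuffle one feature at a time and measure the drop in score.
# Only predictions are needed, so this works for any fitted estimator.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
global_importance = perm.importances_mean

# Model-specific, local: for one row, coef_j * x_j is feature j's share of the log-odds
local_contrib = coefs * X[0]
log_odds = local_contrib.sum() + model.intercept_[0]
print(coefs, global_importance, local_contrib, log_odds)
```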

Let’s create a model to interpret. We’ll do a short walkthrough of the model creation steps, and then focus on different model-agnostic tools and frameworks for interpreting the created model, rather than on solving the actual problem.

Model creation

1. Loading the dataset

[Figure: the loaded dataset]

Dataset schema:

  • Type of Social Media post
  • Domain of Social Media post
  • Complete URL of post
  • Date of Post
  • Timestamp in ET
  • Time (GMT): Timestamp in GMT
  • Title of the post
  • Actual text conversation of the post

To predict:

  • Target label (1 = Patient, 0 = Non-Patient)

2. Performing Exploratory Data Analysis & data pre-processing

  1. Filling the null values.
  2. Dropping redundant features like Time (GMT).
  3. Cleaning the text data by removing everything except alphanumeric characters.
  4. Label encoding the categorically valued attributes.
  5. Handling erroneous values present in certain attributes.
  6. Lemmatization & Tf-Idf vectorization of text-based features.
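A minimal sketch of steps 1, 3, 4 and 6 with pandas and scikit-learn, assuming a toy DataFrame. The Title/Text column names and the regex are hypothetical, lemmatization is skipped to avoid an NLTK dependency, and the n-gram range is an assumption inferred from multi-word features such as ‘congestive heart failure’ that appear later in the article.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; only 'Source' matches a column name used later in the article
df = pd.DataFrame({
    "Source": ["FORUMS", "BLOGS", None],
    "Title": ["Heart transplant!", None, "Cold showers?"],
    "Text": ["My dad got a new heart :)", "Feeling dizzy again...", "Burn fat fast"],
})

# 1. Fill null values
df = df.fillna({"Source": "UNKNOWN", "Title": ""})

# 3. Keep only alphanumeric characters and whitespace, lower-cased
def clean(text):
    return re.sub(r"[^A-Za-z0-9\s]", " ", text).lower()

df["Text"] = df["Text"].map(clean)

# 4. Label-encode a categorical column (classes are sorted alphabetically)
df["Source"] = LabelEncoder().fit_transform(df["Source"])

# 6. Tf-Idf vectorization (lemmatization omitted); the article keeps 5000 features,
#    and n-grams up to length 3 are an assumption
vect = TfidfVectorizer(max_features=5000, ngram_range=(1, 3))
X_text = vect.fit_transform(df["Text"])
```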

3. Feature engineering

  • Combining all the text features into a single text feature to make up for the missing values.
  • Mapping dates to weekdays.
  • Mapping Time to hours.
  • Creating another feature based on the length of the conversation text.
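The feature engineering steps could look roughly like this in pandas. The raw Date, Time, Title and Text columns are hypothetical stand-ins for the dataset's real columns; the engineered names (combined_text, weekday, hour, text_num_words) match those used later in the article.

```python
import pandas as pd

# Hypothetical raw columns; only the engineered names below come from the article
df = pd.DataFrame({
    "Date": ["2023-08-14", "2023-08-18"],
    "Time": ["09:15", "22:40"],
    "Title": ["New heart", None],
    "Text": [None, "cold showers burn fat"],
})

# Combine the text fields into one, so a missing title or body still leaves usable text
df["combined_text"] = (df["Title"].fillna("") + " " + df["Text"].fillna("")).str.strip()

# Map dates to weekdays (Monday=0 ... Sunday=6) and times to the hour of the day
df["weekday"] = pd.to_datetime(df["Date"]).dt.weekday
df["hour"] = pd.to_datetime(df["Time"], format="%H:%M").dt.hour

# Length of the conversation text, in words
df["text_num_words"] = df["combined_text"].str.split().str.len()
print(df[["combined_text", "weekday", "hour", "text_num_words"]])
```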

4. Final train & test datasets

Tf-Idf vectorization and feature engineering both increase the number of attributes in the train and test datasets.

[Figure: the final train and test datasets]

5. Training the Classifier

Although a range of models can be used for this task, we’re going with a Random Forest Classifier, which isn’t easily interpretable because of its complexity: it’s an ensemble of many decision trees. We’ll use a number of tools and frameworks to make it interpretable.

from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc.fit(train_X, y)  # train_X: the final training features, y: the labels

6. Getting predictions on the test dataset

testpred = rfc.predict(test_X)
[Figure: predictions on the test dataset]

7. Model performance evaluation

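As we don’t have the correct labels for the test dataset, let’s see how our model performed via the training classification report and K-fold cross-validation scores. Keep in mind that the training report is optimistic, since the model has already seen that data; the cross-validation scores are the more honest estimate.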

Training Classification Report:

[Figure: training classification report]

K-fold Cross-Validation scores:

[Figure: K-fold cross-validation scores]
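A sketch of this evaluation step with scikit-learn, using synthetic data in place of the article's features. The training report is computed on rows the model was fitted on, so treat it as an upper bound; the cross-validated scores are the fairer estimate. The fold count and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the article's training features and labels
train_X, y = make_classification(n_samples=300, n_features=20, random_state=0)

rfc = RandomForestClassifier(random_state=0).fit(train_X, y)

# Training report: optimistic, because the model has already seen these rows
print(classification_report(y, rfc.predict(train_X)))

# Stratified K-fold CV: each fold is scored by a model that never saw it during fitting
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), train_X, y, cv=cv)
print(scores, scores.mean())
```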

Model Interpretation tools

Now that we’ve built a model, it’s time to get busy with interpretation tools that can explain its predictions. We’ll start with one of the most popular tools for this, ELI5.

1. ELI5

ELI5 is an acronym for ‘Explain Like I’m 5’. It’s a Python library that’s popular because it’s easy to use. It’s mainly used:

  • To learn about important features that played a crucial role in predictions.
  • To analyze a certain individual prediction, and see what exactly led the model to that prediction. 

Of the two categorizations discussed above, ELI5 fits the Local/Global one better than the Model-specific/Model-agnostic one: it provides both global explanations (feature weights) and local ones (individual predictions).

Also, ELI5 currently supports only a few prominent frameworks: scikit-learn Generalized Linear Models (GLMs) and tree-based models, Keras models, LightGBM, XGBoost, and CatBoost.

How to install ELI5

ELI5 can be installed with a pip command.

pip install eli5

Or, it can be installed with a conda command.

conda install -c conda-forge eli5

Using ELI5 on our model

Let’s import the required dependencies first:

import eli5

A: Feature Weights and Importance

Let’s start with a standard function call:

eli5.show_weights(rfc)

This is what comes out of the other end.

[Figure: ELI5 feature weights, with features shown as numbers]

The ‘Weight’ column contains the weight of each feature listed in the ‘Feature’ column (for a Random Forest, these are the model’s impurity-based feature importances). But the features show up as numbers, which makes these insights hardly interpretable. To resolve this, we need to make some changes to our original function call.

columns = ['Source','weekday','hour','text_num_words']
feature_names = list(vect.get_feature_names()) + columns
eli5.show_weights(rfc, feature_names = feature_names)

Here, we’re getting feature names from vect, the Tf-Idf vectorizer, and from the list columns, which holds the engineered and encoded features, in the same order as the columns of the training matrix. The feature names are then passed to the same function through its feature_names keyword argument. This is what we get now.

[Figure: ELI5 feature weights with feature names]

Since we had 5000 Tf-Idf features, it makes sense to see a lot of words being given high importance. Along with the words, there’s also ‘text_num_words’, a feature we created during feature engineering, ranked as the 12th most important feature overall. This can be an iterative process: you can check how important the engineered attributes are and, if needed, re-engineer them.

We can also:

  • Specify whether we want all the features or just the top ‘n’ by using the top argument.
  • Use the feature_re and feature_filter arguments to show only the features that match our conditions and constraints.

Now let’s check another use case of Eli5.

B: Individual Prediction Analysis

Analyzing certain predictions in isolation can be pretty valuable. You can explain the mathematical machinery behind an obtained prediction to all the stakeholders in layman’s terms.

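Let’s check a case of True Positive first. In the call below, row is a placeholder for the feature vector of the data point being explained:

eli5.show_prediction(rfc, row, feature_names=feature_names, top=20, show_feature_values=True)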

Here’s what’s causing the model to predict that this data point belongs to a patient.

[Figure: ELI5 explanation of a True Positive prediction]

So, the model concluded that this data point belongs to a patient because of features like ‘enlarged heart’, ‘hospital heart’, ‘dizzy’, ‘cancer’, etc. Makes sense. ‘<BIAS>’, the expected average score output by the model based on the distribution of the training set, plays the most crucial role.
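ELI5 explains individual tree-model predictions by following each decision path and crediting every split's change in predicted probability to the feature used at that split, with the root value acting as <BIAS> (averaged over all trees for a forest). Below is a minimal sketch of the idea for a single scikit-learn decision tree on synthetic data; the helper names are hypothetical, and ELI5's own implementation handles forests and more cases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = tree.tree_

def p_positive(node):
    """Share of class 1 among the training samples that reach this node."""
    v = t.value[node][0]
    return v[1] / v.sum()

def explain_path(x):
    """Split the class-1 probability into a bias term plus per-feature contributions."""
    x = np.asarray(x, dtype=np.float32)  # trees compare float32 inputs to thresholds
    node = 0
    bias = p_positive(0)                 # root node: class-1 share of the training set
    contrib = np.zeros(x.shape[0])
    while t.children_left[node] != -1:   # -1 marks a leaf
        f = t.feature[node]
        child = t.children_left[node] if x[f] <= t.threshold[node] else t.children_right[node]
        contrib[f] += p_positive(child) - p_positive(node)  # credit the split feature
        node = child
    return bias, contrib

bias, contrib = explain_path(X[0])
print(bias + contrib.sum(), tree.predict_proba(X[:1])[0, 1])
```

The contributions telescope along the path, so the bias plus their sum equals the leaf's predicted probability.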

Let’s take a look at the case of a True Negative now.

[Figure: ELI5 explanation of a True Negative prediction]

Compared with the first image, we can clearly see that there are far fewer words associated with health/medical conditions. Even though a couple of features like ‘diagnosed’ and ‘congestive heart’ make it to the top contributors, their associated values and contributions are zero.

Let’s move on to another exciting tool.

2. LIME

LIME stands for Local Interpretable Model-agnostic Explanations. Let’s delve a little deeper into the name:

  • The term Local means an analysis of individual predictions. LIME gives us extensive insights into what goes on behind a certain prediction.
  • It’s model-agnostic, which means it treats every model as a black box, so it can make interpretations without accessing model internals, allowing it to work with a wide range of models.

As intuitive as the name is, the idea behind its interpretation logic is just as intuitive:

  • LIME basically tests what happens to the predictions once the model is provided with certain variations in the input data.
  • To test this, LIME trains an interpretable model on a new dataset consisting of perturbed samples and the corresponding predictions of the black box model.
  • This new learned model needs to be a good local approximation (for a certain individual prediction), but doesn’t have to be a good global approximation. It can be expressed mathematically as:

$$\xi(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g)$$

The explanation for instance x is the interpretable model g created by LIME that minimizes the loss function L, which measures how close the explanation is to the prediction of the original model f in the neighborhood of x (weighted by the proximity measure π_x), while the model complexity Ω(g) is kept low (fewer features). G is the family of possible explanations, e.g. all GLMs.

How to install LIME

Like ELI5, it can also be installed with a simple pip command.

pip install lime

Or with a conda command.

conda install -c conda-forge lime

Using LIME on our model

LIME mainly offers three interpretation methods, each dealing with a different kind of data:

  • Tabular interpretation,
  • Text interpretation,
  • Image interpretation.

Out of our 5004 trainable attributes, 5000 are Tf-Idf based features, which in turn are nothing but words and short word sequences. So, we’ll go with LIME’s text interpretation method. For that, we’ll have to change our training and fit the model on the vectorized text alone:

rfc.fit(vectorized_train_text, y)

As you can see, we’re now only using vectorized text features for modelling.

As we know, LIME prepares a new dataset on which it trains its own interpretable model. But how does LIME do that in case of textual data? This is how:

  • New texts are created by toggling the presence/absence of randomly chosen words from the original text.
  • A feature is 1 if the corresponding word is included and 0 if removed, thus making it a binary representation.
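A simplified, from-scratch sketch of the LIME text procedure described above: binary masks switch words on and off, each perturbed text is scored by the black-box model, samples are weighted by how close they are to the original, and a weighted ridge regression acts as the interpretable model g. The function name, kernel width and sampling scheme are assumptions; the real lime package differs in details. Repeated words collapse into a single dictionary key here.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_text_sketch(text, predict_proba, num_samples=500, kernel_width=0.25, seed=0):
    """Simplified LIME for text: perturb by removing words, fit a weighted linear surrogate."""
    rng = np.random.default_rng(seed)
    words = text.split()
    # Binary representation: 1 keeps the word, 0 removes it; row 0 is the original text
    masks = rng.integers(0, 2, size=(num_samples, len(words)))
    masks[0] = 1
    texts = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    # Black-box predictions for the perturbed texts (positive-class probability)
    preds = predict_proba(texts)[:, 1]
    # Cosine distance of each mask to the all-ones mask, turned into a proximity weight
    distance = 1 - np.sqrt(masks.sum(axis=1) / len(words))
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Interpretable surrogate g: a weighted ridge regression on the binary features
    surrogate = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return dict(zip(words, surrogate.coef_))

# Hypothetical black box: "Patient" probability is high whenever "heart" is present
def toy_proba(texts):
    p = np.array([0.9 if "heart" in t.split() else 0.1 for t in texts])
    return np.column_stack([1 - p, p])

weights = lime_text_sketch("my heart surgery went well", toy_proba)
print(max(weights, key=weights.get))
```

Any callable mapping a list of raw strings to class probabilities can play the black-box role, which is why the vectorizer and model need to be combined into one predictor.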

Enough with the theoretical part, let’s see LIME in action.

Importing the required dependencies:

import lime.lime_text  # imports the lime package and its text submodule
from sklearn.pipeline import make_pipeline

Making the required function calls:

explainer = lime.lime_text.LimeTextExplainer(
    class_names=['Not Patient', 'Patient'])
pl = make_pipeline(vect, rfc)

First, we create an instance of LimeTextExplainer. Then, since we’re dealing with textual features, we combine our vectorizer (vect) and our model (rfc) using sklearn’s make_pipeline, so that any raw text passed in gets vectorized before the model makes its prediction. LIME needs this because it perturbs the raw text, not the vectorized features.

Just like ELI5, let’s check for a True Positive instance first.

exp = explainer.explain_instance(
    train['combined_text'][689], pl.predict_proba)

Let’s plot instance results and see what we get.

[Figure: LIME explanation of a True Positive instance]

On reading the text, it becomes quite clear that it’s talking about some patient whose life was saved after a much-needed heart transplant, simultaneously appreciating the medical staff. We can see that the highlighted words are present in the ‘Patient’ column, thus responsible for correctly classifying this data point as ‘Patient’.

Now let’s see what the interpretation plot for a True Negative instance looks like.

[Figure: LIME explanation of a True Negative instance]

At first glance, with highlighted keywords like ‘cold’, ‘fat’, and ‘burn’, this data point seems associated with the ‘Patient’ class. On actually reading it, we see that the text is about the benefits of taking a cold shower, and evidently our model understands this too.

So far so good. But what if our features are continuous values in a tabular format, or pixel values of an image?

For that, we just need to remember what we discussed above:

Data type   Function
Text        lime.lime_text.LimeTextExplainer()
Tabular     lime.lime_tabular.LimeTabularExplainer()
Image       lime.lime_image.LimeImageExplainer()

Okay, moving on to the next model interpretation tool, SHAP.

3. SHAP

SHapley Additive exPlanations is a game-theoretic approach to explain the output of any machine learning model. SHAP explains the prediction of an instance by computing the contribution of each feature to that prediction. It uses Shapley values.

What are Shapley values?

  • Shapley values – a method from coalitional game theory – tell us how to distribute the “payout” among the features.
  • Thus a prediction can be explained by assuming that each feature value of the instance is a “player” in the game where prediction is the payout.

How are Shapley values calculated?

  • The Shapley value is the average marginal contribution of a feature value across all possible coalitions.
  • Let’s say we have a dataset of shape N x M, where N is the number of samples and M is the number of features, for example 5: A, B, C, D & E.
  • E is the dependent attribute with continuous values, while A, B, C and D are our categorically valued predictors.
  • Now, say we want to calculate the contribution of feature A for a particular instance, i.e. its Shapley value.
  • To simulate a coalition in which only A, B and C are present, we keep the instance’s own values for A, B and C, and take the value of feature D from a randomly picked instance in the dataset. We then predict E for this combination; say it comes out to be X.
  • Now we replace the value of feature A in this combination with the value from that randomly picked instance, and predict E again; say it comes out as Y this time.
  • The difference X − Y, whether positive or negative, is the contribution of feature A to the prediction for this coalition.
  • This sampling step is repeated many times, with different random coalitions and random instances, and the contributions are averaged to obtain the Shapley value of A.
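The sampling procedure above can be sketched in NumPy. This follows the permutation-sampling approximation of Shapley values (the Štrumbelj and Kononenko approach); it is not how SHAP's TreeExplainer computes its values, since TreeExplainer uses an exact algorithm for trees. The function name and background data are hypothetical. For a linear model, the estimate should approach w_j(x_j minus the mean of feature j), which the usage lines compare against.

```python
import numpy as np

def shapley_sample(f, x, data, j, num_samples=2000, seed=0):
    """Monte Carlo estimate of feature j's Shapley value for the instance x.

    f:    callable mapping a 2-D array of rows to 1-D predictions
    data: background rows used to fill in features that are outside the coalition
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    with_j = np.empty((num_samples, n_features))
    without_j = np.empty((num_samples, n_features))
    for s in range(num_samples):
        z = data[rng.integers(len(data))]        # randomly picked instance
        order = rng.permutation(n_features)      # random coalition: features ordered before j
        coalition = order[: np.flatnonzero(order == j)[0]]
        with_j[s] = z
        with_j[s, coalition] = x[coalition]      # coalition members keep the instance's values
        with_j[s, j] = x[j]                      # prediction "X": j takes the instance's value
        without_j[s] = with_j[s]
        without_j[s, j] = z[j]                   # prediction "Y": j takes the random instance's value
    return float(np.mean(f(with_j) - f(without_j)))  # average of X - Y

# Usage on a linear model, where the exact answer is w_j * (x_j - mean of feature j)
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 2.0, -1.0])
phi_0 = shapley_sample(lambda rows: rows @ w, x, data, j=0)
print(phi_0, w[0] * (x[0] - data[:, 0].mean()))
```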

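The SHAP explanation can be expressed mathematically as:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$

where g is the explanation model, z′ ∈ {0,1}^M is the coalition vector, M is the maximum coalition size, φ_0 is the base value (the model’s expected output), and φ_j ∈ ℝ is the Shapley value for feature j.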

Enough theory, let’s see how SHAP performs on our model.

How to install SHAP

Just like other libraries, it can be installed with a pip command, just make sure that the pip version is over 19.0.

pip install shap

If you run into any error with pip, you can always use conda for installation.

conda install -c conda-forge shap

Using SHAP on our model

Importing the required dependencies:

import shap

You can use different explainers that are available in SHAP depending on your model. Since we’re dealing with a Random Forest Classifier, we’ll be using SHAP’s tree explainer.

explainer = shap.TreeExplainer(rfc)

Let’s calculate shap values for our features. Remember, since most of our features are text-based, we’ll be leveraging them to make sense of our model just as we did with LIME.

shap_values = explainer.shap_values(vectorized_train_text.toarray())

shap_values comes out as a list containing 2 arrays as elements, corresponding to the 2 classes we have in our dataset. So, we can interpret the prediction from the perspective of both classes.

As per our method, let’s begin with the interpretation of a True Positive instance first. To exhibit some consistency, we’ll be checking the same data points as we checked with LIME.

shap.force_plot(explainer.expected_value[1], shap_values = shap_values[1][689], features = vectorized_train_text.toarray()[0:][689], feature_names = vect.get_feature_names())

The plot looks like this:

[Figure: SHAP force plot for instance 689, ‘Patient’ class]

Now the indexing in code, as well as the plot itself, can seem a little overwhelming. Let’s break it down step by step.

Values associated with the ‘Patient’ class are stored at index 1 of expected_value and shap_values, so this plot is from the perspective of the ‘Patient’ class.

About the plot:

  • All feature values lead to a prediction score of 0.74, which is shown in bold.
  • The base value, 0.206, is the average output of the model over the training data.
  • Feature values in pink (red) push the prediction towards class 1 (Patient), while those in blue push it towards class 0 (Not Patient).
  • The width of each colored block represents the magnitude of that feature’s contribution.
  • Since the prediction score (0.74) is above 0.5, this data point has been classified as positive, i.e. class = Patient; the features have pushed it well above the base value (0.206).

What will happen if we view this instance from the perspective of the other class?

shap.force_plot(explainer.expected_value[0], shap_values = shap_values[0][689], features = vectorized_train_text.toarray()[0:][689], feature_names = vect.get_feature_names())

[Figure: SHAP force plot for instance 689, ‘Not Patient’ class]

We can easily make sense of what’s happening here:

  • We switched the indices of expected_value and shap_values from 1 to 0, because we wanted the perspective reversed from ‘Patient’ to ‘Not Patient’.
  • Consequently, all the features which were present in pink (red) have switched sides to blue and are influencing the prediction negatively now with respect to class ‘Not Patient’.
  • The ‘Not Patient’ prediction score (0.27) is below both its base value (0.794) and 0.5, since it is essentially the complement of the ‘Patient’ score. Viewed from this opposite perspective, the data point still belongs to class ‘Patient’.
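As a worked check of the additive SHAP form, with φ_0 the base value and φ_j the SHAP value of feature j: because the two class probabilities sum to 1, the ‘Not Patient’ explanation mirrors the ‘Patient’ one. Plugging in the rounded values from the plots (the 0.534 total is implied by 0.74 − 0.206):

$$f_{\text{Patient}}(x) = \phi_0^{\text{Patient}} + \sum_j \phi_j^{\text{Patient}} \;\Rightarrow\; 0.74 \approx 0.206 + 0.534$$

$$\phi_0^{\text{Not Patient}} = 1 - \phi_0^{\text{Patient}} \approx 0.794, \qquad \phi_j^{\text{Not Patient}} = -\phi_j^{\text{Patient}}$$

$$f_{\text{Not Patient}}(x) = \phi_0^{\text{Not Patient}} + \sum_j \phi_j^{\text{Not Patient}} \approx 0.794 - 0.534 = 0.26$$

The result matches the plotted 0.27 up to the rounding of the displayed values, and the sign flip of every φ_j is exactly why the pink and blue blocks swap sides.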

Let’s check the True Negative instance now.

shap.force_plot(explainer.expected_value[1], shap_values = shap_values[1][120], features = vectorized_train_text.toarray()[0:][120], feature_names = vect.get_feature_names())

The plot comes out to look like this.

[Figure: SHAP force plot for instance 120, ‘Patient’ class]

Since the prediction score is below the base value (0.206), and therefore well below 0.5, the data point is classified as ‘Not Patient’.

Along with local interpretations, SHAP can also explain the general behavior of the model via global interpretation.

shap.summary_plot(shap_values = shap_values[1], features = vectorized_train_text.toarray(), feature_names = vect.get_feature_names())
[Figure: SHAP summary plot, ‘Patient’ class]

As evident from the code, we’re plotting from the perspective of the ‘Patient’ class. For features like ‘congestive heart’, ‘congestive heart failure’ and ‘trouble’, high feature values (shown in red) correspond to large positive SHAP values, so these features play an important role in pushing predictions towards the ‘Patient’ class.
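The summary plot orders features by their overall SHAP magnitude. A minimal NumPy sketch of that global ranking (mean absolute SHAP value per feature) on a made-up 3 × 3 SHAP matrix; the values and the helper name are hypothetical, not outputs from the article's model.

```python
import numpy as np

def global_shap_importance(shap_values, feature_names, top=10):
    """Rank features by mean absolute SHAP value across samples."""
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1][:top]
    return [(feature_names[i], float(mean_abs[i])) for i in order]

# Hypothetical SHAP values for the 'Patient' class: 3 samples x 3 features
sv = np.array([[0.30, -0.02, 0.01],
               [0.25, 0.05, -0.01],
               [-0.20, 0.01, 0.00]])
ranking = global_shap_importance(sv, ["congestive heart", "trouble", "weekend"], top=2)
print(ranking)
```

Taking absolute values before averaging matters: a feature that pushes strongly towards either class should rank high even if its signed contributions cancel out on average.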

SHAP can be a little overwhelming at first with the range of features it provides, but once you get the hang of it, it becomes very intuitive.

We’ll check out one more interpretation library, MLXTEND.

4. MLXTEND

MLxtend or Machine Learning Extensions is a library of useful tools for day-to-day data science & machine learning tasks. It offers a wide range of functions to work with.

So far, we’ve only analyzed the text features, and the other features have remained on the sidelines. This time, let’s take a look at the remaining features too, using MLXTEND.

How to install MLXTEND

It can be installed with a simple pip command.

pip install mlxtend

Alternatively, you can download the package manually, unzip it, navigate into the package directory, and use the command:

python setup.py install

Using MLXTEND on our model

Importing the required dependencies:

import mlxtend

MLXTEND offers different features, like:

1. PCA correlation circle

  • An interesting way of looking at results can be via Principal Component Analysis.
  • MLXTEND lets you plot a PCA correlation circle using the plot_pca_correlation_graph function.
  • We basically compute the correlation between our features and the Principal Components.
  • Then these correlations are plotted as vectors on a unit circle, whose axes are the Principal Components.
  • Specific Principal Components can be passed as a tuple to the dimensions argument.
  • The correlation circle axes show the percentage of variance explained for the corresponding Principal Components.
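What the correlation circle plots can be computed directly: standardize the features, project them onto the principal components, and correlate each feature with each component's scores. This NumPy/scikit-learn sketch uses random stand-in data rather than the article's features, and it is not mlxtend's implementation. Each feature's vector has length at most 1 because it is correlated with only a subset of the components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the four non-text features
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 4))
raw[:, 1] += raw[:, 0]                # make two of the features correlated
X = StandardScaler().fit_transform(raw)

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)             # principal-component scores for each row

# Correlation between each original feature and each of the two components:
# row i gives the (x, y) end point of feature i's arrow inside the unit circle
corr = np.array([[np.corrcoef(X[:, i], scores[:, k])[0, 1] for k in range(2)]
                 for i in range(X.shape[1])])

# Axis labels of the circle: percentage of variance explained by each component
explained = pca.explained_variance_ratio_ * 100
print(np.round(corr, 2), np.round(explained, 1))
```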

Let’s plot this correlation circle for our remaining features and see what we get.

from mlxtend.plotting import plot_pca_correlation_graph
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(train[['text_num_words', 'weekday','hour','Source']].values)
fig, corr_matrix = plot_pca_correlation_graph(
    X,
    ['text_num_words', 'weekday', 'hour', 'Source'],
    dimensions=(1, 2))

[Figure: PCA correlation circle for the non-text features]
  • 1st Principal Component explains 31.7% of the total variance, while 2nd Principal Component explains 25.8%
  • text_num_words & Source are more aligned with the 1st PC, while hour and weekday are more aligned with the 2nd PC.

2. Bias-Variance decomposition

  • The bias-variance tradeoff is a challenge in practically every machine learning project.
  • Often the goal is to find a sweet spot between the two: avoid underfitting by keeping bias low, and avoid overfitting by keeping variance low.
  • It’s hard to get a bias-variance score for an arbitrary predictive model, but MLXTEND can estimate a model’s average expected loss and decompose it into bias and variance.

Let’s try to calculate this score for our Random Forest Classifier.

from mlxtend.evaluate import bias_variance_decomp
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(vectorized_train_text, y, test_size=0.25)

We need to split our dataset into train & test sets here, because the decomposition fits the model on bootstrap samples of the training set and evaluates it on the held-out test set.

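clf2 = RandomForestClassifier(random_state=1)  # the model whose error we decompose
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        clf2, X_train, y_train, X_test, y_test,
        loss='0-1_loss')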
print(f"Average expected loss: {avg_expected_loss.round(3)}")
print(f"Average bias: {avg_bias.round(3)}")
print(f"Average variance: {avg_var.round(3)}")

This is what the score comes out to be.

  • We trained the model on our text features for this exercise.
  • Based on these scores, our model seems to generalize reasonably well, i.e. it doesn’t appear to be overfitting.
  • Since it has a relatively high bias, it could mean that it’s underfitting to some extent on our dataset.
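For intuition, here is a from-scratch sketch of a bootstrap bias-variance decomposition for 0-1 loss. It follows the Domingos-style definitions that, as far as we know, mlxtend's bias_variance_decomp uses for '0-1_loss': the main prediction is the most frequent label across bootstrap rounds, bias is its error rate, and variance is how often individual rounds disagree with it. It uses a decision tree on synthetic data for speed; the function name and round count are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def bias_variance_01(make_model, X_train, y_train, X_test, y_test, num_rounds=50, seed=0):
    """Bootstrap estimate of the expected 0-1 loss and its bias and variance terms."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = np.empty((num_rounds, len(y_test)), dtype=int)
    for r in range(num_rounds):
        idx = rng.integers(0, n, size=n)           # bootstrap sample of the training set
        model = make_model().fit(X_train[idx], y_train[idx])
        preds[r] = model.predict(X_test)
    # "Main" prediction for each test point: the most frequent label across rounds
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    expected_loss = (preds != y_test).mean()       # average error over rounds and points
    bias = (main != y_test).mean()                 # error of the main prediction
    variance = (preds != main).mean()              # disagreement with the main prediction
    return expected_loss, bias, variance

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
loss, bias, var = bias_variance_01(lambda: DecisionTreeClassifier(random_state=0),
                                   X_tr, y_tr, X_te, y_te)
print(f"loss={loss:.3f} bias={bias:.3f} variance={var:.3f}")
```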

3. Plotting decision boundaries & regions of the model

  • With MLXTEND, we can also take a look at the model’s decision boundary in 2 dimensions and see how the model is differentiating among data points of different classes.
  • However, there’s a drawback associated with this interpretation technique. Only 2 features can be used at a time for this visualization, therefore we will only be using our non-text features here, in groups of two.

Let’s see what kind of decision boundary we get here.

Making the required imports:

from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import EnsembleVoteClassifier
import matplotlib.gridspec as gridspec
import itertools
import matplotlib.pyplot as plt

Instantiating the model:

clf2 = RandomForestClassifier(random_state=1)
gs = gridspec.GridSpec(1,2)
fig = plt.figure(figsize=(15,8))
labels = ['Random Forest']

Now, plotting:

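for clf, lab, grd in zip([clf2], labels, itertools.product([0], [0])):
    # Fit on just two non-text features so the decision regions can be drawn in 2D
    clf.fit(train[['Source', 'weekday']].values, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(X=train[['Source', 'weekday']].values,
                                y=y, clf=clf)
    plt.title(lab)
plt.show()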

This is what we get:

[Figure: Random Forest decision regions for Source and weekday]

We can clearly see that this is a very poor decision boundary. These input features don’t separate the classes well, which is consistent with the results we obtained with the other tools.

Wrapping up

With increasingly complex architectures, model interpretation is something you simply have to do nowadays. I hope that now you have some idea of how to do it.

The tools we have explored in this article are not the only available tools, and there are many ways to make sense of model predictions. Some of these ways might include tools or frameworks, while others might not, and I encourage you to explore them all.

Future directions

If you liked what you read and want to dig deeper into this topic, you can check out the Model Explanation section under this link. It contains information about many of the approaches you might want in your arsenal for interpreting your model.

That’s it for now. Stay tuned for more!
