
Explaining ML Models with Shapley Values

Overview

Recently it was in the news that MIT's NANDA (Networked AI Agents in Decentralized Architecture) initiative published a report finding that only five percent (5%) of AI pilot programs generate significant revenue, with the majority failing to have any meaningful impact. Buried within that report are two factors that lead to success with current technology: working with an expert vendor and automating back-office processes.

The Zybe Approach

One positive that can be drawn from the recent report is that there are now general guidelines companies can follow to make success a more likely outcome. Implementing AI/ML solutions can be done using a formulaic approach, and it doesn't require a significant cost footprint. Zybe group has had success partnering with customers to deliver effective AI/ML solutions by focusing on a few particular outcomes:

  • Identifying high-impact, low-risk back-office operations suitable for AI/ML automation.
  • Starting from a preconfigured Infrastructure as Code (IaC) cloud with a framework informed by best practices.
  • Partnering with in-house teams to streamline adoption and build out capabilities.
  • Establishing clear metrics for quantifying and measuring progress towards goals.

Despite the imaginative media landscape around artificial intelligence, we've found that the underlying mechanics of ML models can be explained simply and unambiguously. By embedding with teams to fill in knowledge gaps, we help companies build out an in-house AI/ML practice through practical procedures, metrics, and guidelines.

About Explainability

Explainability is the field of understanding and interpreting why a machine learning model makes a particular decision. For many AI/ML applications, a crucial factor in success is gaining control of the internals of a model from a mechanistic standpoint. This could be to enforce fairness or some other constraint on the model. Part of that process is understanding how a model arrived at a particular set of outcomes. Cooperative game theory provides a solution concept, the Shapley value, that is useful for gaining that understanding.

Game Theory

As stated in Wikipedia, game theory is:

the study of mathematical models of strategic interactions.

In other words, it's the study of the interplay between two or more rational parties. It has found successful application in a wide variety of fields, from economics to warfare, and now to explaining ML models.

The Shapley Value

In a game where players cooperate, the Shapley Value is a formal rule for distributing gains and losses, or attributing credit and blame, to collaborating players. In the context of the model examples below, the prediction of a model is the game and the features included in the model are the players.
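To make the rule concrete, below is a minimal sketch of a toy two-player game (not part of any library) that computes each player's Shapley value by averaging their marginal contribution over every coalition of the other players.

from itertools import combinations
from math import factorial

# A toy cooperative game: v maps each coalition of players to the value
# (payout) that coalition earns on its own.
players = ["A", "B"]
v = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 50,
}

def shapley(player):
    """Weighted average of the player's marginal contribution over all coalitions."""
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for size in range(len(others) + 1):
        for coalition in combinations(others, size):
            s = frozenset(coalition)
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (v[s | {player}] - v[s])
    return total

for p in players:
    print(p, shapley(p))  # A -> 20.0, B -> 30.0; together they sum to v({A, B}) = 50

The two values sum to the value of the full coalition, which is the same additivity property SHAP relies on when attributing a model's prediction to its features.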

SHAP (SHapley Additive exPlanations)

SHAP is a Python library for explaining the output of machine learning models. It provides sample datasets and can integrate with matplotlib to provide visualized explanations.

Simple Sentiment Analysis Example

First, ensure any needed dependencies are present in the Jupyter environment.

%pip install torch tensorflow tf_keras transformers matplotlib shap numpy scipy

In this example, a DistilBERT model (a distilled variant of BERT, Bidirectional Encoder Representations from Transformers), which is well suited to text classification, is used. A review from the IMDB dataset provided with the SHAP library is used as a sample input.

import shap
import numpy as np
import scipy as sp
from torch import tensor
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

model_id = "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_id)
model = DistilBertForSequenceClassification.from_pretrained(model_id).cuda()

# BERT uses word-piece tokenization which can create additional tokens so
# additional parameters are specified to ensure fixed length tokenization
def f(x):
    tv = tensor([tokenizer.encode(v, padding="max_length", max_length=500, truncation=True) for v in x]).cuda()
    outputs = model(tv)[0].detach().cpu().numpy()
    # Softmax over the class logits, then return the log-odds of a single
    # class so SHAP has one scalar output per input to explain
    scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T
    val = sp.special.logit(scores[:, 1])
    return val

# Build an explainer around the prediction function, using the tokenizer as the masker
explainer = shap.Explainer(f, tokenizer)

# Explain the first ten reviews from the IMDB training split
imdb_train = shap.datasets.imdb()[0]
shap_values = explainer(imdb_train[:10], fixed_context=1)

# Visualize the explanation for the third review
shap.plots.text(shap_values[2], display=True)
shap.plots.waterfall(shap_values[2])

Using SHAP's text plot we're able to see how tokens overlay on top of the text, along with the importance of those tokens. Red regions increase the output of the model while blue regions decrease it; together they give the overall sentiment.

[SHAP text plot of the sample review: base value -1.359, f(inputs) = -1.235. The strongest positive attributions include "never" (+0.146), "ppo" (+0.071), and "lack" (+0.061); the strongest negative attribution is "disa" (-0.174).]

A waterfall plot also helps to visualize which tokens had the most influence on the model. In this sample we can see how the model broke apart the word "disappointed", and that the tokens comprising this word had an outsized impact on the output. [Waterfall plot of the per-token SHAP values for the sample review]
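The same information can also be pulled out of the Explanation object programmatically. Below is a minimal sketch, assuming the shap_values computed above and that each explanation exposes its tokens via .data and its attributions via .values, as in recent SHAP releases.

# List the tokens with the largest absolute attributions for the review
# explained above (assumes shap_values from the previous block).
explanation = shap_values[2]
order = np.argsort(-np.abs(explanation.values))
for idx in order[:10]:
    print(f"{explanation.data[idx]!r:>15}  {explanation.values[idx]:+.3f}")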

Next-Token Selection Example

Another simple way to glance at the internals of a model is to have it complete a sentence and then look at the other possible outcomes it considered. In this example we're asking GPT-2 to complete a common English idiom.

import shap
from transformers import AutoModelForCausalLM, AutoTokenizer

text = "costs an arm and a"

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
shap_model = shap.models.TopKLM(model, tokenizer, k=5)
masker = shap.maskers.Text(tokenizer)
explainer = shap.Explainer(shap_model, masker)
shap_values = explainer([text])

shap.plots.text(shap_values, display=True)


[SHAP text plot: the top-5 next-token candidates are "leg", "hand", "half", "foot", and "shoulder". For the "leg" output (base value -11.73, f(inputs) = 3.03), the per-token attributions are: cost +0.329, s -0.803, an +0.333, arm +6.41, and +1.876, a +6.619.]

Here we can see that "arm" was attributed much more importance than the other words in the input sentence. Given this, we can begin to understand why all of the other potential responses are limb-related body parts.
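As a sanity check, the candidates SHAP reports can be compared against the model's own next-token distribution. Below is a minimal sketch that reuses the model, tokenizer, and text defined above.

import torch

# Read the top next-token candidates directly from GPT-2's output distribution
# (reuses model, tokenizer, and text from the block above).
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, sequence_length, vocab_size)
probs = logits[0, -1].softmax(dim=-1)          # distribution over the next token
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r:>10}  {prob.item():.3f}")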

In Practice

In order to be useful, an explanation needs to be contrasted with some kind of baseline. In the context of machine learning models, a baseline is essentially what the model would predict if it had no information about the input features. A common baseline is the average prediction across the dataset. Imagining a hypothetical model that predicts whether an image is a cat, the baseline would be how often the model predicts "cat" in general. With SHAP, the contribution of each feature is calculated by looking at the change from the baseline as features are added.
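SHAP's attributions are additive by construction: the baseline plus the sum of the per-feature contributions recovers the model's output for that input. Below is a minimal sketch that reuses the shap_values computed in the sentiment example above.

# Verify the additivity property for one explained review
# (assumes shap_values from the sentiment analysis example).
explanation = shap_values[2]
print("baseline (base value):", explanation.base_values)
print("sum of attributions  :", explanation.values.sum())
print("reconstructed output :", explanation.base_values + explanation.values.sum())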

After establishing a SHAP baseline, pre- and post-training model bias metrics can be collected to inform what actions should be taken to adjust the behavior of the model. This could mean removing or refactoring features, or performing some procedure on the training data.
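As an illustration of the kind of metric involved, the sketch below computes a pre-training class-imbalance measure and a post-training difference in positive prediction rates. The arrays are made up for the example, and the metric definitions follow common usage rather than any particular library.

import numpy as np

# Hypothetical data: a binary sensitive attribute (facet), observed labels,
# and model predictions. All values are made up for illustration.
facet = np.array([0, 0, 0, 1, 1, 1, 1, 1])
labels = np.array([1, 0, 1, 0, 0, 1, 0, 0])
predictions = np.array([1, 1, 1, 0, 0, 1, 0, 1])

# Pre-training metrics: class imbalance between the facet groups, and the
# difference in positive label proportions between them.
n_a, n_b = (facet == 0).sum(), (facet == 1).sum()
class_imbalance = (n_a - n_b) / (n_a + n_b)
dpl = labels[facet == 0].mean() - labels[facet == 1].mean()

# Post-training metric: difference in positive prediction rates between the groups.
dppl = predictions[facet == 0].mean() - predictions[facet == 1].mean()

print(f"class imbalance                    : {class_imbalance:+.2f}")
print(f"difference in positive labels      : {dpl:+.2f}")
print(f"difference in positive predictions : {dppl:+.2f}")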

How does this help?

On a scale of complexity, models have a range of effectiveness. If they're too simplistic they underperform in general, and if they're too complex they perform well on training data but poorly on new data not seen during training. By establishing a track record of explanations and analyzing how they differ from the baseline, underfitting and overfitting, as well as other kinds of bias, can be detected and avoided (or, if desired, enhanced). Other outliers could indicate problems with the training data or the implementation of a feature.
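One practical way to build that track record is to compare global attribution strength across datasets. Below is a minimal sketch, assuming the explainer and imdb_train from the sentiment example and a hypothetical list of held-out reviews named imdb_holdout.

import numpy as np

def mean_abs_attribution(explanations):
    # Average absolute per-token attribution for each explained example
    return np.array([np.abs(explanations[i].values).mean()
                     for i in range(len(explanations))])

# Explain a handful of training and held-out reviews and compare how strongly
# the model leans on individual tokens in each set (imdb_holdout is hypothetical).
train_strength = mean_abs_attribution(explainer(imdb_train[:10], fixed_context=1))
holdout_strength = mean_abs_attribution(explainer(imdb_holdout[:10], fixed_context=1))

print("mean |SHAP| per token, training :", train_strength.mean())
print("mean |SHAP| per token, held-out :", holdout_strength.mean())

A large gap between the two sets is a signal worth investigating before trusting the model on new data.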

Further Reading

  • Amazon's SageMaker Clarify provides a comprehensive framework for evaluating models and explaining model predictions, allowing developers to spend more time analyzing and adjusting their models, rather than fighting with Jupyter environments and pipelines.
  • The SageMaker Developer Guide contains a comprehensive overview of how explainability and model analysis are implemented within the AWS ecosystem.
  • A blog post about Compliance and Generative AI written by Ryan, one of our engineers.

Want to Talk?

I'm always working to better understand the needs and challenges faced by industry leaders. If you'd like to have a conversation about what you've been seeing in your space, or have questions about how Zybe approaches machine learning and artificial intelligence in general, please reach out to me on LinkedIn or my email below.

Author: patrick@zybe.group

Created: 2025-10-14 Tue 14:10