Can you have a multi-variable Marginal Distribution?


The SHAP algorithm is a commonly used method in machine learning to explain black-box models. I'm working on producing my own version of the SHAP algorithm to deepen my understanding of the method, but I'm finding it difficult to figure out how they sample a background distribution.

In particular, I'm struggling with equations 9-12 of the paper; they're reproduced below.

SHAP is based on Shapley values from game theory. Features in a model are treated as players in a game, where the game is the model itself. A feature's contribution to the prediction can be calculated by evaluating the model with and without that feature for every possible combination of the other features.

For large models this is computationally prohibitive. But it's possible to sample the combinations and approximate the values, which is the method presented in the SHAP paper.
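To make the brute-force version concrete, here is a minimal sketch of the exact Shapley computation. The `baseline` argument is my own simplification: a single fixed vector stands in for the background distribution used to "remove" absent features.

```python
import itertools
import math

def shapley_values(model, x, baseline):
    """Exact Shapley values for `model` at the instance `x`.

    model: callable taking a full feature vector.
    baseline: values substituted for "absent" features (a crude
    stand-in for averaging over a background distribution).
    """
    n = len(x)
    phi = [0.0] * n
    features = list(range(n))
    for i in features:
        others = [j for j in features if j != i]
        # Sum the weighted marginal contribution of feature i over
        # every coalition S of the remaining features.
        for size in range(n):
            for S in itertools.combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i = [x[j] if (j in S or j == i) else baseline[j]
                          for j in features]
                without_i = [x[j] if j in S else baseline[j]
                             for j in features]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi
```

For a linear model such as `sum`, each feature's Shapley value reduces to `x[i] - baseline[i]`, and the values sum to `model(x) - model(baseline)` (the efficiency property), which makes a handy sanity check.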

Unfortunately, for most models you can't arbitrarily remove features without changing the model entirely, so you have to find another way to sever the link between a feature and the model's prediction.

In the paper, they suggest that you can average across each feature's marginal distribution using the following equations:

$$\begin{align} f(h_x(z')) &= E[f(z) | z_S] \\ &= E_{z_{\bar S} | z_S}[f(z)] \\ &\approx E_{z_{\bar S}}[f(z)] \\ &\approx f([z_S, E[z_{\bar S}]])\end{align}$$

Here $f(h_x(z'))$ is the model's value for the feature combination $z'$, where $z'$ is a binary vector with $0$ for features not being used and $1$ for features being used, and $h_x$ transforms the binary vector into an input for the model $f$. $z_S$ is the subset of features $S$ that are included in the feature combination, and $z_{\bar S}$ is the subset of features not included.
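As I understand it, $h_x$ can be sketched as follows; filling absent features from a single background row is my illustrative choice, not necessarily what the paper's implementation does:

```python
import numpy as np

def h_x(z_prime, x, background_row):
    """Map a binary coalition vector z' to a model input.

    Present features (z'_i = 1) take their values from the instance x;
    absent features (z'_i = 0) are filled from a background row.
    """
    mask = np.asarray(z_prime, dtype=bool)
    sample = background_row.copy()
    sample[mask] = x[mask]
    return sample
```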

What do these equations mean? It looks to me like they're saying you can get the expected value of $f(z)$ across a marginal distribution of more than one feature.

Is that right? I thought it was only possible to get a marginal distribution for a single feature.

BEST ANSWER

If I'm reading this paper right, their notation is terrible, but I'll try to clear it up. The point of these equations is to define a meaningful output $\tilde f(z_S)$ for the model $f(z)$, given only the values $z_S$ of features in the set $S$.

The first line states that they define the output $\tilde f(z_S)$ to be the conditional expectation of $f(z)$ given $z_S$.

The second line is just rephrasing the first; the conditional expectation $\mathbb E[f(z)\mid z_S]$ is an expectation over the conditional distribution of $z_{\bar S} \mid z_S$.

The third line says that you can approximate the conditional distribution of $z_{\bar S} \mid z_S$ by the marginal distribution of $z_{\bar S}$. (Yes, marginal distributions can involve more than one variable.)
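Concretely, that marginal expectation can be estimated by Monte Carlo: hold $z_S$ fixed at the observed values and average $f$ over background samples for $z_{\bar S}$. A rough sketch (function and variable names are mine, not the paper's):

```python
import numpy as np

def expected_over_marginal(f, x, S, background):
    """Estimate E[f(z)] over the marginal of the absent features,
    with the features in S held fixed at their observed values x[S].

    background: array of shape (n_samples, n_features) drawn from
    the data distribution.
    """
    samples = background.copy()
    samples[:, S] = x[S]              # splice in the observed features
    return np.mean([f(row) for row in samples])
```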

The fourth line says you can further approximate $f$ as linear. You can then pull the expectation inside $f$, obtaining the final approximation

$$\tilde f(z_S) \approx f([z_S, \mathbb E[z_{\bar S}]]);$$

that is to say, you replace the missing features in $z$ with the mean of their marginal distributions.
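As a sketch of that final single-evaluation approximation (again, the names here are mine):

```python
import numpy as np

def f_tilde(f, x, S, background):
    """Approximate f([z_S, E[z_Sbar]]): absent features are replaced
    by their column means over the background data, so f is called
    once instead of once per background sample. This is exact when
    f is linear in the absent features.
    """
    z = background.mean(axis=0)       # featurewise marginal means
    z[S] = x[S]                       # keep the observed features
    return f(z)
```

For a linear `f` this agrees exactly with the per-sample average from the third line's approximation; for nonlinear models the two can differ.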