Intuition on Mutual information in XAI


Following Wikipedia, the mutual information $MI(X,Y)$ measures 'how much knowledge is in the output of one random variable about the other variable'.

If the MI between two variables is high, this does not mean I know which value $Y$ most probably takes once I know the value of $X$; it just means that, given the value of $X$, the value of $Y$ is pre-determined. Am I correct?

Then I have a question about the usage of mutual information in Explainable AI (XAI): usually, MI is used to identify the inputs that are important to an ML model for producing a particular label. If we use MI, then by maximizing mutual information we find inputs on which the model is confident, regardless of which label it is confident about.

I ask myself: why is input importance then measured by MI? Could someone provide an intuitive explanation of why it still makes sense to use MI in certain cases?

One paper which uses MI to obtain explanations: GNNExplainer: https://arxiv.org/pdf/1903.03894.pdf

Thanks for your help!

BEST ANSWER

Mutual information $MI(X,Y)$ is a measure of the quantity of information gained about one random variable by observing the other. To be precise, it is the reduction in uncertainty (entropy) about $Y$ once $X$ is known, $MI(X,Y) = H(Y) - H(Y \mid X)$, and it is symmetric: $MI(X,Y) = MI(Y,X)$. It does not tell you which value $Y$ will take given $X$; it only quantifies how much observing $X$ narrows down the distribution of $Y$.
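As a sanity check on this definition, MI can be computed directly from a joint probability table. The sketch below (my own illustrative example, not from the question) uses a noisy binary channel where $Y$ copies $X$ 90% of the time, and confirms the symmetry $MI(X,Y) = MI(Y,X)$:

```python
import math

def mutual_information(joint):
    """MI in bits from a joint probability table p(x, y),
    given as a dict {(x, y): probability}."""
    # Marginal distributions p(x) and p(y).
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # MI = sum over (x, y) of p(x, y) * log2( p(x, y) / (p(x) p(y)) ).
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Noisy channel: X is a fair bit, Y copies X with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
mi_xy = mutual_information(joint)  # about 0.53 bits

# Symmetry: swapping the roles of X and Y gives the same value.
swapped = {(y, x): p for (x, y), p in joint.items()}
assert abs(mi_xy - mutual_information(swapped)) < 1e-12
```

Here $H(Y) = 1$ bit and $H(Y \mid X) = H(0.9) \approx 0.47$ bits, so the MI of roughly $0.53$ bits says that observing $X$ removes about half of the uncertainty about $Y$, without determining it.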

MI is applied in Explainable AI (XAI) because it measures the statistical dependence between the inputs $X$ and the outputs $Y$ of a model. If $MI(X,Y)$ is high, observing $X$ removes most of the uncertainty about $Y$: $Y$ is highly predictable from $X$. Predictable, however, does not mean determined.
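The distinction between "predictable" and "determined" can be made concrete with two small joint tables (an illustrative construction of mine, not from the answer): a deterministic copy of a fair bit attains the maximum $MI = H(Y) = 1$ bit, while a 95%-reliable copy still has high MI but stays strictly below that maximum.

```python
import math

def mi_bits(joint):
    # Mutual information in bits from a joint table {(x, y): p}.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Deterministic: Y always equals X -> MI reaches its maximum H(Y) = 1 bit.
deterministic = {(0, 0): 0.5, (1, 1): 0.5}

# Noisy: Y equals X 95% of the time -> MI is high (about 0.71 bits) but
# below 1 bit, so Y is predictable from X without being determined by it.
noisy = {(0, 0): 0.475, (0, 1): 0.025, (1, 0): 0.025, (1, 1): 0.475}

print(mi_bits(deterministic))  # 1.0
print(mi_bits(noisy))
```

So "maximum MI" corresponds to a deterministic relationship only in the limit; any high-but-submaximal MI leaves residual uncertainty about $Y$.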

For instance, let $F(\mathbf{X})$ be an image classification model, and suppose we want to investigate which ensembles $\{x_i\}$ of pixels of the input image $\mathbf{X}$ encode the most information about the model's classification decision. We can measure the mutual information between each pixel, or ensemble of pixels, $\{x_i\}$ and the output label $\mathbf{Y}$. If a certain pixel ensemble has high mutual information with the output label, we conclude that the information carried by the (stochastic) values of those pixels tells us a great deal about the output label. In other words, that pixel ensemble is highly informative and significant for the model's decision.
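This ranking idea can be sketched with synthetic data (the feature names and the 90% correlation level are my own assumptions for illustration): estimate the empirical MI between each binarized "pixel region" and the predicted label, and keep the regions with the highest scores.

```python
import math
import random

def empirical_mi(xs, ys):
    """Empirical mutual information (in bits) between two discrete samples,
    estimated from the joint and marginal counts."""
    n = len(xs)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    # (c / n) is the joint probability; c * n / (px * py) is its ratio to
    # the product of the marginals.
    return sum((c / n) * math.log2((c * n) / (px[x] * py[y]))
               for (x, y), c in joint.items())

random.seed(0)
labels = [random.randint(0, 1) for _ in range(5000)]

# "Region A": binarized pixel values that mostly follow the label (informative).
region_a = [y if random.random() < 0.9 else 1 - y for y in labels]

# "Region B": pixel values independent of the label (uninformative).
region_b = [random.randint(0, 1) for _ in labels]

print(empirical_mi(region_a, labels))  # high: region A is informative
print(empirical_mi(region_b, labels))  # near 0: region B is not
```

Ranking regions by this score is the intuition behind MI-based explainers: the explanation is the subset of the input whose values, by themselves, already pin down most of the model's output distribution.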

In explainable AI, one goal is to develop measures that quantitatively capture the reasoning behind the AI's decision. MI is one tool that assists with this.