Numerically robust computation of the mutual information


Given numerical (floating-point) representations of the distributions $p(x,y), p(x|y), p(y|x)$, what is the most numerically robust way of computing $I(X;Y)$? Should one use the formula for $I(X;Y)$ directly? Or should one use one of the identities for $I(X;Y)$ that involve the entropy functions?

I cannot find any reference on the subject in "Numerical Recipes" by Press et al. and "Elements of Information Theory" by Cover and Thomas.


A little background:

Define $p(X=x,Y=y)$ to be a joint discrete probability distribution, where $X$ and $Y$ share the support $\mathcal{X}$. Then, the Shannon entropy of the joint distribution is

$$ H(X,Y) := -\sum_{x\in \mathcal{X}}\sum_{y \in \mathcal{X}}p(x,y)\log_2 p(x,y) $$

and the entropies of the marginals are of the form

$$ H(X) := -\sum_{x\in \mathcal{X}}p(x)\log_2 p(x) $$
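The entropy sums above hit an edge case at $p = 0$, where $p\log_2 p$ is taken to be $0$ by convention. A minimal NumPy sketch (the function name is my own) that handles this by masking out zero-probability outcomes:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, treating 0 * log2(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # skip zero-probability outcomes to avoid log(0)
    return -np.sum(p[nz] * np.log2(p[nz]))
```

For example, `entropy_bits([0.5, 0.5])` gives $1$ bit, and a degenerate distribution like `[1.0, 0.0]` gives $0$ without producing a `log(0)` warning.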

Furthermore, say we can compute the conditional probability distributions. Then we have conditional entropies of the form

$$ H(X|Y) := -\sum_{x\in \mathcal{X}}\sum_{y \in \mathcal{X}}p(x,y)\log_2 p(x|y) $$

Now, defining the mutual information

$$I(X;Y) := \sum_{x\in \mathcal{X}}\sum_{y \in \mathcal{X}}p(x,y)\log_2\frac{ p(x,y)}{p(x)\,p(y)}\;,$$

we have the usual relationships

$$ I(X;Y) = H(X) - H(X|Y)$$ $$ I(X;Y) = H(X) + H(Y) - H(X,Y)$$


The direct formula for the mutual information involves ratios of probabilities, which can be numerically delicate when the probabilities are small. The entropy identities avoid the ratios, but they subtract quantities of comparable magnitude, which risks catastrophic cancellation when $I(X;Y)$ is much smaller than the individual entropies. There seems to be a trade-off here.
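To make the cancellation concern concrete, here is a small experiment (my own construction, not from a reference): a joint distribution that is a tiny perturbation of an independent product, so the true $I(X;Y)$ is many orders of magnitude smaller than the entropies being subtracted.

```python
import numpy as np

# Joint = independent product + a tiny zero-sum perturbation, so that
# I(X;Y) is far smaller than H(X), H(Y), H(X,Y).
px = np.array([0.3, 0.7])
py = np.array([0.6, 0.4])
eps = 1e-6
pxy = np.outer(px, py) + eps * np.array([[1.0, -1.0],
                                         [-1.0, 1.0]])

def H(p):
    """Entropy in bits, with 0 * log2(0) treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Route 1: entropy identity. The subtraction involves O(1) quantities,
# so its absolute error floor is around machine epsilon in bits.
mi_entropy = H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy.ravel())

# Route 2: direct formula. Every summand is already of the same small
# order as I(X;Y) itself, so no large quantities are subtracted.
marg = np.outer(pxy.sum(axis=1), pxy.sum(axis=0))
nz = pxy > 0
mi_direct = np.sum(pxy[nz] * np.log2(pxy[nz] / marg[nz]))
```

Here the true value is around $10^{-11}$ bits; the entropy route carries an absolute error near machine epsilon regardless, so its *relative* error grows as $I(X;Y)$ shrinks, while the direct route keeps all summands at the scale of the answer. One can also view $I(X;Y)$ as the KL divergence $D\!\left(p(x,y)\,\|\,p(x)p(y)\right)$ and lean on a library routine such as SciPy's `scipy.special.rel_entr`, which handles the zero cases elementwise (note it uses the natural log, so divide by $\ln 2$ for bits).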