KL divergence theorem


Given two distributions $p(x)$ and $q(x)$, I can measure how much they differ using $KL(p||q) = -\sum_x p(x) \log \frac{q(x)}{p(x)}$. Now I'm defining mutual information as $I(X,Y) \equiv KL(p(x,y) || p(x)p(y))$, and $H$ as the information entropy.

I want to show that $I(X,Y) = H(X) - H(X|Y)$ and $I(X,Y) = H(Y) - H(Y|X)$. However, I'm having a hard time wrapping my mind around how the information entropy plays a role in the mutual information from a conceptual standpoint.

I'm also not sure how to start proving this mathematically from the definition I have. How do I begin?



On BEST ANSWER

Here is a qualitative way to interpret that identity: $KL(p||q)$ is a measure of how different the distributions $p$ and $q$ are. So the KL divergence between the true joint distribution of $(X,Y)$ and the product of their marginals is a measure of how far away $X$ and $Y$ are from being independent (if $X$ and $Y$ are independent then their joint distribution is the product measure). We also know that in terms of entropy, $H(X|Y) \leq H(X)$ and equality holds iff $X$ and $Y$ are independent. Therefore $H(X) - H(X|Y)$ is also a measure of how far away $X$ and $Y$ are from being independent. So it makes sense that these two quantities should be related to each other, but it's not clear just from this qualitative analysis why they should be exactly equal.
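To make the equality exact, one standard derivation starts from the KL definition of mutual information and uses $p(x,y) = p(x|y)p(y)$:

$$
\begin{aligned}
I(X,Y) &= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}
        = \sum_{x,y} p(x,y) \log \frac{p(x|y)}{p(x)} \\
       &= -\sum_{x,y} p(x,y) \log p(x) + \sum_{x,y} p(x,y) \log p(x|y) \\
       &= -\sum_{x} p(x) \log p(x) - \left(-\sum_{x,y} p(x,y) \log p(x|y)\right) \\
       &= H(X) - H(X|Y),
\end{aligned}
$$

where the first sum collapses to a marginal because $\sum_y p(x,y) = p(x)$. Writing $p(x,y) = p(y|x)p(x)$ instead gives $I(X,Y) = H(Y) - H(Y|X)$ by the same steps.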
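As a sanity check, here is a small numeric sketch (using NumPy, with a made-up $2\times 3$ joint distribution) that computes $I(X,Y)$ directly as a KL divergence and compares it against $H(X)-H(X|Y)$ and $H(Y)-H(Y|X)$:

```python
import numpy as np

# An arbitrary example joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1)  # marginal of X
p_y = p_xy.sum(axis=0)  # marginal of Y

# Mutual information as KL(p(x,y) || p(x)p(y)), in nats.
mi = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# Entropies, also in nats.
H_x = -np.sum(p_x * np.log(p_x))
H_y = -np.sum(p_y * np.log(p_y))
H_xy = -np.sum(p_xy * np.log(p_xy))

# Chain rule: H(X|Y) = H(X,Y) - H(Y), and symmetrically for H(Y|X).
H_x_given_y = H_xy - H_y
H_y_given_x = H_xy - H_x

print(mi, H_x - H_x_given_y, H_y - H_y_given_x)  # all three values agree
```

The three printed numbers coincide (up to floating-point error), and $I(X,Y) > 0$ here since this joint distribution is not a product of its marginals.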