How do I prove that additive joint entropy implies random variables are independent?


In *Information Theory, Inference, and Learning Algorithms* by David J.C. MacKay, page 33, the author states:

Entropy is additive for independent random variables:

$$H(X,Y) = H(X)+ H(Y) \text{ iff } P(x,y) =P(x)P(y)$$

I think showing that the right side implies the left is straightforward, but I have no idea how to show the left implies the right. Specifically, I don't know how to prove that

$$\sum_{x\in A_X}\sum_{y\in A_Y}p(x,y)\log\frac{1}{p(x,y)} = \sum_{x\in A_X}p(x)\log\frac{1}{p(x)} + \sum_{y\in A_Y}p(y)\log\frac{1}{p(y)}$$

implies $$P(x,y) =P(x)P(y)$$

Could anyone provide a proof of, or an intuition for, why this is true?


This is equivalent to the property:

The mutual information $I(X;Y)=H(X)+H(Y)-H(X,Y)=H(X)-H(X|Y)$ is zero if and only if $X,Y$ are independent.

This (the "only if" part) is not trivial or intuitive. The standard proof amounts to showing that the Kullback–Leibler divergence (also called relative entropy) between two distributions is zero if and only if the distributions are identical, and positive otherwise. Then, noticing that the mutual information can be expressed as the KL divergence between the joint distribution and the product of the marginals, the result follows.
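To see that expression explicitly (using the standard convention $0\log 0 = 0$), expand the entropies inside the sum:

$$I(X;Y) = H(X)+H(Y)-H(X,Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} = D_{\mathrm{KL}}\bigl(p(x,y)\,\big\|\,p(x)\,p(y)\bigr),$$

so $I(X;Y)=0$ holds exactly when this KL divergence vanishes.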

To prove the property about KL divergence (again, far from obvious), one can resort to the log-sum inequality.
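Concretely, the log-sum inequality says that for nonnegative numbers $a_1,\dots,a_n$ and $b_1,\dots,b_n$,

$$\sum_{i} a_i\log\frac{a_i}{b_i} \;\ge\; \Bigl(\sum_i a_i\Bigr)\log\frac{\sum_i a_i}{\sum_i b_i},$$

with equality if and only if $a_i/b_i$ is constant in $i$. Taking $a_i = p_i$ and $b_i = q_i$ for two probability distributions (so both sums equal $1$) gives $D_{\mathrm{KL}}(p\|q)\ge 0$, with equality iff $p_i/q_i$ is a constant, which must then be $1$; hence $p=q$. Applied to $p(x,y)$ versus $p(x)p(y)$, equality forces $p(x,y)=p(x)p(y)$ for all $x,y$.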

For details, see Cover and Thomas, *Elements of Information Theory*.
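As a quick numerical illustration (a sketch, not part of the referenced proof), one can check that the KL form of mutual information vanishes for a product distribution and is positive for a dependent one:

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in nats, computed as the KL divergence between the
    joint distribution and the product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal of X (column)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y (row)
    mask = joint > 0                       # convention: 0 log 0 = 0
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

# Independent joint: P(x,y) = P(x)P(y), so I(X;Y) = 0
indep = np.outer([0.3, 0.7], [0.4, 0.6])
# Perfectly correlated joint, so I(X;Y) = log 2 > 0
dep = np.array([[0.5, 0.0],
                [0.0, 0.5]])

print(mutual_information(indep))  # ≈ 0
print(mutual_information(dep))    # ≈ 0.693 (= log 2)
```

The dependent example is the extreme case where $X$ determines $Y$, so the mutual information equals the full entropy $\log 2$ of one coin flip.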