Is there an intuitive interpretation of $H(X,Y) \leq H(X) + H(Y)$?

$H(X,Y) = -\sum_{i,j} p(x_i,y_j)\log p(x_i,y_j)$

When $H$ is interpreted as "surprise" as in Ross's book, is there an intuitive explanation of the formula?

Thanks in advance!

There is 1 best solution below

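You can write $$H(X)+H(Y)-H(X,Y)=H(X)-H(X|Y)=I(X;Y)$$ This term, as you probably know, is the mutual information between $X$ and $Y$. Since $I(X;Y) \geq 0$, with equality exactly when $X$ and $Y$ are independent, the inequality $H(X,Y) \leq H(X)+H(Y)$ follows. The term can be interpreted as "how much do these two random variables say about each other (or have in common, in a sense)?" You can also read it as "how far is the joint distribution of these two random variables from that of independent random variables?", measured by relative entropy: $I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big)$. This quantity is fundamental: maximized over the input distributions $p(x)$, it gives the capacity of the channel described by $p(y|x)$.

There is a set interpretation as well: picture $H(X)$ and $H(Y)$ as two overlapping sets, with $H(X,Y)$ as their union and $I(X;Y)$ as their intersection. If $X$ and $Y$ are dependent, then observing $Y$ already tells you something about $X$, so part of the surprise in $X$ is gone once $Y$ is known. Hence $H(X|Y)$ is less than $H(X)$, and the surprise of the pair is less than the sum of the individual surprises.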
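To make the identity concrete, here is a minimal numerical sketch. The joint distribution `joint` and the helper names (`entropy`, `entropy_summary`) are made up for illustration and are not taken from Ross's book; entropies are measured in bits.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_summary(joint):
    """Return (H(X), H(Y), H(X,Y), I(X;Y), KL) for a joint table p(x, y).

    Rows index x, columns index y. KL is D(p(x,y) || p(x)p(y)), computed
    directly so it can be compared with H(X) + H(Y) - H(X,Y).
    """
    p_x = [sum(row) for row in joint]        # marginal of X
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y
    h_x = entropy(p_x)
    h_y = entropy(p_y)
    h_xy = entropy([p for row in joint for p in row])
    mutual_info = h_x + h_y - h_xy
    kl = sum(
        p * math.log2(p / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return h_x, h_y, h_xy, mutual_info, kl


# Hypothetical example: two binary variables that tend to agree.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

h_x, h_y, h_xy, mutual_info, kl = entropy_summary(joint)
print(f"H(X) = {h_x:.4f}, H(Y) = {h_y:.4f}, H(X,Y) = {h_xy:.4f}")
print(f"H(X) + H(Y) - H(X,Y) = {mutual_info:.4f}")
print(f"D(p(x,y) || p(x)p(y)) = {kl:.4f}")
```

For this table each variable alone carries 1 bit of surprise, but the pair carries only about 1.72 bits: the missing 0.28 bits or so is the information the two variables share. Replacing `joint` with a product distribution such as `[[0.25, 0.25], [0.25, 0.25]]` should drive the gap to zero, which is the equality case.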