How to use joint entropy properly


I'm currently reading Elements of Information Theory and I'm a little confused when it comes to joint entropy. The book provides two separate definitions for it:

$$H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log p(x,y)$$

where $p(x,y)$ is the joint probability mass function of $X$ and $Y$.

The other version is:

$$H(X,Y) = H(X) + H(Y|X)$$

Further, the conditional entropy is defined as:

$$H(Y|X) = \sum_{x \in X} p(x)\,H(Y|X=x)$$

This seemed fine at first, but when I try to apply these methods to the same problem I get different results. Example:

Suppose a fair die is thrown. Let X denote the number facing upwards after a throw. Further, let Y denote whether X is even or odd. Calculate H(XY).

Now unless I'm completely off, $p(x,y)$ would in this case be $\frac 1 6$, because whether the outcome is odd or even is determined by the number facing upwards, so only six $(x,y)$ pairs have nonzero probability. Therefore, when I try the first method I get:

$$H(XY) = -\sum_{i=1}^{6} \frac 1 6 \log \frac 1 6 = \log 6$$
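As a sanity check on this first method, here is a minimal Python sketch of the direct computation over the joint pmf (assuming logs are base 2, which the post leaves unspecified, so $\log 6 \approx 2.585$):

```python
# Direct computation of H(X, Y) = -sum p(x,y) log p(x,y) for the die example:
# X in {1, ..., 6}, Y = parity of X. Base-2 logs are an assumption here.
import math

# Joint pmf: only the six consistent (x, parity(x)) pairs carry mass 1/6;
# all other (x, y) pairs have probability 0 and contribute nothing.
joint = {(x, x % 2): 1 / 6 for x in range(1, 7)}

H_XY = -sum(p * math.log2(p) for p in joint.values())
print(H_XY)  # ≈ 2.585, i.e. log2(6)
```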

Now suppose I want to verify this using the second formula. Incidentally, $H(X) = \log 6$, which is exactly the value I got for $H(XY)$, and $H(Y|X)$ is:

$$H(Y|X) = \sum_{x=1}^{6} \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) = 1$$

Which means that $H(XY) = log6 + 1$. Is this even possible? Am I doing something wrong?

Any help or insight is very much appreciated.

Best answer:

For any fixed $x$, the value of $H(Y \mid X = x)$ should equal 0; this is what formalizes your intuition that there is no further information in $Y$ once $X$ is known, since $Y$ is fully determined by $X$. In other words, $H(X, Y) = H(X)$ (the information in both $X, Y$ is the same as the information in $X$).

The problem in your calculation is that you compute $H(Y \mid X = x)$ as if you were computing $H(Y)$; in fact, $H(Y \mid X = x) = -1\log(1) - 0\log(0) = 0$ (with the standard convention that $0\log(0)$ is defined and equals 0).
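The point above can be checked numerically. In this sketch (again assuming base-2 logs) each conditional distribution of $Y$ given $X = x$ is degenerate, so $H(Y \mid X) = 0$ and the chain rule gives $H(X,Y) = H(X)$:

```python
# Verify the chain rule H(X, Y) = H(X) + H(Y | X) for the die example,
# using the convention 0 * log(0) = 0. Base-2 logs are an assumption.
import math

def H(dist):
    """Entropy of a pmf given as a list of probabilities (skipping zeros)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# X is uniform on {1, ..., 6}.
H_X = H([1 / 6] * 6)

# Given X = x, the parity Y is fully determined, so each conditional
# distribution is degenerate (all mass on one outcome): H(Y | X = x) = 0.
H_Y_given_X = sum((1 / 6) * H([1.0, 0.0]) for _ in range(6))

H_XY = H_X + H_Y_given_X
print(H_X, H_Y_given_X, H_XY)  # H(X) ≈ 2.585, H(Y|X) = 0, so H(X,Y) ≈ 2.585
```

Mistakenly using the unconditional distribution of $Y$, i.e. `H([0.5, 0.5])` = 1 bit, inside the sum reproduces the extra $+1$ from the question.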