I'm currently reading Elements of Information Theory and I'm a little confused when it comes to joint entropy. The book provides two separate definitions for it:
$$H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\log p(x,y)$$
where $p(x,y)$ is the joint probability mass function of $X$ and $Y$.
The other version is:
$$H(X,Y) = H(X) + H(Y|X)$$
Further, the conditional entropy is defined as:
$$H(Y|X) = \sum_{x ∈ X} p(x)H(Y|X=x)$$
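To make sure I understand the definitions, I translated them directly into a small Python sketch (the function names `joint_entropy` and `conditional_entropy` are my own, not from the book; the joint pmf is given as a dict mapping pairs $(x,y)$ to probabilities):

```python
from math import log2

def joint_entropy(pmf):
    """H(X,Y) = -sum_{x,y} p(x,y) log p(x,y), skipping zero-probability pairs."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def conditional_entropy(pmf):
    """H(Y|X) = sum_x p(x) H(Y|X=x)."""
    # Marginal p(x), summed out of the joint pmf.
    px = {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0.0) + p
    h = 0.0
    for x, p_x in px.items():
        if p_x == 0:
            continue
        # Conditional distribution p(y|x) = p(x,y) / p(x).
        cond = [p / p_x for (xx, yy), p in pmf.items() if xx == x]
        h += p_x * -sum(q * log2(q) for q in cond if q > 0)
    return h
```

For example, two independent fair bits give `joint_entropy` of 2 bits and `conditional_entropy` of 1 bit, consistent with the chain rule $H(X,Y) = H(X) + H(Y|X)$.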
This seemed fine at first, but when I try to apply these methods to the same problem I get different results. Example:
Suppose a fair die is thrown. Let X denote the number facing upwards after a throw. Further, let Y denote whether X is even or odd. Calculate $H(X,Y)$.
Now, unless I'm completely off, $p(x,y)$ would in this case be $\frac 1 6$ for each of the six possible pairs, because the parity $Y$ is determined by the number $X$ facing upwards. Therefore, when I try the first method I get:
$$H(X,Y) = -\left(\frac 1 6 \log\frac 1 6 + \frac 1 6 \log\frac 1 6 + \frac 1 6 \log\frac 1 6 + \frac 1 6 \log\frac 1 6 + \frac 1 6 \log\frac 1 6 + \frac 1 6 \log\frac 1 6\right) = \log 6$$
Now suppose I want to verify this using the second formula. Incidentally, $H(X)$ comes out exactly equal to $H(X,Y)$, namely $\log 6$, and $H(Y|X)$ is:
$$H(Y|X) = \\ \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) + \\ \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) + \\ \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) + \\ \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) + \\ \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) + \\ \frac 1 6 \left( -\frac 1 2 \log \frac 1 2 - \frac 1 2 \log \frac 1 2 \right) = 1$$
Which means that $H(X,Y) = \log 6 + 1$, contradicting the first result. Is this even possible? Am I doing something wrong?
Any help or insight is very much appreciated.
For any fixed $x$, the value of $H(Y \mid X = x)$ should equal 0; this is what formalizes your intuition that there is no further information in $Y$ once $X$ is known, since $Y$ is fully determined by $X$. In other words, $H(X, Y) = H(X)$ (the information in both $X, Y$ is the same as the information in $X$).
The problem in your calculation is that you compute $H(Y \mid X = x)$ as if you are computing $H(Y)$; in fact, $H(Y \mid X = x) = -1\log(1) - 0\log(0) = 0$ (with the standard convention that $0\log(0)$ is defined and equals 0).
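A quick numerical check of this (a sketch of my own, enumerating the six equally likely outcomes of the die example) confirms that $H(X,Y) = \log_2 6 \approx 2.585$ bits while $H(Y \mid X) = 0$:

```python
from math import log2

# X = face of a fair die (1..6), Y = parity of X. Only the six pairs
# (x, x mod 2) can occur, each with probability 1/6.
pmf = {(x, x % 2): 1/6 for x in range(1, 7)}

# Joint entropy H(X,Y) from the first definition.
H_XY = -sum(p * log2(p) for p in pmf.values())

# H(Y|X) = sum_x p(x) H(Y|X=x). Given X = x, the conditional
# distribution of Y puts all its mass on one value, so each
# H(Y|X=x) is -1*log2(1) = 0.
H_Y_given_X = 0.0
for x in range(1, 7):
    cond = [p / (1/6) for (xx, y), p in pmf.items() if xx == x]
    H_Y_given_X += (1/6) * -sum(q * log2(q) for q in cond if q > 0)

print(H_XY)         # ≈ 2.585, i.e. log2(6)
print(H_Y_given_X)  # 0.0
```

So the chain rule gives $H(X,Y) = H(X) + H(Y \mid X) = \log 6 + 0$, in agreement with the direct computation.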