Suppose we have an additive noise channel: $Y = X + Z$, where $Z$ is noise, independent of $X$. So we can write the mutual information as: $I(X;Y) = h(Y) - h(Y|X) = h(Y) - h(Z)$.
We can also write this as: $I(X;Y) = h(X) - h(X|Y)$. But if $Y = X + Z$ then $X = Y - Z$, so it would seem that $I(X;Y) = h(X) - h(Z)$. This is obviously not correct, since $h(Y)$ doesn't have to equal $h(X)$, so what is the problem? Why isn't $h(X|Y) = h(Z)$?
Thanks.
The random variables $X$ and $Z$ are independent and therefore, when you condition the RV $Y$ on $X=x$, you simply get a shifted version of $Z$, which has the same entropy as $Z$ (imagine if you added a constant to a Gaussian - you would have shifted the mean of the Gaussian, but nothing else changes).
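A quick numerical sanity check of this shift-invariance, assuming for illustration that $Z$ is Gaussian (the choice of distribution and the variance value below are mine, not from the question): the differential entropy of a Gaussian depends only on its variance, so conditioning on $X = x$ just shifts the mean and leaves $h(Y|X=x) = h(Z)$.

```python
import math

def gaussian_entropy(var):
    # Differential entropy (in nats) of a Gaussian with variance var:
    # h = 0.5 * ln(2 * pi * e * var). Note it depends only on the
    # variance, not the mean, so adding a constant changes nothing.
    return 0.5 * math.log(2 * math.pi * math.e * var)

var_z = 1.5  # illustrative noise variance
h_Z = gaussian_entropy(var_z)

# Given X = x, Y = x + Z is N(x, var_z): same variance, shifted mean,
# so its entropy is the same for every x.
h_Y_given_x = gaussian_entropy(var_z)

assert h_Y_given_x == h_Z  # h(Y | X = x) = h(Z) for all x
```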
On the other hand, $Y$ and $Z$ are not independent. Therefore, when you condition $X = Y - Z$ on $Y = y$, there is no simple way to represent it in terms of $Z$ that holds independently of the value of $Y$. In other words, you can go as far as saying that $h(X|Y) = h(Y - Z|Y) = h(Z|Y)$. However, you cannot simplify further to $h(Z|Y) = h(Z)$, because conditioning on $Y$ genuinely reduces the uncertainty about $Z$.
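To make the asymmetry concrete, here is a closed-form sketch for the all-Gaussian case (the variances below are illustrative assumptions, not from the question): with $X \sim N(0, \sigma_X^2)$ and $Z \sim N(0, \sigma_Z^2)$, standard Gaussian conditioning gives $\mathrm{Var}(X|Y) = \sigma_X^2 \sigma_Z^2 / (\sigma_X^2 + \sigma_Z^2) < \sigma_Z^2$, so $h(X|Y) = h(Z|Y) < h(Z)$, while both expressions for the mutual information still agree.

```python
import math

def h_gauss(var):
    # Differential entropy (nats) of a Gaussian with variance var.
    return 0.5 * math.log(2 * math.pi * math.e * var)

var_x, var_z = 4.0, 1.0  # X ~ N(0, var_x), Z ~ N(0, var_z), Y = X + Z

# First form: I(X;Y) = h(Y) - h(Z), with Var(Y) = var_x + var_z.
I_1 = h_gauss(var_x + var_z) - h_gauss(var_z)

# Second form: I(X;Y) = h(X) - h(X|Y). For jointly Gaussian (X, Y),
# Var(X | Y) = var_x * var_z / (var_x + var_z).
var_x_given_y = var_x * var_z / (var_x + var_z)
I_2 = h_gauss(var_x) - h_gauss(var_x_given_y)

assert abs(I_1 - I_2) < 1e-12            # both forms of I(X;Y) agree
assert h_gauss(var_x_given_y) < h_gauss(var_z)  # h(X|Y) = h(Z|Y) < h(Z)
```

So the two decompositions of $I(X;Y)$ coincide, but $h(X|Y)$ is strictly smaller than $h(Z)$: observing $Y$ tells you something about the noise too.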