I've seen it mentioned in (Horodecki, Oppenheim, Winter 2005) that the conditional entropy equals the amount of information Alice needs to send Bob in order for him to fully reconstruct the output of a source, provided Bob already holds some side information about it.
More precisely, say Alice holds a random variable $X$, Bob holds $Y$, and there is some correlation between them. Because of the correlation, Bob has some amount of (generally incomplete) information about $X$. The statement is that, for Bob to gain full knowledge of $X$, it is sufficient for Alice to send him $H(X|Y)$ bits of information (on average).
Is there some general intuition for where this result comes from? How does Alice decide what information to send Bob? This result seems to have been first presented by Slepian and Wolf in 1973, but I haven't had much luck finding more modern references for this particular protocol.
As a simple example, consider a pair of binary random variables, with possible equiprobable outcomes being $00, 11, 01$. In other words, we have the joint probability distribution $$p(0,0) = p(0,1) = p(1,1) = \frac13.$$ In this case, one can see that $H(X|Y)=\frac23$, meaning Alice should just need to send (on average) 2/3 of a bit to give Bob full knowledge of her side. How would this work?
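As a sanity check on that value, here is a short Python sketch that computes $H(X|Y)$ directly from the joint distribution above, using $H(X|Y) = -\sum_{x,y} p(x,y)\log_2 p(x|y)$:

```python
from math import log2

# Joint distribution from the example: three equiprobable outcomes,
# (1, 0) has probability zero.
p = {(0, 0): 1/3, (0, 1): 1/3, (1, 1): 1/3}

# Marginal distribution p(y)
py = {}
for (x, y), pr in p.items():
    py[y] = py.get(y, 0) + pr

# H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y)/p(y)
H_X_given_Y = -sum(pr * log2(pr / py[y]) for (x, y), pr in p.items())
print(H_X_given_Y)  # ≈ 0.667, i.e. 2/3 of a bit
```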
Remember that entropy is average information: maybe Alice needs to send $0$ bits $1/3$ of the time (if Bob can already work out the information) and $1$ bit $2/3$ of the time. Maybe she needs to send $0$ bits $13/15$ of the time and $5$ bits $2/15$ of the time, and so on. Or maybe neither of these is true. From the entropy alone we cannot tell: this information would come from the conditional distribution of $X$ given $Y$.
In the case you have given, it is the former. If Bob, who knows $Y$, sees $Y=0$ then he concludes that $(X,Y)=(0,0)$, because $(1,0)$ is not a possible value. That is, $1/3$ of the time, Bob already knows all the information. If he sees $Y=1$, however, then he needs Alice to send $1$ bit of information: the value of $X$. That tells him whether it is $(1,1)$ or $(0,1)$. Each of the three outcomes is equally likely (so $Y=0$ w.p. $1/3$ and $Y=1$ w.p. $2/3$), so Alice sends on average $2/3$ of a bit of information.
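The protocol just described can be checked with a small Monte Carlo simulation in Python (the setup is illustrative, not from the original papers): sample $(X,Y)$ from the joint distribution, have Alice send nothing when $Y=0$ and one bit when $Y=1$, and count the average number of bits sent.

```python
import random

random.seed(0)
outcomes = [(0, 0), (0, 1), (1, 1)]  # the three equiprobable outcomes

n, bits_sent = 100_000, 0
for _ in range(n):
    x, y = random.choice(outcomes)
    if y == 0:
        pass  # Bob infers X = 0 for free, since (1, 0) has probability zero
    else:
        bits_sent += 1  # Alice sends the single bit X

print(bits_sent / n)  # ≈ 2/3 on average
```

The empirical average converges to $2/3$ of a bit per round, matching $H(X|Y)$.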