I am struggling to find cases for which $I(X;Y|Z)>I(X;Y)$. The only mathematical example I could find is the following identity: $$ I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z). $$ This makes sense, since both sides are expansions of $I(X;Y,Z)$ via the chain rule. So, if we assume $X$ and $Z$ to be independent, so that $I(X;Z) = 0$, then $$ I(X;Y|Z) - I(X;Y) = I(X;Z|Y) \geq 0, $$ and hence $$ I(X;Y|Z) \geq I(X;Y). $$

The issue I have with this example is that if $X$ and $Z$ are independent, I would also expect $I(X;Z|Y)$ to equal $0$ rather than be greater than $0$. If it were $0$, the MI and the CMI would be equal, which I can understand; but I do not see how the strict inequality can arise or how to interpret it properly. In other words: how can conditioning on a third random variable increase the mutual information between two other random variables mathematically, and how can this be interpreted?
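The chain-rule identity above can be checked numerically. Below is a minimal Python sketch (the joint distribution and helper names are my own choices, not from any library) that verifies the two expansions of $I(X;Y,Z)$ agree for an arbitrary joint distribution over three binary variables:

```python
import random
from collections import defaultdict
from math import log2

random.seed(0)
# An arbitrary (random) joint distribution over binary (X, Y, Z),
# just to check the identity I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z).
weights = [random.random() for _ in range(8)]
total = sum(weights)
p = {}
i = 0
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p[(x, y, z)] = weights[i] / total
            i += 1

def entropy(axes):
    """Entropy (in bits) of the marginal over the given coordinate axes."""
    marg = defaultdict(float)
    for outcome, q in p.items():
        marg[tuple(outcome[a] for a in axes)] += q
    return -sum(q * log2(q) for q in marg.values() if q > 0)

# Mutual informations via entropies: I(A;B) = H(A) + H(B) - H(A,B),
# I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C).
I_XY = entropy((0,)) + entropy((1,)) - entropy((0, 1))
I_XZ = entropy((0,)) + entropy((2,)) - entropy((0, 2))
I_XZ_given_Y = entropy((0, 1)) + entropy((1, 2)) - entropy((0, 1, 2)) - entropy((1,))
I_XY_given_Z = entropy((0, 2)) + entropy((1, 2)) - entropy((0, 1, 2)) - entropy((2,))

# Both sides equal I(X;Y,Z), so the difference should vanish.
print(abs((I_XY + I_XZ_given_Y) - (I_XZ + I_XY_given_Z)) < 1e-12)  # True
```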
When is Conditional Mutual Information greater than Mutual Information and what does it represent?
1.4k Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 3 best solutions below.
The classical example is the following: let $X,Y$ be independent fair Bernoulli variables (taking values in $\{0,1\}$ with equal probability), and let $Z=X+Y \pmod 2$ (in Boolean logic, $Z= X \oplus Y$, where $\oplus$ is the XOR operator).
It's easy to see that $X,Y,Z$ each have 1 bit of entropy, and that they are pairwise independent (e.g., $P(Z|X)=P(Z)$); hence $I(X;Y)=I(X;Z)=0$.
However, it's also clear that knowing any two of the variables determines the third, so, for example, $H(X | Y,Z)=0$ and
$$I(X;Y|Z) = H(X|Z) - H(X | Y,Z)=1 - 0 = 1 $$
The way to understand $I(X;Y|Z) > I(X;Y)$ in this example is: $Y$, by itself, does not give us any information about $X$. However, if we are given $Z$ (the conditioning!), things change: $Y$ now determines $X$ completely.
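A quick numerical check of this XOR example, as a minimal Python sketch (the `entropy` helper and variable names are my own, not from any library):

```python
import itertools
from collections import defaultdict
from math import log2

# Joint distribution of (X, Y, Z): X, Y fair independent coins, Z = X XOR Y.
p = defaultdict(float)
for x, y in itertools.product([0, 1], repeat=2):
    p[(x, y, x ^ y)] += 0.25

def entropy(axes):
    """Entropy (in bits) of the marginal over the given coordinate axes."""
    marg = defaultdict(float)
    for outcome, prob in p.items():
        marg[tuple(outcome[a] for a in axes)] += prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_XY = entropy((0,)) + entropy((1,)) - entropy((0, 1))
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
I_XY_given_Z = entropy((0, 2)) + entropy((1, 2)) - entropy((0, 1, 2)) - entropy((2,))
print(I_XY, I_XY_given_Z)  # 0.0 1.0
```

So the mutual information jumps from 0 bits to a full bit once we condition on $Z$, exactly as computed above.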
In terms of interpretation, $I(X;Y|Z)>I(X;Y)$ is an indication that, at least to some degree, $X$ and $Y$ convey synergistic information about $Z$ (even though a certain degree of redundancy could still be present). The difference $I(X;Y)-I(X;Y|Z)$ can be decomposed as $R-S$, where $R$ denotes the redundant component and $S$ the synergistic one. Only when $R-S<0$ can you conclude that there has to be some amount of synergy in the way $X$ and $Y$ encode $Z$.
Actually, the case you consider, that is, with $X$ and $Z$ being independent, is a well-known case where conditioning increases the mutual information.
To provide some intuition/interpretation of this result, consider a communication channel, where $X$ represents the "message" sent by a transmitter, $Z$ is the additive "noise" introduced by the channel, and $Y$ is what the receiver observes. In addition to $X$ and $Z$ being independent, the observation is modeled as $$ Y = X + Z. $$ The result $I(X;Y|Z)\geq I(X;Y)$ essentially states that knowledge at the receiver of the noise realization $Z$ (in addition to $Y$) can only increase the information about $X$. This is, of course, intuitive. (In fact, knowledge of $Y$ and $Z$ determines $X$ exactly, so the inequality between the mutual informations is, here, strict.)
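This channel picture can also be checked numerically. The following Python sketch (my own construction, assuming for concreteness that $X$ and $Z$ are independent fair bits and $Y = X + Z$ over the integers) confirms that conditioning on the noise strictly increases the mutual information:

```python
from collections import defaultdict
from math import log2

# Additive channel: X is the message, Z independent noise, Y = X + Z.
# X and Z are independent fair bits, so Y takes values in {0, 1, 2}.
p = defaultdict(float)
for x in (0, 1):
    for z in (0, 1):
        p[(x, z, x + z)] += 0.25  # keys are (x, z, y)

def entropy(axes):
    """Entropy (in bits) of the marginal over the given coordinate axes."""
    marg = defaultdict(float)
    for outcome, prob in p.items():
        marg[tuple(outcome[a] for a in axes)] += prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_XY = entropy((0,)) + entropy((2,)) - entropy((0, 2))
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
I_XY_given_Z = entropy((0, 1)) + entropy((1, 2)) - entropy((0, 1, 2)) - entropy((1,))
print(I_XY, I_XY_given_Z)  # 0.5 1.0
```

Here $I(X;Y) = 0.5$ bits while $I(X;Y|Z) = 1$ bit: once the receiver knows the noise $Z$, the observation $Y$ determines $X$ exactly, so the inequality is strict, as noted above.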
One issue you have with the proof of this result is how it can be that $I(X;Z|Y)> 0$ (strict inequality) when $I(X;Z)=0$. This question can be posed more generally as: how can $p(x,z|y)\neq p(x|y) p(z|y)$ (i.e., $X$ and $Z$ are not independent conditioned on $Y$), even though $p(x,z)=p(x) p(z)$ ($X$ and $Z$ are independent when no conditioning is imposed)?
Note that this is indeed the case in the communication channel: given $Y$, knowledge of $Z$ provides information about $X$; therefore, $X$ and $Z$ are not independent when conditioned on $Y$. In summary, one can state the following: independence of $X$ and $Z$ does not imply their conditional independence given $Y$.