When is Conditional Mutual Information greater than Mutual Information and what does it represent?


I am struggling to find the cases for which $I(X;Y|Z)>I(X;Y)$. The only mathematical example I could find for such a case is the following: $$ I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z). $$ This makes sense since they are both definitions of $I(X;Y,Z)$. So, if we assume $X$ and $Z$ to be independent such that $I(X;Z) = 0$, then $$ I(X;Y|Z) - I(X;Y) = I(X;Z|Y) \geq0 $$ such that $$ I(X;Y|Z) \geq I(X;Y). $$ The issue I have with this example is that if we considered $X$ and $Z$ to be independent, I also would expect $I(X;Z|Y)$ to be equal to $0$ and not greater than $0$. If it was $0$ then the MI and CMI would be equal which I can understand, but I do not get how this can be achieved and how to interpret it properly. In other words, how can conditioning a third random variable increase the mutual information between two other random variables mathematically and how can this be interpreted?


BEST ANSWER

Actually, the case you consider, that is, with $X$ and $Z$ being independent, is a well-known case where conditioning increases the mutual information.

To provide some intuition/interpretation of this result, consider a communication channel, where $X$ represents the "message" sent by a transmitter, $Z$ is the additive "noise" introduced by the channel, and $Y$ is what the receiver observes. In addition to $X$ and $Z$ being independent, the observation is modeled as $$ Y = X + Z. $$ The result $I(X;Y|Z)\geq I(X;Y)$ essentially states that knowledge at the receiver of the noise realization $Z$ (in addition to $Y$) can only increase the information about $X$. This is, of course, intuitive. (Actually, knowledge of $Y$ and $Z$ determines $X$ exactly, therefore the inequality between the mutual informations is, here, strict.)
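This channel example can be checked numerically. The sketch below (my own illustration, not from the original answer) takes $X$ and $Z$ to be independent fair bits and $Y = X + Z$ an ordinary integer sum, then computes both quantities from entropies of the joint distribution:

```python
from itertools import product
from math import log2

# X, Z independent fair bits; Y = X + Z (integer sum, so Y in {0, 1, 2}).
# The joint distribution is uniform over the four (x, z) pairs.
p = {(x, z, x + z): 0.25 for x, z in product([0, 1], repeat=2)}

def H(keyfn):
    """Entropy in bits of the variable selected by keyfn from (x, z, y)."""
    dist = {}
    for (x, z, y), pr in p.items():
        k = keyfn(x, z, y)
        dist[k] = dist.get(k, 0.0) + pr
    return -sum(q * log2(q) for q in dist.values() if q > 0)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_xy = H(lambda x, z, y: x) + H(lambda x, z, y: y) - H(lambda x, z, y: (x, y))
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
I_xy_given_z = (H(lambda x, z, y: (x, z)) + H(lambda x, z, y: (y, z))
                - H(lambda x, z, y: (x, y, z)) - H(lambda x, z, y: z))
print(I_xy, I_xy_given_z)  # 0.5 1.0
```

Here $I(X;Y) = 0.5$ bits while $I(X;Y|Z) = 1$ bit: once the noise is known, $Y$ reveals $X$ completely, so the inequality is strict, as the answer notes.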

One issue you have with the proof of this result is how it can be that $I(X;Z|Y)> 0$ (strict inequality) when $I(X;Z)=0$. This question can be posed more generally: how can $p(x,z|y)\neq p(x|y) p(z|y)$ (i.e., $X$ and $Z$ are not independent conditioned on $Y$), even though $p(x,z)=p(x) p(z)$ ($X$ and $Z$ are independent when no conditioning is imposed)?

Note that this is indeed the case in the communication channel: given $Y$, knowledge of $Z$ provides information about $X$; therefore, $X$ and $Z$ are not independent when conditioned on $Y$. In summary, one can state the following:

Two independent variables $X$ and $Z$ can become dependent when conditioned on an appropriate third variable $Y$ (which, obviously, should depend on both $X$ and $Z$).

ANSWER

The classical example: let $X,Y$ be independent fair Bernoulli variables (taking values in $\{0,1\}$ with equal probability), and let $Z=X+Y \pmod 2$ (in Boolean logic, $Z= X \oplus Y$, where $\oplus$ is the XOR operator).

It's easy to see that $X,Y,Z$ each have 1 bit of entropy and that they are pairwise independent (e.g., $P(Z|X)=P(Z)$); hence $I(X;Y)=I(X;Z)=0$.

However, it's also obvious that knowing any two of the variables determines the third, so, for example, $H(X | Y,Z)=0$ and

$$I(X;Y|Z) = H(X|Z) - H(X | Y,Z)=1 - 0 = 1 $$

The way to understand $I(X;Y|Z) > I(X;Y)$ in this example is: $Y$, by itself, gives us no information about $X$. However, if we are also given $Z$ (the conditioning!), things change: $Y$ now tells us everything about $X$.
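The XOR example above can be verified with a few lines of code. This is my own sanity check, not part of the original answer; it enumerates the joint distribution of $(X, Y, Z)$ with $Z = X \oplus Y$ and computes both quantities from entropies:

```python
from itertools import product
from math import log2

# X, Y independent fair bits; Z = X xor Y. Uniform over the four (x, y) pairs.
p = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

def H(keyfn):
    """Entropy in bits of the variable selected by keyfn from (x, y, z)."""
    dist = {}
    for (x, y, z), pr in p.items():
        k = keyfn(x, y, z)
        dist[k] = dist.get(k, 0.0) + pr
    return -sum(q * log2(q) for q in dist.values() if q > 0)

# I(X;Y) = H(X) + H(Y) - H(X,Y): zero, since X and Y are independent
I_xy = H(lambda x, y, z: x) + H(lambda x, y, z: y) - H(lambda x, y, z: (x, y))
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z): one full bit
I_xy_given_z = (H(lambda x, y, z: (x, z)) + H(lambda x, y, z: (y, z))
                - H(lambda x, y, z: (x, y, z)) - H(lambda x, y, z: z))
print(I_xy, I_xy_given_z)  # 0.0 1.0
```

The output matches the derivation: $I(X;Y)=0$ while $I(X;Y|Z)=1$ bit.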

ANSWER

In terms of interpretation, $I(X;Y|Z)>I(X;Y)$ is an indication that $X$ and $Y$ convey, at least to some degree, synergistic information about $Z$ (even though a certain degree of redundancy could still be present). The difference $I(X;Y)-I(X;Y|Z)$ can be decomposed as $R-S$, where $R$ denotes the redundant component and $S$ the synergistic one. Only when $R-S<0$ can you conclude that there has to be some amount of synergy in the way $X$ and $Y$ encode $Z$.