Given a Markov chain $X \rightarrow Y \rightarrow Z$, why is $I(X;Y|Z) \leq I(X;Y)$?


A Markov chain $X \rightarrow Y \rightarrow Z$ is given, where $X,Y,Z$ are random variables characterized by the probability distribution $p(x,y,z) = p(x)p(y|x)p(z|y)$.

It follows that $I(X;Y) \geq I(X;Z)$.

Given this, why is it true that $I(X;Y|Z) \leq I(X;Y)$ ?

I cannot understand this fact even at an intuitive level: why does knowing $Z$ reduce the mutual information of $X$ and $Y$ ?


The property follows from the chain rule of mutual information: \begin{align} I(X ; Y, Z) &= I(X ;Y) + I(X; Z | Y) \stackrel{(a)}{=} I(X ; Y)\\ & =I(X;Z) + I(X;Y|Z) \geq I(X;Y|Z), \end{align} where (a) holds by the Markov chain. While the data-processing inequality $I(X;Y) \geq I(X;Z)$ says that $Z$ cannot contain more information about $X$ than $Y$ does, the quantity $I(X;Y|Z)$ characterizes the loss caused by the processing $p(z|y)$, which must be less than or equal to $I(X;Y)$. For example, in the special case $Z=Y$ there is no loss of information and $I(X;Y|Z) = 0$.
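These identities can be checked numerically. The following is a sketch with an arbitrarily chosen example (fair bit $X$, and both $X \to Y$ and $Y \to Z$ binary symmetric channels with crossover probability $0.1$); all parameter values are illustrative assumptions, not from the original post.

```python
import itertools
import math

eps = 0.1  # assumed crossover probability for both channels

def bsc(b, a):
    """p(b | a) for a binary symmetric channel with crossover eps."""
    return 1 - eps if b == a else eps

# Joint distribution p(x, y, z) = p(x) p(y|x) p(z|y) with p(x) uniform
joint = {(x, y, z): 0.5 * bsc(y, x) * bsc(z, y)
         for x, y, z in itertools.product((0, 1), repeat=3)}

def marginal(keep):
    """Marginalize the joint onto the coordinates listed in keep."""
    out = {}
    for xyz, pr in joint.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

Hx, Hy, Hz = H(marginal([0])), H(marginal([1])), H(marginal([2]))
Hxy, Hxz, Hyz = H(marginal([0, 1])), H(marginal([0, 2])), H(marginal([1, 2]))
Hxyz = H(joint)

I_xy = Hx + Hy - Hxy                  # I(X;Y)
I_xz = Hx + Hz - Hxz                  # I(X;Z)
I_xz_given_y = Hxy + Hyz - Hxyz - Hy  # I(X;Z|Y), zero by Markovity
I_xy_given_z = Hxz + Hyz - Hxyz - Hz  # I(X;Y|Z)

print(f"I(X;Y)   = {I_xy:.4f}")
print(f"I(X;Z)   = {I_xz:.4f}")
print(f"I(X;Z|Y) = {I_xz_given_y:.4f}")
print(f"I(X;Y|Z) = {I_xy_given_z:.4f}")
```

Running this confirms $I(X;Z|Y)=0$ and $I(X;Y) = I(X;Z) + I(X;Y|Z) \geq I(X;Y|Z)$ for this particular chain.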

It can also be shown using entropies \begin{align} I(X;Y | Z) = H(X | Z) - H(X|Y, Z) \leq H(X) - H(X | Y) = I(X;Y) \end{align} where the inequality follows since $H(X|Z)\leq H(X)$ (conditioning reduces entropy) and $H(X| Y,Z) = H(X|Y)$ (Markov chain).


From the definition of Markovity in $p(x,y,z) = p(x)p(y|x)p(z|y)$, one can prove that $p(x,z|y) = p(x|y)p(z|y)$. This gives another characterization of Markovity:

Past and future are conditionally independent given the present.
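The factorization $p(x,z|y) = p(x|y)p(z|y)$ can be verified directly on a small example. This is a sketch over binary variables with arbitrary made-up probabilities:

```python
import itertools

# Arbitrary (assumed) distributions: p(x), p(y|x) keyed as (y, x),
# and p(z|y) keyed as (z, y).
px = {0: 0.3, 1: 0.7}
py_x = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}
pz_y = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}

# Joint distribution p(x, y, z) = p(x) p(y|x) p(z|y)
joint = {(x, y, z): px[x] * py_x[(y, x)] * pz_y[(z, y)]
         for x, y, z in itertools.product((0, 1), repeat=3)}

# Marginal p(y)
py = {y: sum(joint[(x, y, z)] for x in (0, 1) for z in (0, 1))
      for y in (0, 1)}

# Check p(x, z | y) == p(x | y) p(z | y) at every point
ok = True
for x, y, z in itertools.product((0, 1), repeat=3):
    pxz_y = joint[(x, y, z)] / py[y]
    px_y = sum(joint[(x, y, zz)] for zz in (0, 1)) / py[y]
    ok = ok and abs(pxz_y - px_y * pz_y[(z, y)]) < 1e-12

print("p(x,z|y) == p(x|y) p(z|y):", ok)  # prints True
```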

With this, we can see that, if $X \rightarrow Y \rightarrow Z$, then $I(X; Z | Y)=0$. Therefore we have:

$$I(X ; Y, Z) = I(X ;Y) + I(X; Z | Y) = I(X ; Y).$$ On the other hand, $$I(X ; Y, Z) =I(X;Z) + I(X;Y|Z).$$ Then knowing that $I(X;Z)\ge 0$, we will have $$ I(X ; Y)\geq I(X;Y|Z). $$

Equality holds exactly when $I(X;Z)= 0$, i.e., when $X$ and $Z$ are independent.

When $X$ and $Z$ are NOT independent, we have $ I(X ; Y) > I(X;Y|Z). $ Intuitively this makes sense: $Z$ already carries part of what $Y$ tells us about $X$, so conditioning on $Z$ leaves less mutual information between $X$ and $Y$.
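The two extreme cases can be illustrated numerically. The sketch below (with assumed parameters: fair bit $X$ and a binary symmetric channel with crossover $0.1$ from $X$ to $Y$) compares $Z = Y$, where $I(X;Y|Z) = 0$, against $Z$ independent of $Y$, where $I(X;Z) = 0$ and $I(X;Y|Z) = I(X;Y)$:

```python
import itertools
import math

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def quantities(pz_given_y):
    """Return I(X;Y), I(X;Z), I(X;Y|Z) for the chain X -> Y -> Z with
    a fair bit X, a BSC(0.1) from X to Y, and the given p(z|y)."""
    eps = 0.1
    joint = {}
    for x, y, z in itertools.product((0, 1), repeat=3):
        py_x = 1 - eps if y == x else eps
        joint[(x, y, z)] = 0.5 * py_x * pz_given_y(z, y)
    def m(keep):
        out = {}
        for k, pr in joint.items():
            key = tuple(k[i] for i in keep)
            out[key] = out.get(key, 0.0) + pr
        return out
    Hx, Hz = H(m([0])), H(m([2]))
    Hxz, Hyz = H(m([0, 2])), H(m([1, 2]))
    I_xy = Hx + H(m([1])) - H(m([0, 1]))
    I_xz = Hx + Hz - Hxz
    I_xy_z = Hxz + Hyz - H(joint) - Hz
    return I_xy, I_xz, I_xy_z

# Case 1: Z = Y (lossless processing)
lossless = quantities(lambda z, y: 1.0 if z == y else 0.0)
# Case 2: Z is a fair coin independent of Y
indep = quantities(lambda z, y: 0.5)

print("Z = Y:         I(X;Y|Z) =", round(lossless[2], 6))
print("Z independent: I(X;Z) =", round(indep[1], 6),
      " I(X;Y|Z) - I(X;Y) =", round(indep[2] - indep[0], 6))
```

In the first case all of $I(X;Y)$ is "explained" by $Z$; in the second, none of it is, so conditioning on $Z$ changes nothing.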