Using Markov Chains to simplify mutual information expressions


I read a paper in Information Theory which claims that the following sum of three mutual information expressions

$$ I(Y_1;X_1,X_2,X_3)+I(Y_2;X_2,X_3\mid Y_1,X_1)+I(Y_3;X_3\mid Y_1,Y_2,X_1,X_2)\tag1$$

coincides with the following single mutual information expression

$$I(Y_1;X_1,X_2,X_3)$$

if the following Markov chains hold:

$$X_2 X_3\rightarrow Y_1 \rightarrow Y_2 \text{ given } X_1 \tag 2$$

$$X_3\rightarrow Y_2 \rightarrow Y_3 \text{ given } X_1 X_2\tag 3$$

Does anyone know why this is true?

Some thoughts and my reasoning

Is it true that, given (2) and (3), the following hold? $$I(Y_1; X_2,X_3 \mid X_1 ) \geq I (Y_2; X_2,X_3 \mid X_1 )$$ $$I(Y_2; X_3 \mid X_1, X_2 ) \geq I (Y_3; X_3 \mid X_1, X_2 )$$

In this case I can upper-bound (1) as

$$(1) \leq I(Y_1;X_1,X_2,X_3)+\underbrace {I(Y_1;X_2,X_3\mid Y_1,X_1)}_{0\,?}+\underbrace{I(Y_2;X_3\mid Y_1,Y_2,X_1,X_2)}_{0\,?}$$

Is my reasoning correct?

Thanks

There are 2 answers below.

Answer 1

The relationship (2) implies $I(Y_2; X_2, X_3| Y_1, X_1) = 0$. Similarly, the relationship (3) implies $I(Y_3; X_3|Y_1, Y_2, X_1, X_2)=0$. This proves the desired claim.


So, I think your remaining question is "why does relationship (2) imply $I(Y_2; X_2, X_3| Y_1, X_1) = 0$"? (If you can understand that, you can also understand why (3) implies the last term is zero). Here is an explanation: Your relationship (2) is: $$ \mbox{Given $X_1$ we have:} \: \: (X_2, X_3) \rightarrow Y_1 \rightarrow Y_2 $$

This means that if $X_1$ is known, then $Y_1$ depends probabilistically on $(X_2, X_3)$. Further, given $X_1$ is known, then $Y_2$ depends probabilistically on $Y_1$ (so that $Y_2$ is conditionally independent of $(X_2, X_3)$ given that $(X_1, Y_1)$ is known). It follows that: $$ H(Y_2 | Y_1, X_1, X_2, X_3) = H(Y_2|Y_1, X_1) $$ Thus: $$ I(Y_2; X_2, X_3|Y_1,X_1) = H(Y_2|Y_1,X_1) - H(Y_2|Y_1,X_1,X_2,X_3) = 0 $$


Here is an example of random variables $X_1, X_2, X_3, Y_1, Y_2$ that satisfy (2): Define $A, B, C$ as mutually independent random variables uniformly distributed over $[0,1]$. Define:

\begin{align} X_1 &= 0\\ X_2 &= A \\ X_3 &= B \\ Y_1 &= X_2 + X_3 \\ Y_2 &= Y_1 + C \end{align}

I made this simple so $X_1$ is always zero and provides no information. Then we always have: $$ (X_2, X_3) \rightarrow Y_1 \rightarrow Y_2 $$

In particular, once we know $Y_1$, telling us the values of $(X_2, X_3)$ provides no additional information regarding the value of $Y_2$. The value $Y_2$ is conditionally independent of $(X_2, X_3)$ given $Y_1$. Thus: $$ H(Y_2 | Y_1, X_2, X_3) = H(Y_2 | Y_1) = H(C) $$
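As a sanity check on this example, here is a discrete analogue (my substitutions: $A, B, C$ are independent fair bits rather than Uniform$[0,1]$ variables, so entropies are finite and enumerable, and the constant $X_1 = 0$ is omitted) that verifies $H(Y_2 \mid Y_1, X_2, X_3) = H(Y_2 \mid Y_1) = H(C) = 1$ bit by enumeration:

```python
from collections import defaultdict
from itertools import product
from math import log2

# Discrete analogue of the example above. My substitutions: A, B, C are
# independent fair bits instead of Uniform[0,1] variables, and the
# constant X1 = 0 is omitted since it carries no information.
joint = defaultdict(float)
for a, b, c in product([0, 1], repeat=3):
    x2, x3 = a, b
    y1 = x2 + x3        # Y1 = X2 + X3
    y2 = y1 + c         # Y2 = Y1 + C
    joint[(x2, x3, y1, y2)] += 0.5 ** 3

def cond_entropy(target_idx, given_idx):
    """H(T | G) = -sum_{t,g} p(t,g) log2 p(t|g), by enumeration."""
    ptg, pg = defaultdict(float), defaultdict(float)
    for k, p in joint.items():
        t = tuple(k[i] for i in target_idx)
        g = tuple(k[i] for i in given_idx)
        ptg[(t, g)] += p
        pg[g] += p
    return -sum(p * log2(p / pg[g]) for (t, g), p in ptg.items())

# Indices into (x2, x3, y1, y2)
h_full = cond_entropy(target_idx=(3,), given_idx=(2, 0, 1))  # H(Y2 | Y1, X2, X3)
h_y1 = cond_entropy(target_idx=(3,), given_idx=(2,))         # H(Y2 | Y1)
print(h_full, h_y1)  # → 1.0 1.0
```

Both printed entropies equal $1$ bit, the entropy of the fresh noise $C$, confirming that $(X_2, X_3)$ adds nothing about $Y_2$ once $Y_1$ is known.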

Answer 2

The Markov chain $X\rightarrow Y\rightarrow Z$ implies that $I(X;Z\mid Y)=0.$ From this, we see that if we had $X_2 X_3\rightarrow Y_1 \rightarrow Y_2$, then $I(X_2,X_3;Y_2\mid Y_1)=0.$ But we know $X_2 X_3\rightarrow Y_1 \rightarrow Y_2$ holds given $X_1$ (see (2)), i.e., the chain holds conditionally on $X_1$, so the same argument applies with $X_1$ added to every conditioning. Hence, $I(X_2,X_3;Y_2\mid Y_1,X_1)=0.$

By a similar argument, (3) gives $I(X_3;Y_3\mid Y_1,Y_2,X_1,X_2)=0$, so the third term of (1) vanishes as well.
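To see the whole claim numerically, here is a small toy model of my own construction (not from the paper) in which both (2) and (3) hold by design, with the three conditional mutual informations of (1) computed by brute-force enumeration:

```python
from collections import defaultdict
from itertools import product
from math import log2

# Toy model (my own construction) in which both Markov chains hold by
# design: X1, X2, X3 and the noises N1, N2 are independent fair bits,
# Y1 = X1 ^ X2 ^ X3, Y2 = Y1 ^ N1 (a function of Y1 and fresh noise, so
# (X2, X3) -> Y1 -> Y2 given X1), and Y3 = Y2 ^ N2
# (so X3 -> Y2 -> Y3 given X1, X2).
joint = defaultdict(float)
for x1, x2, x3, n1, n2 in product([0, 1], repeat=5):
    y1 = x1 ^ x2 ^ x3
    y2 = y1 ^ n1
    y3 = y2 ^ n2
    joint[(x1, x2, x3, y1, y2, y3)] += 0.5 ** 5

def marginal(idx):
    m = defaultdict(float)
    for k, p in joint.items():
        m[tuple(k[i] for i in idx)] += p
    return m

def cond_mi(a_idx, b_idx, c_idx):
    """I(A; B | C) = sum p(a,b,c) log2[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]."""
    pabc, pac = marginal(a_idx + b_idx + c_idx), marginal(a_idx + c_idx)
    pbc, pc = marginal(b_idx + c_idx), marginal(c_idx)
    total = 0.0
    for k, p in pabc.items():
        a, rest = k[:len(a_idx)], k[len(a_idx):]
        b, c = rest[:len(b_idx)], rest[len(b_idx):]
        total += p * log2(p * pc[c] / (pac[a + c] * pbc[b + c]))
    return total

# Indices into (x1, x2, x3, y1, y2, y3): the three terms of (1)
t1 = cond_mi((3,), (0, 1, 2), ())       # I(Y1; X1, X2, X3)
t2 = cond_mi((4,), (1, 2), (3, 0))      # I(Y2; X2, X3 | Y1, X1)
t3 = cond_mi((5,), (2,), (3, 4, 0, 1))  # I(Y3; X3 | Y1, Y2, X1, X2)
print(t1, t2, t3)  # → 1.0 0.0 0.0
```

The second and third terms are exactly zero, so the sum in (1) collapses to the first term, as claimed.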