While solving problems in "Elements of Information Theory" by Cover and Thomas, I came across this problem in the last chapter.
Let $X \to Y \to (Z,W)$ be a Markov chain, i.e. $p(x,y,z,w) = p(x)p(y|x)p(z,w|y)$. Show that $$I(X;Z) + I(X;W) \leq I(X;Y) + I(W;Z).$$
Note the symmetry in $W$ and $Z$.
My attempt:
It follows that $X \to Y \to Z$ and $X \to Y \to W$ are also Markov chains. Hence, by the usual data processing inequality, $I(X;Z) \leq I(X;Y)$. In fact, $I(X;Z) = I(X;Y) - I(X;Y|Z)$: the chain rule gives $I(X;Y,Z) = I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z)$, and $I(X;Z|Y) = 0$ by the Markov property.
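As a quick numerical sanity check of this identity (just a sketch I put together; the alphabet sizes, random distributions, and the helper `I` are my own arbitrary choices, not from the book), one can verify $I(X;Z) = I(X;Y) - I(X;Y|Z)$ on a random chain $X \to Y \to Z$:

```python
import itertools
import math
import random

def I(joint, A, B, C=()):
    """Conditional mutual information I(A;B|C) in bits.
    joint: dict mapping outcome tuples to probabilities;
    A, B, C: tuples of coordinate indices into those outcome tuples."""
    def marg(idx):
        d = {}
        for k, p in joint.items():
            key = tuple(k[i] for i in idx)
            d[key] = d.get(key, 0.0) + p
        return d
    pabc, pac, pbc, pc = marg(A + B + C), marg(A + C), marg(B + C), marg(C)
    total = 0.0
    for k, p in pabc.items():
        if p <= 0:
            continue
        a, b, c = k[:len(A)], k[len(A):len(A) + len(B)], k[len(A) + len(B):]
        total += p * math.log2(p * pc[c] / (pac[a + c] * pbc[b + c]))
    return total

random.seed(1)
def rdist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Random Markov chain X -> Y -> Z on alphabets of size 3:
# p(x,y,z) = p(x) p(y|x) p(z|y)
nx = ny = nz = 3
px = rdist(nx)
py_x = [rdist(ny) for _ in range(nx)]
pz_y = [rdist(nz) for _ in range(ny)]
joint = {(x, y, z): px[x] * py_x[x][y] * pz_y[y][z]
         for x, y, z in itertools.product(range(nx), range(ny), range(nz))}

# Coordinates: X = 0, Y = 1, Z = 2
lhs = I(joint, (0,), (2,))                                # I(X;Z)
rhs = I(joint, (0,), (1,)) - I(joint, (0,), (1,), (2,))   # I(X;Y) - I(X;Y|Z)
assert abs(lhs - rhs) < 1e-9
```

The check passes because $I(X;Z|Y) = 0$ holds exactly by construction of the joint distribution.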
Now in one expansion, $$I(X;W) \leq I(X,Z;W,Y) = I(Z;W) + I(Z;Y|W) + I(X;Y|Z) + I(X;W|Z,Y).$$ The third term on the RHS cancels when we add this to the identity for $I(X;Z)$ above, but I have no rationale for eliminating the other two extra terms, $I(Z;Y|W)$ and $I(X;W|Z,Y)$.
The other expansion is: $$I(X;W) \leq I(X,Z;W) = I(Z;W) + I(X;W|Z).$$
Normally in these kinds of problems, one expands the mutual information in a clever way and eliminates terms. Kindly help me find an expansion that works.
Keywords: information theory, mutual information, data processing inequality, chain rule.
Looks like I figured it out...
$$I(X;W) \leq I(X,Z;W) = I(Z;W) + I(X;W|Z)$$ $$= I(Z;W) + I(X;Z,W) - I(X;Z)$$ $$\leq I(Z;W) + I(X;Y) - I(X;Z),$$
where the middle equality uses the chain rule $I(X;Z,W) = I(X;Z) + I(X;W|Z)$, and the last inequality is the data processing inequality applied to $X \to Y \to (Z,W)$. Rearranging gives $I(X;Z) + I(X;W) \leq I(X;Y) + I(Z;W)$, as required.
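To convince myself further, here is a numerical sanity check of the full inequality on random Markov chains $X \to Y \to (Z,W)$ (a rough sketch; the alphabet sizes, seed, and helper functions are arbitrary choices of mine, not from the book):

```python
import itertools
import math
import random

def mutual_information(p_ab):
    """I(A;B) in bits, from a dict mapping (a, b) -> probability."""
    p_a, p_b = {}, {}
    for (a, b), p in p_ab.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items() if p > 0)

def random_dist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(0)
nx, ny, nz, nw = 3, 3, 2, 2
for trial in range(100):
    # Random chain: p(x,y,z,w) = p(x) p(y|x) p(z,w|y)
    px = random_dist(nx)
    py_x = [random_dist(ny) for _ in range(nx)]
    pzw_y = [random_dist(nz * nw) for _ in range(ny)]
    joint = {}
    for x, y, z, w in itertools.product(range(nx), range(ny),
                                        range(nz), range(nw)):
        joint[(x, y, z, w)] = px[x] * py_x[x][y] * pzw_y[y][z * nw + w]

    # Marginalize the joint onto a pair of coordinates (0=X, 1=Y, 2=Z, 3=W)
    def pair(i, j):
        d = {}
        for k, p in joint.items():
            key = (k[i], k[j])
            d[key] = d.get(key, 0.0) + p
        return d

    lhs = mutual_information(pair(0, 2)) + mutual_information(pair(0, 3))  # I(X;Z) + I(X;W)
    rhs = mutual_information(pair(0, 1)) + mutual_information(pair(3, 2))  # I(X;Y) + I(W;Z)
    assert lhs <= rhs + 1e-9, (trial, lhs, rhs)
print("inequality held on all 100 random Markov chains")
```

Of course this proves nothing by itself, but it is a cheap way to catch a sign error or a misremembered chain-rule step before trusting the derivation.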
I feel there must be a method to this "madness".
Update: Turns out there is a method to this madness. Check out Raymond Yeung's book on Information Theory and Network Coding, which shows how to convert problems like this one into set-theoretic and measure-theoretic manipulations! $\blacksquare$