Transitive mutual information for Bernoulli random variables in a fork structure


Suppose we have Bernoulli random variables $X$, $Y$, and $Z$, whose dependency structure is $Y \leftarrow X \rightarrow Z$ (i.e. $Y \perp Z \mid X)$. If we know $\mathrm{I}(X:Y)$ and $\mathrm{I}(X:Z)$, can we get an exact value or a lower bound (or a tighter upper bound) on $\mathrm{I}(Y:Z)$?

From the data processing inequality, we know that

$\mathrm{I}(Y:Z) \leq \min(\mathrm{I}(X:Z), \mathrm{I}(X:Y))$

Can we exploit the fact that all the variables can only carry up to 1 bit of information to make a stronger statement about $\mathrm{I}(Y:Z)$?
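As a quick numerical illustration of the data processing inequality for a binary fork (a sketch; the parameter values and the helpers `entropy` and `mi` are arbitrary choices for this example, not from any library):

```python
import math

def entropy(dist):
    """Shannon entropy in bits of {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mi(joint):
    """I(A;B) from {(a, b): prob}, via I = H(A) + H(B) - H(A,B)."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return entropy(pa) + entropy(pb) - entropy(joint)

# An arbitrary binary fork Y <- X -> Z: P(X=1), P(Y=1|X=x), P(Z=1|X=x).
px1 = 0.4
py1_x = {0: 0.8, 1: 0.1}
pz1_x = {0: 0.3, 1: 0.9}

pxy, pxz, pyz = {}, {}, {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p = ((px1 if x else 1 - px1)
                 * (py1_x[x] if y else 1 - py1_x[x])
                 * (pz1_x[x] if z else 1 - pz1_x[x]))
            pxy[(x, y)] = pxy.get((x, y), 0.0) + p
            pxz[(x, z)] = pxz.get((x, z), 0.0) + p
            pyz[(y, z)] = pyz.get((y, z), 0.0) + p

print(mi(pyz) <= min(mi(pxy), mi(pxz)) + 1e-12)  # True: DPI holds
```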

1

There are 1 best solutions below

0
On

Notice the elementary equality $$I(Y;Z) = I(Y;X,Z) - I(Y;X|Z) = I(Y;X) - I(Y;X|Z),$$ where we've used the chain rule first, and then used the conditional independence to argue $I(Y;Z|X) = 0.$ In general, $I(Y;Z)$ can, of course, be as small as $0$, since the conditional independence allows for actual independence. This is irrespective of how large $I(X;Y)$ and $I(X;Z)$ are, because we can take $X = (U,V)$ for independent $U,V,$ and $Y = U, Z = V$.
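This extremal construction is easy to verify numerically. A sketch in Python (`mutual_info` is a small helper defined here, not a library call), taking $U, V$ to be independent fair bits:

```python
import math

def mutual_info(pab):
    """I(A;B) in bits from a dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in pab.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in pab.items() if p > 0)

# X = (U, V) for independent fair bits U, V; Y = U, Z = V.
# Joint over (X, Y, Z): each of the 4 values of (u, v) has prob 1/4.
joint = {((u, v), u, v): 0.25 for u in (0, 1) for v in (0, 1)}

pxy, pxz, pyz = {}, {}, {}
for (x, y, z), p in joint.items():
    pxy[(x, y)] = pxy.get((x, y), 0.0) + p
    pxz[(x, z)] = pxz.get((x, z), 0.0) + p
    pyz[(y, z)] = pyz.get((y, z), 0.0) + p

print(mutual_info(pxy))  # I(X;Y) = 1 bit
print(mutual_info(pxz))  # I(X;Z) = 1 bit
print(mutual_info(pyz))  # I(Y;Z) = 0 bits
```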


We can use the above to write down a symmetric expression, which also sheds clearer light on when $I(Y;Z)$ is positive.

Running the same expansion but with $Y$ and $Z$ interchanged, and then averaging tells us that

$$ I(Y;Z) = \frac{ I(X;Y) + I(X;Z) - I(X;Y|Z) - I(X;Z|Y)}{2} \\ = H(X) + H(X|Y,Z) - H(X|Z) - H(X|Y) \\ = I(X;Y) + I(X;Z) + H(X|Y,Z) - H(X) \\ = I(X;Y) + I(X;Z) - I(X;Y,Z),$$ where in the first line we have used that $I(X;Y|Z) = H(X|Z) - H(X|Y,Z)$ and $I(X;Z) = H(X) - H(X|Z),$ and similarly for the other terms.

This tells us that $I(Y;Z)$ is positive iff $I(Y;X) + I(Z;X)$ exceeds $I(X;Y,Z)$. This should make sense: loosely, $Y$ can tell us something about $Z$ only if the information the two carry about $X$ overlaps, which forces the sum of the marginal informations to exceed the information they carry about $X$ jointly.
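The identity $I(Y;Z) = I(X;Y) + I(X;Z) - I(X;Y,Z)$ can be checked numerically on an arbitrary binary fork. A sketch (the conditional probabilities are arbitrary example values, and the helpers are defined inline):

```python
import math

def H(dist):
    """Shannon entropy in bits of a dict {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# An arbitrary fork Y <- X -> Z over binary variables.
px = {0: 0.3, 1: 0.7}
py_x = {0: 0.9, 1: 0.2}   # P(Y=1 | X=x)
pz_x = {0: 0.1, 1: 0.6}   # P(Z=1 | X=x)

joint = {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            py = py_x[x] if y == 1 else 1 - py_x[x]
            pz = pz_x[x] if z == 1 else 1 - pz_x[x]
            joint[(x, y, z)] = px[x] * py * pz

def marg(keys):
    """Marginal of the joint onto the named coordinates."""
    m = {}
    for (x, y, z), p in joint.items():
        k = tuple({'x': x, 'y': y, 'z': z}[v] for v in keys)
        m[k] = m.get(k, 0.0) + p
    return m

def I(keys_a, keys_b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return H(marg(keys_a)) + H(marg(keys_b)) - H(marg(keys_a + keys_b))

lhs = I(('y',), ('z',))
rhs = I(('x',), ('y',)) + I(('x',), ('z',)) - I(('x',), ('y', 'z'))
print(abs(lhs - rhs) < 1e-12)  # True: the identity holds
```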

From the above, we can also work out the simpler lower bound $$I(Y;Z) \ge \max(0, I(X;Y) + I(X;Z) - H(X))$$ by bounding $I(X;Y,Z) \le H(X)$. The advantage of this is that it is only a function of the marginal laws $P_{XY}, P_{XZ}$ (like the upper bound in the question). Notice that this is tight, since we can take $X = (U,V,W)$ for independent $U,V,W$ and $Y = (U,W), Z = (V,W)$ to together attain any possible triple of $(H(X), I(X;Y), I(X;Z)),$ and in this setting $$I(Y;Z) = H(W) = (H(W) + H(U)) + (H(W) + H(V)) - (H(U) + H(V) + H(W)) \\ = I(X;Y) + I(X;Z) - H(X).$$
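The tightness construction above is also easy to confirm by direct computation. A sketch, with $U, V, W$ independent bits of arbitrary bias (the biases below are example values):

```python
import itertools
import math

def entropy(dist):
    """Shannon entropy in bits of {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# U, V, W independent bits with arbitrary biases.
pu, pv, pw = 0.3, 0.6, 0.8  # P(U=1), P(V=1), P(W=1)

# Joint over (X, Y, Z) with X = (u, v, w), Y = (u, w), Z = (v, w).
joint = {}
for u, v, w in itertools.product((0, 1), repeat=3):
    p = ((pu if u else 1 - pu)
         * (pv if v else 1 - pv)
         * (pw if w else 1 - pw))
    joint[((u, v, w), (u, w), (v, w))] = p

def marg(i):
    m = {}
    for k, p in joint.items():
        m[k[i]] = m.get(k[i], 0.0) + p
    return m

def marg2(i, j):
    m = {}
    for k, p in joint.items():
        m[(k[i], k[j])] = m.get((k[i], k[j]), 0.0) + p
    return m

def I(i, j):
    """I of coordinates i and j, via H(A) + H(B) - H(A,B)."""
    return entropy(marg(i)) + entropy(marg(j)) - entropy(marg2(i, j))

lhs = I(1, 2)                               # I(Y;Z)
rhs = I(0, 1) + I(0, 2) - entropy(marg(0))  # I(X;Y) + I(X;Z) - H(X)
print(abs(lhs - rhs) < 1e-12)  # True: the lower bound is attained
```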

Note: this answer is incomplete, because the tightness argument it presents requires $X$ to be non-binary. However, the argument does work as long as the alphabet of $X$ has at least $8$ letters, and those of $Y$ and $Z$ at least $4$, so it isn't very far off. It's certainly possible that more can be said in the binary setting.