$H(X\mid Y_1, Y_2) \leq H(X\mid Y_1)?$ (Conditional Entropy with conditioning on multiple RVs)


In short, my question is whether the "conditioning reduces entropy" maxim is also true when conditioning on one random variable as compared to conditioning on two: $$H(X\mid Y_1, Y_2) \leq H(X\mid Y_1)?$$ I have not been able to derive this simply by expanding both sides; I know when showing $H(X\mid Y) \leq H(X)$, the argument is that the mutual information of the two random variables is non-negative. But here, I'm not sure how to work with the mutual information of conditioned random variables, i.e. $I(X\mid Y_1; Y_2)$ or whether it is true that $H(X\mid Y_1, Y_2) = H((X\mid Y_1)\mid Y_2)$.



BEST ANSWER

"whether it is true that $H(X∣Y1,Y2)=H((X∣Y1)∣Y2)$" Basically yes, but the second notation is not very correct, it means nothing. You can't write $(X|Y)$ as it were a random variable (don't confuse it with $(X|Y=y)$, alternative notation used in probability, which is indeed a new random variable - but that's not what we mean in the conditioned entropy notation - That is, don't confuse $H(X|Y)$ with $H(X|Y=y)$, they are radically different things)

$I(X\mid Y_1; Y_2)$ is not correct notation either; it means nothing. The conditioning (take this as a rule) applies to everything else. Hence you should write $I(X; Y_2 \mid Y_1)$, which you should read as $I((X; Y_2) \mid Y_1)$: the mutual information between $X$ and $Y_2$, all conditioned on knowledge of $Y_1$.

If you have followed the chain of reasoning (proofs), Jensen's inequality $\implies$ log-sum inequality $\implies$ $D(p\,\|\,q) \ge 0$ $\implies$ $I(X; Y_1) \ge 0$ $\implies$ $H(X\mid Y_1) \le H(X)$, and you want to adapt it to the additional conditioning on $Y_2$, you just need to show that $I(X; Y_2 \mid Y_1)$ can be written as a (conditional) relative entropy. See, e.g., the definition of conditional mutual information in Cover and Thomas, pp. 23-24.
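As a numerical sanity check of this chain of implications, here is a sketch in Python (the joint pmf is a hypothetical random example on small alphabets, not from the question). It computes the conditional entropies via the chain rule $H(X\mid Y) = H(X,Y) - H(Y)$ and verifies that $I(X; Y_2 \mid Y_1) = H(X\mid Y_1) - H(X\mid Y_1, Y_2) \ge 0$:

```python
import itertools
import math
import random

random.seed(0)

# A random joint pmf p(x, y1, y2) over small alphabets (illustrative only).
X, Y1, Y2 = range(2), range(3), range(2)
w = {k: random.random() for k in itertools.product(X, Y1, Y2)}
tot = sum(w.values())
p = {k: v / tot for k, v in w.items()}

def H(*axes):
    """Joint entropy (in bits) of the marginal of p on the given axes."""
    marg = {}
    for (x, y1, y2), pr in p.items():
        key = tuple((x, y1, y2)[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

# Conditional entropies via the chain rule: H(X|Y) = H(X, Y) - H(Y).
H_X_given_Y1   = H(0, 1) - H(1)          # axes: 0 = x, 1 = y1, 2 = y2
H_X_given_Y1Y2 = H(0, 1, 2) - H(1, 2)

# I(X; Y2 | Y1) = H(X|Y1) - H(X|Y1, Y2), nonnegative like any (conditional) MI.
I_X_Y2_given_Y1 = H_X_given_Y1 - H_X_given_Y1Y2

assert I_X_Y2_given_Y1 >= -1e-12
assert H_X_given_Y1Y2 <= H_X_given_Y1 + 1e-12
```

Rerunning with any other seed gives the same conclusion, since the inequality holds for every joint distribution.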

ANSWER

Maybe a comment. What you wish to prove can be thought of as a corollary of (in fact, it is equivalent to) the inequality $H(X\mid Y) \le H(X)$, because conditioning does produce new random variables in the sense that $(X\mid Y=y)$ is a legitimate random variable in a precise, obvious sense (although some care must be taken, since $(X\mid Y=y)$ lives on a different ambient probability space than $X$ and $Y$). Those wondering why $(X\mid Y=y)$ is a random variable should feel free to open a separate question (though I think the relevant section of https://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/ answers it). I am assuming discrete random variables throughout.

To be more precise, Inequality A: $$H(X| Y_1, Y_2) \le H(X|Y_1)$$ follows from Inequality B: $$H(X| Y_1 = y_1 , Y_2) \le H(X|Y_1 = y_1).$$

The preceding paragraph might raise two questions. First, what do I mean by the expression $H(X\mid Y_1 = y_1, Y_2)$, which seems to demand conditioning on an event and a random variable at the same time, namely the event $Y_1 = y_1$ and the random variable $Y_2$? The expression simply means $$H\big((X\mid Y_1 = y_1)\,\big|\,(Y_2\mid Y_1 = y_1)\big).$$

Second, how does Inequality B imply Inequality A? Multiply both sides of Inequality B by $\Pr(Y_1 = y_1)$ and sum over all $y_1$.
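The averaging step can be checked numerically. The following Python sketch (using a hypothetical random joint pmf, not anything from the question) verifies Inequality B for each fixed $y_1$, then forms the $\Pr(Y_1 = y_1)$-weighted sums of both sides to recover Inequality A:

```python
import itertools
import math
import random

random.seed(1)

# Illustrative joint pmf p(x, y1, y2) on small alphabets (hypothetical example).
X, Y1, Y2 = range(2), range(2), range(3)
w = {k: random.random() for k in itertools.product(X, Y1, Y2)}
tot = sum(w.values())
p = {k: v / tot for k, v in w.items()}

def entropy(pmf):
    """Entropy (in bits) of a pmf given as a dict of probabilities."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

lhs_A = rhs_A = 0.0  # will accumulate H(X|Y1,Y2) and H(X|Y1)
for y1 in Y1:
    p_y1 = sum(pr for (x, a, y2), pr in p.items() if a == y1)
    # Conditional pmf of (X, Y2) given the event Y1 = y1.
    cond = {(x, y2): p[(x, y1, y2)] / p_y1 for x in X for y2 in Y2}
    # Right side of Inequality B: H(X | Y1 = y1).
    px = {x: sum(cond[(x, y2)] for y2 in Y2) for x in X}
    H_B_rhs = entropy(px)
    # Left side of Inequality B: H(X | Y1 = y1, Y2), averaging over y2.
    H_B_lhs = 0.0
    for y2 in Y2:
        py2 = sum(cond[(x, y2)] for x in X)
        if py2 > 0:
            H_B_lhs += py2 * entropy({x: cond[(x, y2)] / py2 for x in X})
    assert H_B_lhs <= H_B_rhs + 1e-12   # Inequality B, for this y1
    lhs_A += p_y1 * H_B_lhs             # weight by Pr(Y1 = y1) and sum
    rhs_A += p_y1 * H_B_rhs

assert lhs_A <= rhs_A + 1e-12           # Inequality A follows
```

The two accumulators are exactly the weighted sums described above: after the loop, `lhs_A` equals $H(X\mid Y_1, Y_2)$ and `rhs_A` equals $H(X\mid Y_1)$.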

Remark: actually, even when $Y$ is not discrete, we can still assign a meaning to $(X\mid Y=y)$ that makes this argument work in non-discrete cases as well, but doing so requires some extra care and familiarity with the theory of disintegration of measures.