In short, my question is whether the "conditioning reduces entropy" maxim is also true when conditioning on one random variable as compared to conditioning on two: $$H(X\mid Y_1, Y_2) \leq H(X\mid Y_1)?$$ I have not been able to derive this simply by expanding both sides; I know when showing $H(X\mid Y) \leq H(X)$, the argument is that the mutual information of the two random variables is non-negative. But here, I'm not sure how to work with the mutual information of conditioned random variables, i.e. $I(X\mid Y_1; Y_2)$ or whether it is true that $H(X\mid Y_1, Y_2) = H((X\mid Y_1)\mid Y_2)$.
$H(X\mid Y_1, Y_2) \leq H(X\mid Y_1)?$ (Conditional Entropy with conditioning on multiple RVs)
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 answers below.
Maybe a comment. What you wish to prove can be seen as a corollary of (in fact, it is equivalent to) the inequality $H(X|Y) \le H(X)$, because conditioning does produce new random variables: $(X|Y=y)$ is a legitimate random variable in a precise sense (though some care must be taken, since $(X|Y=y)$ lives on a different ambient probability space than $X$ and $Y$). Those wondering why $(X|Y=y)$ is a random variable should feel free to open a separate question (I think the relevant section of https://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/ answers it). I am assuming discrete random variables throughout.
To be more precise, Inequality A: $$H(X| Y_1, Y_2) \le H(X|Y_1)$$ follows from Inequality B: $$H(X| Y_1 = y_1 , Y_2) \le H(X|Y_1 = y_1).$$
The preceding paragraph might raise two questions. First, what do I mean by the expression $H(X| Y_1 = y_1 , Y_2)$ which seems to demand conditioning on an event and a random variable at the same time, namely, the event $Y_1 = y_1$ and the random variable $Y_2$? The expression simply means $$H((X| Y_1 =y_1)|(Y_2|Y_1 = y_1)).$$
How does Inequality B imply Inequality A? Multiply both sides by $\Pr(Y_1 = y_1)$ and sum over all $y_1$. (Inequality B itself is just the basic inequality $H(X|Y) \le H(X)$ applied on the probability space obtained by conditioning on the event $Y_1 = y_1$.)
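Spelled out, the averaging step rests on the standard decomposition of conditional entropy as an average over the conditioning values: $$\sum_{y_1} \Pr(Y_1 = y_1)\, H(X \mid Y_1 = y_1, Y_2) = H(X \mid Y_1, Y_2), \qquad \sum_{y_1} \Pr(Y_1 = y_1)\, H(X \mid Y_1 = y_1) = H(X \mid Y_1),$$ so weighting Inequality B by $\Pr(Y_1 = y_1)$ and summing over $y_1$ yields exactly Inequality A.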
Remark: Actually, even when $Y$ is not discrete, we can still assign a meaning to $(X|Y=y)$ that makes this argument work in the non-discrete case as well, but that requires some extra care and familiarity with the theory of disintegration of measures.
"whether it is true that $H(X∣Y1,Y2)=H((X∣Y1)∣Y2)$" Basically yes, but the second notation is not very correct, it means nothing. You can't write $(X|Y)$ as it were a random variable (don't confuse it with $(X|Y=y)$, alternative notation used in probability, which is indeed a new random variable - but that's not what we mean in the conditioned entropy notation - That is, don't confuse $H(X|Y)$ with $H(X|Y=y)$, they are radically different things)
$I(X\mid Y_1;Y_2)$ is not correct notation either; it means nothing. The conditioning (take this as a rule) applies to everything else. Hence you should write $I(X;Y_2 \mid Y_1)$, which you should read as $I((X;Y_2) \mid Y_1)$: the mutual information between $X$ and $Y_2$, all conditioned on the knowledge of $Y_1$.
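With the notation straightened out, the inequality in the question is exactly the nonnegativity of this conditional mutual information, via the standard identity $$I(X; Y_2 \mid Y_1) = H(X \mid Y_1) - H(X \mid Y_1, Y_2) \ge 0.$$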
If you have followed the chain of proofs, Jensen's inequality $\implies$ log-sum inequality $\implies$ $D(p\|q) \ge 0$ $\implies$ $I(X;Y_1) \ge 0$ $\implies$ $H(X|Y_1) \le H(X)$, and you want to adapt it to the setting with the extra variable $Y_2$, you just need to show that $I(X;Y_2 \mid Y_1)$ can be written as a (conditional) relative entropy. See e.g. the definition of conditional mutual information in Cover–Thomas, pp. 23–24.
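For a quick numerical sanity check of the inequality $H(X\mid Y_1,Y_2) \le H(X\mid Y_1)$, one can compute both conditional entropies directly from a discrete joint pmf. This is a sketch of my own (the helper names are made up, not standard library functions):

```python
import itertools
import math
import random

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf {tuple: prob} onto the given coordinate axes."""
    out = {}
    for event, p in joint.items():
        key = tuple(event[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given), the chain rule."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))

# Build a random joint pmf over (X, Y1, Y2), each variable binary.
random.seed(0)
events = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in events]
total = sum(weights)
joint = {e: w / total for e, w in zip(events, weights)}

h1 = cond_entropy(joint, (0,), (1,))      # H(X | Y1)
h12 = cond_entropy(joint, (0,), (1, 2))   # H(X | Y1, Y2)
print(h12 <= h1 + 1e-12)                  # extra conditioning cannot increase entropy
```

Rerunning with different seeds (or any hand-chosen pmf) should never produce a violation beyond floating-point error, since the gap $h_1 - h_{12}$ is exactly $I(X;Y_2\mid Y_1) \ge 0$.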