Proving that s = 1 maximises the expression

79 Views Asked by At

Consider two discrete random variables, $X$ and $Y$, with a joint distribution $P_{X,Y}$. Prove that $s=1$ maximizes the following inequality: \begin{align} &\sum_{x, y} P_X(x) P_{Y \mid X}(y \mid x) \log \left(\frac{P_{Y \mid X}(y \mid x)^s}{\sum_{\bar{x}} P_X(\bar{x}) P_{Y \mid X}(y \mid \bar{x})^s}\right)\\ \leq{}& \sum_{x, y} P_X(x) P_{Y \mid X}(y \mid x) \log \left(\frac{P_{Y \mid X}(y \mid x)}{\sum_{\bar{x}} P_X(\bar{x}) P_{Y \mid X}(y \mid \bar{x})}\right)\\ ={}& I(X; Y). \end{align} Here, $\bar{x}$ has the same marginal distribution as $x$.

A simple approach is to prove the concavity of the expression in $s$ using Holder's inequality and then demonstrate that the derivative with respect to $s$ is zero when $s=1$. However, this doesn't seem to be an elegant solution, and my intuition suggests that there might be a concise proof, perhaps involving an inequality, that establishes this result in 3-4 lines.

1

There are 1 best solutions below

0
On BEST ANSWER

Noting that $\sum_{x'} P_X(x')P_{Y|X}(y|x') = P_Y(y),$ we can define $$f(s) := \mathrm{LHS} - \mathrm{RHS}= \sum P_{XY}(x,y) \log \frac{P_{Y}(y) P_{Y|X}(y|x)^{s-1}}{\sum_{x'} P_X(x')P_{Y|X}(y|x')^s},$$ where the "$\mathrm{LHS},\mathrm{RHS}$" refer to the inequality in the question. Clearly, $f(1) = 0$, and we want to show that $f(s) \le 0$. Now, hitting this with Jensen's inequality for the concave function $\log,$ we have $$ f(s) \le \log \sum_{x,y} \frac{P_X(x)P_Y(y) P_{Y|X}(y|x)^s}{\sum_{x'} P_X(x') P_{Y|X}(y|x')^s},$$ where I've used Bayes law to write $P_{XY} = P_X \cdot P_{Y|X},$ and folded the latter into the power. But note that for a fixed $y$, $$ \sum_x \frac{P_X(x)P_{Y|X}(y|x)^s}{\sum_{x'} P_X(x') P_{Y|X}(y|x')^s} = 1,$$ so the total sum inside the logarithm is $\sum_y P_Y(y) = 1,$ and we're done. In fact, since $\log$ is strictly concave, the same shows that unless $s$ is such that $P_Y(y)P_{Y|X}(y|x)^{s-1} = \sum_{x'} P_Y(x')P_{Y|X}(y|x')^s$ for every $x,y$ (in other words, unless $s = 1$, or $Y$ is independent of $X$), $f(s)$ is strictly smaller than $0$.