Information-theoretic Inequality


Suppose $X$ and $Y$ are two discrete random variables. How can we show that

$$\sum_{x,y} p(x|y)p(y|x) \geq 1.$$

The question goes further by asking for a necessary and sufficient condition for equality.

My attempt: for equality, if $X$ and $Y$ are independent then $p(x|y)p(y|x) = p(x)p(y)$, so the sum factors over each variable's PMF and equals exactly one. However, I am stuck on showing that the inequality holds in general, and I would appreciate any hints.
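As a quick numerical sanity check (the joint PMF below is made up purely for illustration), one can verify the inequality directly, and verify that a product law attains equality:

```python
# Sanity check of  sum_{x,y} p(x|y) p(y|x) >= 1.
# The joint PMF below is illustrative; any valid joint PMF works.
from itertools import product

p = {(0, 0): 0.10, (0, 1): 0.25, (0, 2): 0.15,
     (1, 0): 0.20, (1, 1): 0.05, (1, 2): 0.25}

def S(joint):
    """sum_{x,y} p(x|y) p(y|x) = sum_{x,y} p(x,y)^2 / (p(x) p(y))."""
    px, py = {}, {}
    for (x, y), v in joint.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return sum(v * v / (px[x] * py[y]) for (x, y), v in joint.items())

print(S(p) > 1.0)               # True: this joint law is dependent

# Under independence, q(x, y) = p(x) p(y), the sum is exactly 1.
px = {0: 0.50, 1: 0.50}         # marginals of p above
py = {0: 0.30, 1: 0.30, 2: 0.40}
q = {(x, y): px[x] * py[y] for x, y in product(px, py)}
print(abs(S(q) - 1.0) < 1e-12)  # True
```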


Best answer:

A similar approach: exploiting the concavity of the logarithm, we can write

\begin{equation} \begin{aligned} \log\left(\mathbb{E}\left[\frac{p(y|x)}{p(y)}\right]\right) &\underbrace{\geq}_{\text{Jensen's inequality}} \mathbb{E}\left[\log\left(\frac{p(x,y)}{p(x)p(y)}\right)\right]\\ &\underbrace{=}_{D(p(x,y)\|p(x)p(y))}I(X;Y) \geq 0, \end{aligned} \end{equation} where the expectation is taken over the joint law $p(x,y)$, so that $\mathbb{E}\left[\frac{p(y|x)}{p(y)}\right] = \sum_{x,y} \frac{p(x,y)^2}{p(x)p(y)} = \sum_{x,y} p(x|y)p(y|x)$, and $I(X;Y) = H(X)-H(X|Y)$ is the mutual information. Exponentiating gives $\sum_{x,y} p(x|y)p(y|x) \geq e^{I(X;Y)} \geq 1$. Equality holds iff $I(X;Y) = 0$, i.e., iff $X$ and $Y$ are independent.
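The chain $\log S \geq I(X;Y) \geq 0$ can be checked numerically; the joint PMF here is a made-up example:

```python
from math import log

# Illustrative joint PMF; check log(S) >= I(X;Y) >= 0 as in the derivation.
p = {(0, 0): 0.10, (0, 1): 0.25, (0, 2): 0.15,
     (1, 0): 0.20, (1, 1): 0.05, (1, 2): 0.25}
px, py = {}, {}
for (x, y), v in p.items():
    px[x] = px.get(x, 0.0) + v
    py[y] = py.get(y, 0.0) + v

# S = E[p(y|x)/p(y)] under the joint law; I = E[log(p(x,y)/(p(x)p(y)))].
S = sum(v * v / (px[x] * py[y]) for (x, y), v in p.items())
I = sum(v * log(v / (px[x] * py[y])) for (x, y), v in p.items())

print(log(S) >= I >= 0.0)  # True: Jensen's step, then nonnegativity of KL
```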


Applying Bayes' rule, $$ S:= \sum_{x,y} p(x|y) p(y|x) = \sum_{x,y} \frac{p(x,y)^2}{p(x) p(y)}. $$ Now let $q(x,y) := p(x) p(y)$, which is the law with independent $X,Y$ that matches the marginal distributions of $p$. Then we have $$ S = \sum_{x,y} q(x,y) \cdot \left(\frac{p(x,y)}{q(x,y)}\right)^2 =\mathbb{E}_q[ f(X,Y)^2],$$ where $f(x,y) = p(x,y)/q(x,y)$. But by Jensen's inequality applied to the convex function $t \mapsto t^2$, $$ S \ge (\mathbb{E}_q[f(X,Y)])^2 = 1.$$

In fact, you're essentially looking at the $\chi^2$-divergence, which is defined as $$D_{\chi^2}(P\|Q) = \mathbb{E}_Q\left[ \left( \frac{\mathrm{d}P}{\mathrm{d}Q}(X) - 1\right)^2\right],$$ which in the discrete case works out to exactly $\sum_{x} \frac{p^2(x)}{q(x)} - 1.$ This is an $f$-divergence, and has many of the nice properties of KL divergence. In particular, $D_{\chi^2} \ge 0$, with equality if and only if $P = Q$ (which in your case translates to $p(x,y) = p(x) p(y)$, i.e., independence). The functional you're studying is (roughly) a $\chi^2$-version of mutual information. As another pointer to the literature, $\log (1 + D_{\chi^2})$ is called the Rényi divergence of order 2.
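To tie the two observations together, one can check numerically that $S - 1$ coincides with $D_{\chi^2}(P\|Q)$ when $Q$ is the product of the marginals of $P$ (the joint PMF here is illustrative only):

```python
# Check S - 1 == D_{chi^2}(P || Q) with Q = product of the marginals of P.
# The joint PMF is a made-up example.
p = {(0, 0): 0.10, (0, 1): 0.25, (0, 2): 0.15,
     (1, 0): 0.20, (1, 1): 0.05, (1, 2): 0.25}
px, py = {}, {}
for (x, y), v in p.items():
    px[x] = px.get(x, 0.0) + v
    py[y] = py.get(y, 0.0) + v
q = {(x, y): px[x] * py[y] for (x, y) in p}

S = sum(p[xy] ** 2 / q[xy] for xy in p)
chi2 = sum(q[xy] * (p[xy] / q[xy] - 1.0) ** 2 for xy in p)

print(abs((S - 1.0) - chi2) < 1e-12)  # True: D_chi2 = S - 1 >= 0
```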