Convergence of mutual information

Asked by Bumbble Comm at 29 Mar 2026 - 4:53

Let $P_n (x,y)$ be a sequence of (cumulative) probability distributions defined on $\mathcal{X}\times \mathcal{Y}$ (of arbitrary cardinality) that converges weakly to $P(x,y)$:

$$ P_n (x,y) \Rightarrow P(x,y) $$

i.e., $P_n (x,y)$ converges pointwise to $P(x,y)$ at every point where $P(x,y)$ is continuous.
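The exception at discontinuity points matters. A minimal Python sketch, using a hypothetical example that is not part of the question (a point mass at $(1/n,1/n)$ converging weakly to a point mass at $(0,0)$), evaluates the joint CDFs at a few points; the function names are illustrative only.

```python
# Hypothetical example: (X_n, Y_n) is a point mass at (1/n, 1/n);
# its weak limit is a point mass at (0, 0).

def point_mass_cdf(a, b, x, y):
    """Joint CDF P(X <= x, Y <= y) of the point mass at (a, b)."""
    return 1.0 if (x >= a and y >= b) else 0.0

def F_n(n, x, y):
    """Joint CDF of (X_n, Y_n) = (1/n, 1/n)."""
    return point_mass_cdf(1.0 / n, 1.0 / n, x, y)

def F(x, y):
    """Joint CDF of the limit (0, 0); it jumps along the edges of the quadrant x >= 0, y >= 0."""
    return point_mass_cdf(0.0, 0.0, x, y)

for x, y in [(0.5, 0.5), (-0.5, 0.5), (0.0, 0.5)]:
    # Values of F_n for growing n, followed by the limit value F(x, y).
    print((x, y), [F_n(n, x, y) for n in (1, 10, 1000)], F(x, y))
```

At $(0.5, 0.5)$ and $(-0.5, 0.5)$, where $F$ is continuous, `F_n` eventually agrees with `F`. At $(0, 0.5)$, which lies on an edge where $F$ jumps, `F_n` stays at 0 while `F` equals 1, so pointwise convergence fails exactly where the definition allows it to.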

Let $I_n$ be the mutual information induced by $P_n (x,y)$, defined by

$$ I_n = \int \log\frac{dP_n(x,y)}{d\left( P_n(x) \times P_n(y)\right)} dP_n $$

where $\frac{dP_n(x,y)}{d\left( P_n(x) \times P_n(y)\right)}$ is the Radon-Nikodym derivative of the joint distribution with respect to the product of the marginals.

Similarly, define

$$ I = \int \log\frac{dP(x,y)}{d\left( P(x) \times P(y)\right)} dP $$
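For finite alphabets, the Radon-Nikodym derivative above reduces to the ratio $p(x,y)/(p(x)p(y))$ of probability mass functions, so $I$ becomes a finite sum. The following Python sketch assumes that finite setting and natural logarithms (nats); `mutual_information` is a hypothetical helper, not a library function.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a finite joint pmf given as {(x, y): prob}."""
    # Marginals p(x) and p(y) obtained by summing the joint pmf.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum of p(x,y) * log(p(x,y) / (p(x) p(y))); pairs with p(x,y) = 0 contribute 0.
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Usage: two independent fair bits versus two copies of one fair bit.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
perfectly_correlated = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent))
print(mutual_information(perfectly_correlated))
```

The independent pair gives $0$, and the perfectly correlated pair gives $\log 2$, the entropy of one fair bit.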

Is it in general true that $I_n \to I$ as $n \to \infty$? If not, what are the necessary and/or sufficient conditions under which this holds true?

I feel like measure-theoretic convergence theorems (bounded convergence, monotone convergence, dominated convergence) might be useful in approaching this problem, but I could not find a way of directly applying them.

**Bumbble Comm** · Accepted Answer

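This does not hold in general. As a counterexample, let $Z$ be a real-valued discrete random variable with $0 < H(Z) < \infty$, and consider the sequence $(X_n,Y_n) = \left(\frac{1}{n}Z,\frac{1}{n}Z\right)$. Then $(X_n,Y_n) \xrightarrow{d} (0,0)$ as $n\to\infty$, while, because $z \mapsto z/n$ is a bijection and mutual information is invariant under bijections of each variable,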

$$ I(X_n;Y_n)=I(Z;Z) = H(Z) > 0 = I(0;0). $$
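The counterexample can be checked numerically. This Python sketch assumes, for concreteness, that $Z$ is uniform on $\{1,2,3\}$ (so $H(Z)=\log 3$); the helpers `mutual_information`, `entropy` and `joint_n` are hypothetical and restricted to finite supports.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a finite joint pmf given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def entropy(pmf):
    """Shannon entropy (in nats) of a finite pmf given as {value: prob}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

pz = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # assumed law of Z: uniform on {1, 2, 3}

def joint_n(n):
    """Joint pmf of (X_n, Y_n) = (Z/n, Z/n); its support shrinks toward (0, 0)."""
    return {(z / n, z / n): p for z, p in pz.items()}

limit = {(0.0, 0.0): 1.0}  # law of the weak limit (0, 0)

for n in (1, 10, 1000):
    print(n, mutual_information(joint_n(n)))
print(entropy(pz), mutual_information(limit))
```

The support of $(X_n,Y_n)$ shrinks to $(0,0)$, yet the computed mutual information stays at $H(Z)=\log 3$ for every $n$, while the limit law has mutual information $0$.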

Remark

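In general, mutual information is weakly lower semicontinuous, i.e. if $(X_n,Y_n)$ converges weakly (in distribution) to $(X,Y)$, then $$ I(X;Y) \leq \liminf_{n\to\infty}I(X_n;Y_n), $$ and the example above shows that the inequality can be strict. This follows from the lower semicontinuity of the divergence proved below, since $I(X;Y) = D(P_{XY}||P_X\times P_Y)$ and (on separable metric spaces) weak convergence of the joint laws implies weak convergence of the marginals and hence of their products, $P_{X_n}\times P_{Y_n} \Rightarrow P_X\times P_Y$.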

Edit

To prove this, we show the lower semicontinuity of the divergence: if $P_n \Rightarrow P$ and $Q_n \Rightarrow Q$, then $D(P||Q) \leq \liminf_{n\to\infty} D(P_n||Q_n)$. First, we need the following lemma:

Lemma (Donsker-Varadhan) For two probability measures $P,Q$ defined on $\mathcal{X}$, let $\mathcal{C}$ denote the set of all measurable functions $f : \mathcal{X} \to \mathbb{R}$ with the property $$ \mathbb{E}_Q\left[e^{f(X)}\right]<\infty. $$ If $D(P||Q) < \infty$, then for every $f\in\mathcal{C}$, $\mathbb{E}_P[f(X)]$ exists and $$ D(P||Q) = \sup_{f\in\mathcal{C}}\left(\mathbb{E}_P[f(X)] - \log \mathbb{E}_Q\left[e^{f(X)}\right]\right). $$
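On a finite alphabet with full-support $P$ and $Q$, the lemma can be sanity-checked numerically: the objective at $f = \log\frac{dP}{dQ}$ should equal $D(P||Q)$, and no other $f$ should exceed it. The Python sketch below uses arbitrary example pmfs and a random search over $f$; it illustrates the statement rather than proving it, and the function names are hypothetical.

```python
import math
import random

def kl(p, q):
    """D(P||Q) in nats for pmfs given as lists over the same finite alphabet (assumes P << Q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dv_objective(f, p, q):
    """E_P[f] - log E_Q[e^f], the quantity inside the Donsker-Varadhan supremum."""
    e_p_f = sum(pi * fi for pi, fi in zip(p, f))
    e_q_exp_f = sum(qi * math.exp(fi) for qi, fi in zip(q, f))
    return e_p_f - math.log(e_q_exp_f)

p = [0.5, 0.3, 0.2]  # example pmfs (assumed, not from the answer)
q = [0.2, 0.3, 0.5]

# The maximizer from the proof: f = log dP/dQ, i.e. log(p_i / q_i) on a finite alphabet.
f_star = [math.log(pi / qi) for pi, qi in zip(p, q)]

# Random search over test functions f with values in [-3, 3].
random.seed(0)
best_random = max(
    dv_objective([random.uniform(-3, 3) for _i in range(len(p))], p, q)
    for _trial in range(10_000)
)
print(kl(p, q), dv_objective(f_star, p, q), best_random)
```

The objective is unchanged when a constant is added to $f$, so the maximizer $\log\frac{dP}{dQ}$ is only unique up to an additive constant.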

Proof of Lemma

First, since $D(P||Q) < \infty$, $P$ is absolutely continuous with respect to $Q$, so the Radon-Nikodym derivative $\frac{dP}{dQ}$ of $P$ with respect to $Q$ exists. Choosing $f = \log\frac{dP}{dQ}$, we get $$ \mathbb{E}_P[f(X)] - \log\mathbb{E}_Q\left[e^{f(X)}\right] = \mathbb{E}_P\left[\log\frac{dP}{dQ}\right] - \log\underbrace{\mathbb{E}_Q\left[\frac{dP}{dQ}\right]}_{=1} = D(P||Q), $$ so the supremum is at least $D(P||Q)$.

For the other direction, fix $f \in\mathcal{C}$ and consider the tilted measure $$ \widetilde{Q}(dx) = \frac{e^{f(x)}Q(dx)}{\int_{\mathcal{X}}e^{f(x)}Q(dx)}. $$

Note that $$ \frac{d\widetilde{Q}}{dQ}=\frac{e^{f(x)}}{\int_{\mathcal{X}}e^{f(x)}Q(dx)} \implies \log \frac{d\widetilde{Q}}{dQ} = f(x) - \log\int_{\mathcal{X}}e^{f(x)}Q(dx) = f(x) - \log\mathbb{E}_Q\left[e^{f(X)}\right]. $$

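Now compute the quantity inside the supremum. Since $e^{f}>0$, the measures $Q$ and $\widetilde{Q}$ are mutually absolutely continuous; together with $P \ll Q$ this gives $P \ll \widetilde{Q}$ and $\log\frac{d\widetilde{Q}}{dQ} = \log\frac{dP}{dQ} - \log\frac{dP}{d\widetilde{Q}}$ $P$-almost surely. Hence $$ \mathbb{E}_P[f(X)]-\log\mathbb{E}_Q\left[e^{f(X)}\right]=\mathbb{E}_P\left[\log\frac{d\widetilde{Q}}{dQ}\right]=\mathbb{E}_P\left[\log\frac{dP}{dQ}\right]-\mathbb{E}_P\left[\log\frac{dP}{d\widetilde{Q}}\right] = D(P||Q) - D(P||\widetilde{Q}) \leq D(P||Q), $$ where the inequality uses $D(P||\widetilde{Q}) \geq 0$.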
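The identity $\mathbb{E}_P[f(X)]-\log\mathbb{E}_Q[e^{f(X)}] = D(P||Q) - D(P||\widetilde{Q})$ can be verified on a finite alphabet, where the tilted measure is just $Q$ reweighted by $e^{f}$ and renormalized. The pmfs, the test function `f`, and the helper names in this Python sketch are illustrative assumptions.

```python
import math

def kl(p, q):
    """D(P||Q) in nats for pmfs given as lists over the same finite alphabet (assumes P << Q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilt(q, f):
    """Tilted measure: Q~(x) proportional to e^{f(x)} Q(x), renormalized to sum to 1."""
    w = [qi * math.exp(fi) for qi, fi in zip(q, f)]
    z = sum(w)  # z = E_Q[e^f], positive because e^f > 0
    return [wi / z for wi in w]

p = [0.5, 0.3, 0.2]   # example pmfs (assumed)
q = [0.2, 0.3, 0.5]
f = [1.0, -0.5, 0.25]  # an arbitrary test function (hypothetical choice)

# Left side: E_P[f] - log E_Q[e^f]; right side: D(P||Q) - D(P||Q~).
lhs = sum(pi * fi for pi, fi in zip(p, f)) - math.log(sum(qi * math.exp(fi) for qi, fi in zip(q, f)))
rhs = kl(p, q) - kl(p, tilt(q, f))
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, and both are at most `kl(p, q)` because `kl(p, tilt(q, f))` is nonnegative.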

Hence, we are done.

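With this, we now prove the weak lower semicontinuity of the divergence on a metric space $\mathcal{X}$ (e.g. a Polish space). Since bounded continuous functions are dense in $L^1$, the supremum in the Donsker-Varadhan formula is unchanged if we restrict $f$ to $\mathcal{C}_b$, the set of all bounded continuous functions (the formula also holds, with both sides equal to $+\infty$, when the divergence is infinite). Now let $P_n \Rightarrow P$ and $Q_n \Rightarrow Q$. Then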

\begin{align*} \liminf_{n\to\infty}D(P_n||Q_n) & = \sup_{n\geq 1}\inf_{k\geq n}\sup_{f\in \mathcal{C}_b}\left(\mathbb{E}_{P_k}[f(X)]-\log\mathbb{E}_{Q_k}\left[e^{f(X)}\right]\right) \\ & \geq \sup_{n\geq 1}\sup_{f\in \mathcal{C}_b}\inf_{k\geq n}\left(\mathbb{E}_{P_k}[f(X)]-\log\mathbb{E}_{Q_k}\left[e^{f(X)}\right]\right) \\ & = \sup_{f\in \mathcal{C}_b}\sup_{n\geq 1}\inf_{k\geq n}\left(\mathbb{E}_{P_k}[f(X)]-\log\mathbb{E}_{Q_k}\left[e^{f(X)}\right]\right) \\ & = \sup_{f\in\mathcal{C}_b}\liminf_{n\to\infty}\left(\mathbb{E}_{P_n}[f(X)]-\log\mathbb{E}_{Q_n}\left[e^{f(X)}\right]\right) \\ & = \sup_{f\in\mathcal{C}_b} \left(\mathbb{E}_{P}[f(X)]-\log\mathbb{E}_{Q}\left[e^{f(X)}\right]\right) \\ & = D(P||Q), \end{align*}
where the first line combines the definition of the liminf with the Donsker-Varadhan formula restricted to $\mathcal{C}_b$; the second line follows from the minimax inequality $\inf_k\sup_f \geq \sup_f\inf_k$; the third line swaps the order of the two suprema; the fourth line is again the definition of the liminf; the fifth line holds because, for $f\in\mathcal{C}_b$, both $f$ and $e^{f}$ are bounded and continuous, so weak convergence (equivalently, by the Portmanteau theorem) gives $\mathbb{E}_{P_n}[f(X)]\to\mathbb{E}_P[f(X)]$ and $\mathbb{E}_{Q_n}[e^{f(X)}]\to\mathbb{E}_Q[e^{f(X)}] > 0$, and $\log$ is continuous on $(0,\infty)$; and the last line is the Donsker-Varadhan lemma above, with the supremum restricted to $\mathcal{C}_b$.
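The minimax step in the second line uses only the elementary fact that $\inf_k \sup_f g(k,f) \geq \sup_f \inf_k g(k,f)$ for any real-valued $g$. A small Python sketch checks it on a random finite table, where the hypothetical entry `g[k][j]` stands in for $\mathbb{E}_{P_k}[f_j(X)]-\log\mathbb{E}_{Q_k}[e^{f_j(X)}]$ with finitely many indices $k$ and test functions $f_j$.

```python
import random

random.seed(1)
num_k, num_f = 8, 5
# g[k][j] plays the role of the objective for index k and test function f_j (random stand-in values).
g = [[random.gauss(0, 1) for _j in range(num_f)] for _k in range(num_k)]

# inf over k of sup over f: best f chosen separately for each k, then the worst k.
inf_sup = min(max(row) for row in g)
# sup over f of inf over k: each f judged by its worst k, then the best f.
sup_inf = max(min(row[j] for row in g) for j in range(num_f))
print(inf_sup, sup_inf)
```

The inequality can be strict, which is why the second line of the chain is an inequality rather than an equality.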
