Two questions about minimizing KL-divergence


I have two questions on the following lemma.

[Image of the lemma and its proof (not reproduced here).]

1. How did we get the last inequality? The author seems to be using $\int p_{\theta_0}\,d\mu = 1$, but I don't see why that is true.

2. Is this lemma overcomplicating things? I think it is saying that the KL divergence is uniquely minimized at the true parameter $\theta_0$ when the true model is identifiable. I think we can prove this more easily using Jensen's inequality, as in Section 5.2 of these notes.
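For comparison, here is a sketch of the Jensen-style argument I have in mind (my own summary, not the author's proof; it assumes $p_\theta$, $p_{\theta_0}$ are densities with respect to $\mu$ and $\int p_\theta\,d\mu \leq 1$):
$$D(p_{\theta_0}\parallel p_\theta) = -\int p_{\theta_0}\log\frac{p_\theta}{p_{\theta_0}}\,d\mu \;\geq\; -\log\int p_{\theta_0}\,\frac{p_\theta}{p_{\theta_0}}\,d\mu \;=\; -\log\int p_\theta\,d\mu \;\geq\; 0,$$
where the first inequality is Jensen's applied to the concave function $\log$. By strict concavity, equality forces $p_\theta = p_{\theta_0}$ $\mu$-a.e., and identifiability then gives $\theta = \theta_0$.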

Accepted answer:
  1. The $p_{\theta}$ are "subprobability densities", meaning that $\int p_{\theta}\,\mathrm{d}\mu\leq 1$ for all $\theta\in\Theta$. Then $$\begin{split} 2\int\sqrt{p_{\theta}p_{\theta_0}}\,\mathrm{d}\mu -2 &\leq 2\int\sqrt{p_{\theta}p_{\theta_0}}\,\mathrm{d}\mu - \int p_{\theta}\,\mathrm{d}\mu -\int p_{\theta_0}\,\mathrm{d}\mu \\ &=-\int\left(p_{\theta}-2\sqrt{p_{\theta}p_{\theta_0}}+p_{\theta_0}\right)\mathrm{d}\mu\\ &=-\int\left(\sqrt{p_{\theta}}-\sqrt{p_{\theta_0}}\right)^2\mathrm{d}\mu \end{split}$$
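As a quick numerical sanity check (my own sketch, not part of the answer), the inequality chain above can be verified on random discrete subprobability vectors, where integrals become sums:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    # Random discrete "subprobability densities": nonnegative with total mass <= 1.
    p = rng.random(6); p *= rng.random() / p.sum()
    q = rng.random(6); q *= rng.random() / q.sum()
    # Left side: 2 * integral of sqrt(p * q) minus 2.
    lhs = 2 * np.sum(np.sqrt(p * q)) - 2
    # Right side: minus the integral of (sqrt(p) - sqrt(q))^2.
    rhs = -np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
    # The gap rhs - lhs equals 2 - sum(p) - sum(q) >= 0 for subprobabilities.
    assert lhs <= rhs + 1e-12
print("inequality held in all 1000 trials")
```

Equality holds exactly when both vectors have total mass $1$, which is why the bound is tight for genuine probability densities.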

  2. I agree; the "right" way to prove this result is to exploit the theory of convex functions. Perhaps the author was trying to borrow intuition from the theory of inner product spaces?

It might be worth mentioning that the proof of this lemma gives the scholium $$H(P,Q)^2\leq D(Q\parallel P)$$ where $P$ and $Q$ are probability distributions, $H$ is the Hellinger distance, and $D$ is the KL divergence.
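The scholium can also be checked numerically (my own sketch; it assumes the convention $H(P,Q)^2 = \int(\sqrt{p}-\sqrt{q})^2\,\mathrm{d}\mu$, without a factor of $1/2$):

```python
import numpy as np

rng = np.random.default_rng(0)

def hellinger_sq(p, q):
    # Squared Hellinger distance with the convention
    # H(P,Q)^2 = integral of (sqrt(p) - sqrt(q))^2  (no 1/2 factor).
    return np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def kl(q, p):
    # KL divergence D(Q || P) for discrete distributions with full support.
    return np.sum(q * np.log(q / p))

for _ in range(1000):
    p = rng.random(5); p /= p.sum()
    q = rng.random(5); q /= q.sum()
    assert hellinger_sq(p, q) <= kl(q, p) + 1e-12
print("H(P,Q)^2 <= D(Q||P) held in all trials")
```

With the more common convention that includes the factor $1/2$, the bound sharpens to $H(P,Q)^2 \leq \frac{1}{2} D(Q\parallel P)$.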