Minimize KL-divergence $D(p\|q)$ when $q$ is Gaussian


Suppose $p(x)$ and $q(x)$ are two pdfs of a random variable $X$, where $q(x)$ is a zero-mean Gaussian of the form

$$q(x) = \frac{1}{\sqrt{2\pi\sigma_q^2}} \exp\left(-\frac{x^2}{2\sigma_q^2}\right).$$

Additionally, the variance of $X$ under the pdf $p(x)$ is constrained to a given value $\sigma_p^2$, where $\sigma_p \neq \sigma_q$. Otherwise, $p$ can be any Lebesgue-integrable pdf.

Which distribution $p(x)$ then minimizes the KL divergence $D(p\|q)$? The KL divergence is defined as follows.

$$D(p\|q) = \int_{-\infty}^\infty p(x) \log\frac{p(x)}{q(x)} \, dx.$$

Thanks!

Note: If there were no variance constraint on $p(x)$, the answer would simply be $p = q$ (a.e.). But this question assumes $\sigma_p \neq \sigma_q$. How does one find the minimizer $p(x)$? Thanks a lot!

Best answer:

Hint. The optimal distribution is still Gaussian, and its parameters are then easily calculated. We show this with Lagrange multipliers. Form the Lagrangian $$\mathcal{L}[p]=\int p(x)\log\frac{p(x)}{q(x)}\,dx-\lambda_1\left(\int x^2p(x)\,dx-\mu_p^2-\sigma_p^2\right)\\-\lambda_2\left(\int xp(x)\,dx-\mu_p\right)-\lambda_3\left(\int p(x)\,dx-1\right).$$ By the calculus of variations, the stationarity condition is $$ \log\frac{p(x)}{q(x)}+1-\lambda_1x^2-\lambda_2x-\lambda_3=0, $$ from which we may conclude $p(x)=q(x)\,e^{\lambda_1x^2+\lambda_2x+\lambda_3-1}$: a Gaussian density times the exponential of a quadratic, hence itself Gaussian.
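
To finish the calculation (a step the hint leaves to the reader): the multipliers are fixed by the normalization, mean, and second-moment constraints, which forces

$$p^*(x)=\frac{1}{\sqrt{2\pi\sigma_p^2}}\exp\left(-\frac{(x-\mu_p)^2}{2\sigma_p^2}\right),\qquad D(p^*\|q)=\log\frac{\sigma_q}{\sigma_p}+\frac{\sigma_p^2+\mu_p^2}{2\sigma_q^2}-\frac{1}{2},$$

the standard KL divergence between two Gaussians, strictly positive whenever $\sigma_p\neq\sigma_q$. (If the mean of $p$ is not actually constrained, taking $\mu_p=0$ minimizes it further.)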

Another answer:

$$\begin{align}D(p\|q) &= E_{p}\left(\log\frac{p(x)}{q(x)}\right) \\ &= E_{p}(\log p(x)) - E_{p}(\log q(x))\\ &= E_{p}(\log p(x)) - E_{p}\left(\log\left(\frac{\exp\left(-\frac{x^2}{2\sigma_q^2}\right)}{\sqrt{2\pi}\sigma_q}\right)\right)\\ &= E_{p}(\log p(x)) + \log(\sqrt{2\pi}\sigma_q) + E_{p}\left(\frac{x^2}{2\sigma_q^2}\right)\\ &= E_{p}(\log p(x)) + \log(\sqrt{2\pi}\sigma_q) + \frac{\sigma_p^2 + \mu_p^2}{2\sigma_q^2},\end{align}$$

where $\mu_p$ denotes the mean of $p$ and $E_{p}(\log p(x)) = -H(p)$ is the negative differential entropy.

Since $D(p\|q) \geq 0$ always, this yields an upper bound on the entropy of $p$:

$$\begin{align}D(p\|q) \geq 0 &\iff E_{p}(\log p(x)) + \log(\sqrt{2\pi}\sigma_q) + \frac{\sigma_p^2 + \mu_p^2}{2\sigma_q^2} \geq 0\\ &\iff H(p) \leq \log(\sqrt{2\pi}\sigma_q) + \frac{\sigma_p^2 + \mu_p^2}{2\sigma_q^2} = \log\left(\sqrt{2\pi e^{(\sigma_p^2 + \mu_p^2)/\sigma_q^2}}\,\sigma_q\right),\end{align}$$

with equality only at the boundary:

$$D(p\|q) = 0 \iff H(p) = \log\left(\sqrt{2\pi e^{(\sigma_p^2 + \mu_p^2)/\sigma_q^2}}\,\sigma_q\right).$$

A distribution $p(x)$ with mean $0$ and variance $\sigma_p^2$ could therefore only make $D(p\|q) = 0$ if its entropy reached $\log\left(\sqrt{2\pi e^{(\sigma_p^2 + \mu_p^2)/\sigma_q^2}}\,\sigma_q\right)$. But no such distribution exists when $\sigma_p \neq \sigma_q$: the entropy of any distribution with variance $\sigma_p^2$ is at most $\log(\sqrt{2\pi e}\,\sigma_p)$, which is strictly below the required level, so the minimum of $D(p\|q)$ is positive.
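
Making this gap explicit (a step not spelled out in the original answer): combining the maximum-entropy bound for fixed variance with the identity above gives

$$D(p\|q) \;=\; \log(\sqrt{2\pi}\sigma_q) + \frac{\sigma_p^2 + \mu_p^2}{2\sigma_q^2} - H(p) \;\geq\; \log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + \mu_p^2}{2\sigma_q^2} - \frac{1}{2},$$

with equality iff $p$ is Gaussian; the right-hand side is strictly positive unless $\sigma_p = \sigma_q$ and $\mu_p = 0$.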

Minimizing the KL divergence is therefore equivalent to maximizing the entropy of $p(x)$ subject to the moment constraints, so $p(x)$ should be chosen to be the Gaussian distribution with mean $0$ and variance $\sigma_p^2$.
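
As a quick numerical sanity check (a sketch, not from the original answers; the values $\sigma_p = 1$, $\sigma_q = 2$ and the Laplace comparison density are illustrative choices), quadrature confirms that the Gaussian with the constrained variance attains the closed-form minimum, while a different zero-mean density with the same variance gives a strictly larger divergence:

```python
import numpy as np
from scipy.integrate import quad

# Example values (illustrative choice); any sigma_p != sigma_q works.
sigma_p, sigma_q = 1.0, 2.0

def gauss(x, s):
    """Zero-mean Gaussian pdf with standard deviation s."""
    return np.exp(-x**2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

def kl(p, q, lo=-20.0, hi=20.0):
    """D(p||q) by numerical quadrature; the interval is wide enough
    that the tail contribution is negligible for these densities."""
    integrand = lambda x: p(x) * np.log(p(x) / q(x))
    return quad(integrand, lo, hi, limit=200)[0]

q = lambda x: gauss(x, sigma_q)

# Candidate 1: Gaussian with the constrained variance sigma_p^2.
p_gauss = lambda x: gauss(x, sigma_p)

# Candidate 2: zero-mean Laplace with the same variance
# (scale b = sigma_p / sqrt(2) gives variance 2*b^2 = sigma_p^2).
b = sigma_p / np.sqrt(2)
p_laplace = lambda x: np.exp(-np.abs(x) / b) / (2 * b)

# Closed-form minimum: KL between N(0, sigma_p^2) and N(0, sigma_q^2).
d_min = np.log(sigma_q / sigma_p) + sigma_p**2 / (2 * sigma_q**2) - 0.5

print(kl(p_gauss, q), d_min)   # both ~0.3181: the Gaussian attains the minimum
print(kl(p_laplace, q))        # ~0.3905: strictly larger despite equal variance
```

The Laplace density has the same mean and variance but lower entropy than the Gaussian, which is exactly why its divergence comes out larger.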

Edit: I think the answer by @CaveJohnson is more appropriate than mine. However, I will keep my answer here for review later.