Convexity of KL-Divergence $D_{\text{KL}}(p \| q_{\theta})$ in $\theta$


The Kullback-Leibler divergence (DKL)

$$D_{\text{KL}}(p \| q_{\theta}) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q_{\theta}(x)} \, dx$$

is definitely convex in the parameters $\theta$ of the PDF $q_{\theta}$ if the PDFs $p$ and $q_{\theta}$ are both in the exponential family. In that case it can be shown that the DKL equals the Bregman divergence $D_{\text{B}}(q_{\theta} \| p)$, which is convex in $\theta$ (ref, ref and sec. 13.12.7 in ref).
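For completeness, here is a sketch of the computation behind that claim, assuming both densities belong to the same exponential family with natural parameters $\eta_1, \eta_2$, sufficient statistic $T$, and log-partition function $A$ (this notation is mine, not from the references):

$$\begin{aligned} D_{\text{KL}}(q_{\eta_1} \| q_{\eta_2}) &= \mathbb{E}_{q_{\eta_1}}\!\big[(\eta_1-\eta_2)^{\top}T(x)\big] - A(\eta_1) + A(\eta_2) \\ &= A(\eta_2) - A(\eta_1) - \nabla A(\eta_1)^{\top}(\eta_2-\eta_1), \end{aligned}$$

using $\mathbb{E}_{q_{\eta_1}}[T(x)] = \nabla A(\eta_1)$. The last line is the Bregman divergence generated by $A$, evaluated at $(\eta_2, \eta_1)$; since $A$ is convex, it is convex in $\eta_2$.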

But how can I show that DKL is convex in the parameters $\theta$ of $q_\theta$, if $p$ is not in the exponential family?

I thought about the following reasoning:

  1. The log-likelihood in maximum likelihood estimation (MLE), $\theta^*=\text{argmax}_{\theta} \sum_n \log q(x_n \mid \theta)$ with $x_n \sim p$, is concave in $\theta$ if the proxy $q_\theta$ is in the exponential family (13.2.3 ref). Note that this concavity holds for any fixed samples $x_n$, regardless of which distribution they were actually drawn from.
  2. In the limit $n \rightarrow \infty$, MLE becomes minimization of the DKL: $\lim_{n \rightarrow \infty} \text{argmax}_{\theta} \frac{1}{n}\sum_n \log q(x_n \mid \theta) = \text{argmin}_{\theta} D_\text{KL}(p \| q_{\theta})$ (ref).
  3. Given 2., the DKL is convex in $\theta$ for any $p$ (even if $p$ is not in the exponential family), because nothing in the concavity argument of 1. constrains the distribution $p$ from which the samples $x_n$ are drawn.
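The limit argument above can also be bypassed with a direct calculation, assuming $q_\theta$ is an exponential family written in its natural parameter $\theta$, i.e. $q_\theta(x) = h(x)\exp\!\big(\theta^{\top}T(x) - A(\theta)\big)$:

$$D_\text{KL}(p \| q_{\theta}) = \int p \log p - \mathbb{E}_p[\log q_\theta(x)] = \text{const} - \theta^{\top}\mathbb{E}_p[T(x)] + A(\theta),$$

where the constant collects $\int p \log p - \mathbb{E}_p[\log h(x)]$, which does not depend on $\theta$. This is convex in $\theta$ for any $p$, since $A$ is convex and the remaining term is affine in $\theta$. Note the caveat: this requires the natural parameterization; in other parameterizations (e.g. a Gaussian in $(\mu, \sigma)$), convexity can fail.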

Is this reasoning correct, and/or is there a better formal argument for why $D_\text{KL}(p \| q_{\theta})$ is convex in $\theta$?

Note: Some remarks I found useful give an example, in eqs. 2.3 and 2.4, for the case where $q$ is Gaussian.
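As a numerical sanity check (a sketch, not a proof), one can take $p$ to be a two-component Gaussian mixture, which is not in the exponential family, and $q_\theta = \mathcal{N}(\theta, 1)$, whose natural parameter is $\theta$ itself, then verify discrete convexity of the KL curve; all mixture weights and grid choices below are my own illustrative assumptions:

```python
import numpy as np

# p: two-component Gaussian mixture (NOT in the exponential family);
# q_theta = N(theta, 1), whose natural parameter is theta itself.
# D_KL(p || q_theta) should then be convex in theta.

x = np.linspace(-15.0, 15.0, 20001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Mixture density p (weights sum to 1)
p = 0.3 * gauss(x, -2.0, 0.7) + 0.7 * gauss(x, 3.0, 1.2)

# KL(p || q_theta) by numerical integration, on a grid of theta values
thetas = np.linspace(-4.0, 4.0, 81)
kl = np.array([np.sum(p * np.log(p / gauss(x, th, 1.0))) * dx for th in thetas])

# Discrete convexity check: all second differences are non-negative
second_diff = kl[2:] - 2.0 * kl[1:-1] + kl[:-2]
print(second_diff.min() >= -1e-9)  # -> True: the KL curve is convex in theta
```

Here convexity is exact because with fixed unit variance $D_\text{KL}(p \| q_\theta)$ reduces to a quadratic in $\theta$; re-running the same check with $q$ parameterized by $(\mu, \sigma)$ instead of its natural parameters can fail, consistent with the caveat above.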