Approximating KL divergence between mixture of gaussian and standard normal


I am reading this paper, and I don't understand the proposition given in its appendix.

Proposition 1

Let $K,L \in \mathbb{N}$, let $(p_1,...,p_L)$ be a probability vector, and let $\Sigma_i\in\mathbb{R}^{K\times K}$ be positive definite for $i=1,...,L$, with the elements of each $\Sigma_i$ not depending on $K$.

Let $q(x):= \sum_{i=1}^L p_i \mathscr{N}(x| \mu_i, \Sigma_i)$ be a mixture of Gaussians with $L$ components, where the $\mu_i\in\mathbb{R}^K$ are normally distributed, and let $p(x) := \mathscr{N}(x|0,I_K)$.

The KL divergence between $q(x)$ and $p(x)$ can be approximated as:

$KL(q(x)||p(x)) \approx \sum_{i=1}^L \frac{p_i}{2}(\mu_i^T \mu_i + tr(\Sigma_i) - K(1+\log 2\pi) - \log|\Sigma_i|)$.

What does it mean that the $\mu_i$ are normally distributed here? The conclusion of this proposition approximates the KL divergence for given $\mu_i$'s, as you can see above. There is no randomness!
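To make the point concrete: the right-hand side of the proposition is a plain deterministic function of $(p_i, \mu_i, \Sigma_i)$, which one can simply evaluate. Here is a NumPy sketch (the function name and array shapes are my own choices; it just computes the stated expression):

```python
import numpy as np

def kl_mog_to_std_normal(p, mus, Sigmas):
    """Evaluate the right-hand side of Proposition 1 exactly as stated:
    sum_i p_i/2 * (mu_i^T mu_i + tr(Sigma_i) - K*(1 + log(2*pi)) - log|Sigma_i|).

    p: (L,) probability vector, mus: (L, K) means, Sigmas: (L, K, K) covariances.
    """
    K = mus.shape[1]
    total = 0.0
    for p_i, mu_i, Sigma_i in zip(p, mus, Sigmas):
        _, logdet = np.linalg.slogdet(Sigma_i)  # numerically stable log|Sigma_i|
        total += 0.5 * p_i * (mu_i @ mu_i + np.trace(Sigma_i)
                              - K * (1.0 + np.log(2.0 * np.pi)) - logdet)
    return total

# e.g. a two-component mixture in K = 3 dimensions (made-up parameters)
p = np.array([0.3, 0.7])
mus = np.stack([np.zeros(3), np.ones(3)])
Sigmas = np.stack([np.eye(3), 0.5 * np.eye(3)])
print(kl_mog_to_std_normal(p, mus, Sigmas))
```

No expectation over the $\mu_i$ appears anywhere in this expression, which is exactly why the "normally distributed" assumption confuses me.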

Moreover, the author asserts the following:

Note that we can write $H(q(x))$ as $-\sum_{i=1}^L p_i \int \mathscr{N}(\epsilon_i|0,I_K) \log q(\mu_i + L_i\epsilon_i) d\epsilon_i$ with $L_iL_i^T = \Sigma_i$.
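(This step itself I can follow: it is just the change of variables $x = \mu_i + L_i\epsilon_i$ inside each component's integral,

$$H(q(x)) = -\int q(x)\log q(x)\,dx = -\sum_{i=1}^L p_i \int \mathscr{N}(x|\mu_i,\Sigma_i)\log q(x)\,dx = -\sum_{i=1}^L p_i \int \mathscr{N}(\epsilon_i|0,I_K)\log q(\mu_i+L_i\epsilon_i)\,d\epsilon_i,$$

since $\mu_i + L_i\epsilon_i$ with $\epsilon_i\sim\mathscr{N}(0,I_K)$ is distributed as $\mathscr{N}(\mu_i,\Sigma_i)$ when $L_iL_i^T=\Sigma_i$.)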

Also, $q(\mu_i + L_i\epsilon_i) = \sum_{j=1}^L p_j (2\pi)^{-K/2}|\Sigma_j|^{-1/2} \exp(-\frac{1}{2} \|\mu_j - \mu_i - L_i\epsilon_i \|^2_{\Sigma_j})$.

**Since the $\mu_i$'s are assumed to be normally distributed, the quantity $\mu_j - \mu_i - L_i\epsilon_i$ is also normally distributed. Using the expectation of the generalised $\chi^2$ distribution with $K$ degrees of freedom, we have that for $K \gg 0$, $\|\mu_j - \mu_i - L_i\epsilon_i\|^2_{\Sigma_j} \gg 0$ for $i\neq j$.**

Thus, we can approximate $q(\mu_i + L_i\epsilon_i) \approx p_i (2\pi)^{-K/2} |\Sigma_i|^{-1/2} \exp(-\frac{1}{2} \epsilon_i^T \epsilon_i)$.
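To see what these last two steps claim numerically, I tried a small simulation (all parameter choices are mine: two equally weighted components, $\Sigma_1=\Sigma_2=I_K$ so $L_i=I_K$, and mean entries of order one). It tracks the squared distance $\|\mu_2-\mu_1-\epsilon_1\|^2$ and the gap between $\log q(\mu_1+\epsilon_1)$ and the log of its dominant ($j=1$) term as $K$ grows:

```python
import numpy as np

def log_gauss(x, mu):
    """log N(x | mu, I_K) for identity covariance."""
    K = x.size
    return -0.5 * (K * np.log(2 * np.pi) + np.sum((x - mu) ** 2))

rng = np.random.default_rng(0)
dists, gaps = {}, {}
for K in (2, 20, 200):
    mu1, mu2 = np.zeros(K), np.ones(K)     # hypothetical well-separated means
    eps = rng.standard_normal(K)           # eps_1 ~ N(0, I_K); take L_1 = I_K
    x = mu1 + eps                          # one draw of mu_1 + L_1 eps_1
    # log of each weighted mixture term, with p_1 = p_2 = 1/2
    log_terms = np.log(0.5) + np.array([log_gauss(x, mu1), log_gauss(x, mu2)])
    dists[K] = np.sum((mu2 - mu1 - eps) ** 2)          # ~ 2K in expectation here
    gaps[K] = np.logaddexp(*log_terms) - log_terms[0]  # log q(x) minus dominant term
    print(K, dists[K], gaps[K])
```

The squared distance does grow linearly in $K$ and the gap collapses towards zero, so the approximation at least checks out empirically in this toy setting; what I am missing is the reasoning behind it.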

I completely do not get what the bolded sentence (the one invoking the generalised $\chi^2$ distribution) is saying, or how it makes sense.

Could somebody please explain why it makes sense?