Variational methods: why the KL divergence is the difference between the true log likelihood and the variational lower bound.


Likelihood = $L(\textbf{w}) = P(V\mid \textbf{w})$.

$$\ln P(V\mid \textbf{w}) = \ln \sum_H P(H,V\mid \textbf{w})$$ $$= \ln \sum_H Q(H\mid V)\frac{P(H,V\mid \textbf{w})}{Q(H\mid V)}$$ $$\geq \ell(Q,\textbf{w}) = \sum_H Q(H\mid V)\ln\frac{P(H,V\mid \textbf{w})}{Q(H\mid V)},$$ by Jensen's inequality.
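The lower bound can be checked numerically on a toy discrete model. The sketch below (all numbers are illustrative assumptions, not from the post) verifies that $\ell(Q,\textbf{w}) \leq \ln P(V\mid\textbf{w})$ holds for many randomly chosen distributions $Q(H\mid V)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model with 5 hidden states and V observed (illustrative values).
# p_joint[h] = P(H=h, V | w); its sum over h is P(V | w).
p_joint = rng.dirichlet(np.ones(5)) * 0.3
log_lik = np.log(p_joint.sum())                # ln P(V | w)

# Jensen: for ANY normalized Q(H | V), the ELBO is a lower bound.
for _ in range(100):
    q = rng.dirichlet(np.ones(5))              # random Q(H | V)
    elbo = np.sum(q * np.log(p_joint / q))     # l(Q, w)
    assert elbo <= log_lik + 1e-12
print("ELBO <= log-likelihood for all sampled Q")
```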

So far so good. What I don't see is that the difference between the true log likelihood $\ln P(V\mid \textbf{w})$ and $\ell(Q,\textbf{w})$ is the KL divergence: $$KL(Q\|P) = -\sum_H Q(H\mid V)\ln\frac{P(H\mid V,\textbf{w})}{Q(H\mid V)}.$$

In other words, why does $\ln P(V\mid \textbf{w}) - \ell(Q,\textbf{w}) = KL(Q\|P)$ hold?

Reference: this PDF file. (page 2, equations 3-5)

BEST ANSWER

We know $$P(H\mid V,\textbf{w})=\frac{P(H, V \mid\textbf{w})}{P(V \mid \textbf{w})}$$

Then

$$-{\rm KL}(Q\|P) = \sum_H Q(H\mid V)\ln\frac{P(H\mid V,\textbf{w})}{Q(H\mid V)} \\ = \sum_H Q(H\mid V)\left( \ln \frac{P(H,V\mid\textbf{w})}{Q(H\mid V)} - \ln P(V \mid \textbf{w}) \right) \\ = \ell(Q,\textbf{w}) - \ln P(V \mid \textbf{w}) \sum_H Q(H\mid V).$$

Then we need $\sum_H Q(H\mid V)=1$ to hold, which is true because $Q(H\mid V)$ is a probability distribution over $H$. The last term then reduces to $-\ln P(V\mid\textbf{w})$, and rearranging gives exactly $\ln P(V\mid \textbf{w}) - \ell(Q,\textbf{w}) = {\rm KL}(Q\|P)$.
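The identity itself can also be verified numerically. A minimal sketch on a toy discrete model (the joint probabilities and the choice of $Q$ below are arbitrary illustrative assumptions): it computes $\ln P(V\mid\textbf{w})$, $\ell(Q,\textbf{w})$, and ${\rm KL}(Q\|P)$ directly from their definitions and checks that the difference matches the KL divergence.

```python
import numpy as np

# Toy joint P(H, V | w) over 4 hidden states with V fixed (observed);
# the numbers are illustrative, not from the post.
p_joint = np.array([0.10, 0.05, 0.20, 0.15])   # P(H=h, V | w)
p_v = p_joint.sum()                            # P(V | w) = sum_H P(H, V | w)
p_post = p_joint / p_v                         # P(H | V, w), via Bayes' rule

# Any normalized Q(H | V) works; pick one that differs from the posterior.
q = np.array([0.4, 0.3, 0.2, 0.1])

log_lik = np.log(p_v)                          # ln P(V | w)
elbo = np.sum(q * np.log(p_joint / q))         # l(Q, w)
kl = np.sum(q * np.log(q / p_post))            # KL(Q || P)

# The identity: ln P(V | w) - l(Q, w) = KL(Q || P)
print(np.isclose(log_lik - elbo, kl))          # True
print(kl >= 0)                                 # hence the ELBO is a lower bound
```

Note that ${\rm KL}(Q\|P) \geq 0$ makes the lower-bound property immediate, and the bound is tight exactly when $Q(H\mid V) = P(H\mid V,\textbf{w})$.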