A proof that increasing the Evidence Lower Bound always increases the log marginal likelihood


I have some confusion about the $\mathrm{ELBO}$. Say we have observed data $x$ and a hidden variable $z$ with underlying parameters $\theta$. The log marginal likelihood is defined as $l(\theta|x):=\ln p(x|\theta)$. With an arbitrary density $q(z)$ over the hidden variable $z$, it is not hard to derive the following relation

$$ \begin{aligned} & \ln p(x|\theta)\\ &= \int q(z) \ln p(x|\theta)dz\\ &= \int q(z) \ln \frac{q(z)}{q(z)}\frac{p(x,z|\theta)}{p(z|x,\theta)}dz\\ &= \int q(z)\ln\frac{p(x,z|\theta)}{q(z)}dz+\int q(z)\ln\frac{q(z)}{p(z|x,\theta)}dz\\ &:= \mathrm{ELBO}(q)+\mathrm{KL}(q\|p(z|x,\theta)) \end{aligned} $$

The explanation for increasing the $\mathrm{ELBO}$ is that, since the $\mathrm{KL}$ divergence is always nonnegative and the left-hand side does not depend on $q$, maximizing the $\mathrm{ELBO}$ over $q$ is equivalent to minimizing the $\mathrm{KL}$ divergence. The claim is then that an increase of the $\mathrm{ELBO}$ always results in an increase of the log marginal likelihood.
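This decomposition can be checked numerically. The sketch below (a toy model with a binary latent variable; all numbers are invented for illustration) verifies that $\mathrm{ELBO}(q)+\mathrm{KL}(q\|p(z|x,\theta))=\ln p(x|\theta)$ for an arbitrary $q$:

```python
import math

# Toy model (numbers invented): z in {0, 1}, a Bernoulli prior p(z | theta)
# and a fixed likelihood p(x | z, theta) for one observed outcome x.
p_z = [0.3, 0.7]          # prior p(z | theta)
p_x_given_z = [0.8, 0.1]  # likelihood p(x | z, theta) of the observed x

# Marginal p(x | theta) = sum_z p(x, z | theta), so ln p(x | theta) is exact here.
p_joint = [p_z[k] * p_x_given_z[k] for k in range(2)]
p_x = sum(p_joint)
log_p_x = math.log(p_x)

# Exact posterior p(z | x, theta) by Bayes' rule.
posterior = [pj / p_x for pj in p_joint]

# An arbitrary density q(z) over the latent variable.
q = [0.5, 0.5]

# ELBO(q) = sum_z q(z) ln[ p(x, z | theta) / q(z) ]
elbo = sum(q[k] * math.log(p_joint[k] / q[k]) for k in range(2))

# KL(q || p(z | x, theta)) = sum_z q(z) ln[ q(z) / p(z | x, theta) ]
kl = sum(q[k] * math.log(q[k] / posterior[k]) for k in range(2))

print(elbo, kl, log_p_x)  # elbo + kl equals log_p_x, and kl >= 0
```

Because $\ln p(x|\theta)$ is fixed once $\theta$ is fixed, raising the $\mathrm{ELBO}$ by changing $q$ alone only shrinks the $\mathrm{KL}$ term.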

For the Expectation-Maximization (EM) algorithm, let $q(z):=q(z|x,\theta^{(t)})$, the exact posterior $p(z|x,\theta^{(t)})$ under the current parameter estimate; then we have

$$ \begin{aligned} &l(\theta|x)\\ &= \int q(z|x,\theta^{(t)})\ln\frac{p(x,z|\theta)}{q(z|x,\theta^{(t)})}dz+\int q(z|x,\theta^{(t)})\ln\frac{q(z|x,\theta^{(t)})}{p(z|x,\theta)}dz\\ &= \int q(z|x,\theta^{(t)})\ln p(x,z|\theta)dz-\int q(z|x,\theta^{(t)})\ln q(z|x,\theta^{(t)})dz+\int q(z|x,\theta^{(t)})\ln\frac{q(z|x,\theta^{(t)})}{p(z|x,\theta)}dz\\ &:=Q(\theta|\theta^{(t)})+H(q)+\mathrm{KL}(q(z)\|p(z|x,\theta)) \end{aligned} $$ So maximizing the $\mathrm{ELBO}$ is equivalent to maximizing $Q$, since the entropy term $H(q)$ does not depend on $\theta$. The maximization step (M-step) is to find $\theta^{(t+1)}$ such that

$$ \left.\frac{\partial Q(\theta|\theta^{(t)})}{\partial \theta}\right|_{\theta^{(t+1)}}=0 $$

The expectation step is

$$ Q(\theta|\theta^{(t)})=\mathbb{E}_{q(z|x,\theta^{(t)})}\left(\ln p(x,z|\theta)\right) $$
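The two steps can be sketched end to end on a concrete model. The following is a minimal illustration (not from the post) using a two-component 1D Gaussian mixture with unit variances, where the M-step maximizer of $Q(\theta|\theta^{(t)})$ has a closed form; it also records the log marginal likelihood at every iteration to exhibit the monotonicity in question:

```python
import math
import random

# Synthetic data from two well-separated components (invented for illustration).
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(100)] + \
       [random.gauss(3.0, 1.0) for _ in range(100)]

def normal_pdf(x, mu):
    # N(x; mu, 1) density.
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def log_likelihood(data, pi, mu0, mu1):
    # l(theta | x) = sum_i ln p(x_i | theta), theta = (pi, mu0, mu1).
    return sum(math.log(pi * normal_pdf(x, mu0) +
                        (1 - pi) * normal_pdf(x, mu1)) for x in data)

pi, mu0, mu1 = 0.5, -1.0, 1.0            # initial theta^(0)
lls = [log_likelihood(data, pi, mu0, mu1)]
for _ in range(20):
    # E-step: responsibilities r_i = p(z_i = 0 | x_i, theta^(t)),
    # i.e. q(z) set to the exact posterior at the current parameters.
    r = []
    for x in data:
        a = pi * normal_pdf(x, mu0)
        b = (1 - pi) * normal_pdf(x, mu1)
        r.append(a / (a + b))
    # M-step: closed-form maximizer of Q(theta | theta^(t)).
    n0 = sum(r)
    pi = n0 / len(data)
    mu0 = sum(ri * x for ri, x in zip(r, data)) / n0
    mu1 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n0)
    lls.append(log_likelihood(data, pi, mu0, mu1))

# Empirically, l(theta^(t+1) | x) >= l(theta^(t) | x) at every step.
assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))
```

Of course, such a run only illustrates the monotonicity on one example; it is the statement below that needs a proof.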

To make the sentence "The increase of $\mathrm{ELBO}$ always results in an increase of the log marginal likelihood" solid, it would be nice to prove the following statement: for $\theta^{(t+1)}$ and $\theta^{(t)}$ as above, it must be that $l(\theta^{(t+1)}|x)\ge l(\theta^{(t)}|x)$. Where can I find a proof of this? My thoughts:

$$ \begin{aligned} &l(\theta^{(t+1)}|x)-l(\theta^{(t)}|x)\\ &= \int q(z|x,\theta^{(t)})\ln \frac{p(x,z|\theta^{(t+1)})}{p(x,z|\theta^{(t)})}dz+\int q(z|x,\theta^{(t)})\ln\frac{p(z|x,\theta^{(t)})}{p(z|x,\theta^{(t+1)})}dz \end{aligned} $$

The first term is obviously nonnegative, since $\theta^{(t+1)}$ maximizes $Q(\theta|\theta^{(t)})$. How do we deal with the second term?
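As a numerical sanity check (not a proof), both terms of this decomposition can be evaluated on a toy model. The sketch below (model and numbers invented: $z\in\{0,1\}$, $\theta=P(z=0)$ as the only parameter, a fixed likelihood $p(x|z)$, and the M-step maximizer $\theta^{(t+1)}=q(z=0)$, which is exact for this model) computes both integrals for one EM step:

```python
import math

# Toy model (invented): z in {0, 1}, theta = P(z = 0) is the only
# parameter, p(x | z) is fixed, and one outcome x is observed.
lik = [0.8, 0.1]                      # p(x | z) for z = 0, 1

def joint(theta):                     # p(x, z | theta) for z = 0, 1
    return [theta * lik[0], (1 - theta) * lik[1]]

def marginal(theta):                  # p(x | theta)
    return sum(joint(theta))

def posterior(theta):                 # p(z | x, theta)
    j, m = joint(theta), marginal(theta)
    return [jk / m for jk in j]

theta_t = 0.3
q = posterior(theta_t)                # E-step: q(z) = p(z | x, theta^(t))
theta_next = q[0]                     # M-step maximizer of Q for this model

# First term: sum_z q(z) ln[ p(x, z | theta^(t+1)) / p(x, z | theta^(t)) ]
term1 = sum(q[k] * math.log(joint(theta_next)[k] / joint(theta_t)[k])
            for k in range(2))
# Second term: sum_z q(z) ln[ p(z | x, theta^(t)) / p(z | x, theta^(t+1)) ]
term2 = sum(q[k] * math.log(posterior(theta_t)[k] / posterior(theta_next)[k])
            for k in range(2))
diff = math.log(marginal(theta_next)) - math.log(marginal(theta_t))

print(term1, term2, diff)  # both terms nonnegative; they sum to diff
```

On this example both terms come out nonnegative and their sum matches $l(\theta^{(t+1)}|x)-l(\theta^{(t)}|x)$ exactly, which is consistent with the decomposition above, though it does not answer the question of why the second term is nonnegative in general.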