Expectation Maximization Algorithm with latent variable


In chapter 9 of the book Pattern Recognition and Machine Learning, an alternative view of the Expectation Maximization (EM) algorithm is introduced as follows:

$X$: observed data

$Z$: all latent variables

$\theta$: the set of all model parameters

The log likelihood function is shown as: $\ln p(X|\theta) = \ln\{ \sum_Z p(X,Z|\theta)\}$

We are not given the complete data set $\{X,Z\}$, only the incomplete data $X$. Our knowledge of the values of the latent variables in $Z$ is given only by the posterior distribution $p(Z|X,\theta)$. Because we cannot use the complete-data log likelihood, we consider instead its expected value under the posterior distribution of the latent variables, which corresponds to the E step of the EM algorithm. In the subsequent M step, we maximize this expectation. If the current estimate for the parameters is denoted $\theta ^ {old}$, then a pair of successive E and M steps gives rise to a revised estimate $\theta ^ {new}$.

In the E step, we use the current parameter values $\theta ^{old}$ to find the posterior distribution of the latent variables, $p(Z|X,\theta^{old})$. We then use this posterior distribution to find the expectation of the complete-data log likelihood evaluated for some general parameter value $\theta$. This expectation, denoted $\mathcal{Q}(\theta, \theta ^{old})$, is given by

$\mathcal{Q}(\theta, \theta ^{old}) = \sum_Z p(Z|X,\theta ^{old}) \ln{p (X,Z|\theta)} \quad\quad\quad (9.30)$

In the M step, we determine the revised parameter estimate $\theta ^{new}$ by maximizing this function:

$\theta ^{new} = \text{arg}\max \limits_{\theta} \mathcal{Q}(\theta,\theta^{old}) \quad\quad\quad (9.31)$
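To make (9.30) and (9.31) concrete, here is a minimal sketch of one way the two steps could look in code. It assumes a one-dimensional Gaussian mixture with two components, where $Z$ is the component label of each point and $\theta$ is the tuple of mixing weights, means, and variances. The model, the synthetic data, and every function name are illustrative assumptions, not taken from the book.

```python
import math
import random

def log_joint(x, theta):
    """ln p(x, z=k | theta) for one point x and every component k."""
    pi, mu, var = theta
    return [math.log(pi[k])
            - 0.5 * (math.log(2 * math.pi * var[k]) + (x - mu[k]) ** 2 / var[k])
            for k in range(len(pi))]

def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(xs, theta):
    """ln p(X | theta): for each point, sum the joint over the latent label."""
    return sum(log_sum_exp(log_joint(x, theta)) for x in xs)

def e_step(xs, theta_old):
    """Posterior p(z_n = k | x_n, theta_old) for every point n."""
    resp = []
    for x in xs:
        lj = log_joint(x, theta_old)
        norm = log_sum_exp(lj)
        resp.append([math.exp(v - norm) for v in lj])
    return resp

def q_function(xs, resp, theta):
    """Q(theta, theta_old) as in (9.30): posterior-weighted sum of the log joint."""
    return sum(r * lj
               for x, rs in zip(xs, resp)
               for r, lj in zip(rs, log_joint(x, theta)))

def m_step(xs, resp):
    """Closed-form maximizer of Q over theta for this assumed mixture model."""
    n, k_count = len(xs), len(resp[0])
    nk = [sum(r[k] for r in resp) for k in range(k_count)]
    pi = [nk[k] / n for k in range(k_count)]
    mu = [sum(r[k] * x for x, r in zip(xs, resp)) / nk[k] for k in range(k_count)]
    var = [sum(r[k] * (x - mu[k]) ** 2 for x, r in zip(xs, resp)) / nk[k]
           for k in range(k_count)]
    return pi, mu, var

# Synthetic data and an arbitrary starting point (both are assumptions).
random.seed(0)
xs = ([random.gauss(-2.0, 1.0) for _ in range(200)]
      + [random.gauss(3.0, 0.5) for _ in range(100)])
theta = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])

log_liks = [log_likelihood(xs, theta)]
q_gains = []
for _ in range(20):
    resp = e_step(xs, theta)          # E step: posterior under theta_old
    theta_new = m_step(xs, resp)      # M step: argmax of Q(theta, theta_old)
    q_gains.append(q_function(xs, resp, theta_new) - q_function(xs, resp, theta))
    theta = theta_new
    log_liks.append(log_likelihood(xs, theta))
```

Here `q_gains` records how much each M step increases $\mathcal{Q}$ relative to $\theta^{old}$, and `log_liks` tracks $\ln p(X|\theta)$ across iterations. In this sketch the sum over $Z$ reduces to a sum over points and components because the data are assumed independent.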

I cannot follow how (9.30) is obtained.

On what basis is $\mathcal{Q}(\theta, \theta ^{old})$ written as shown in (9.30)? Why does equation (9.30) use $\ln{p (X,Z|\theta)}$ instead of $p (X,Z|\theta)$?

Could anyone help me understand this, please? Thanks in advance.

Best answer

$\mathcal{Q}(\theta, \theta^{old})$ is defined to be the expectation of the complete-data log likelihood, evaluated for some general parameter value $\theta$ and taken under the posterior distribution $p(Z|X,\theta^{old})$.

\begin{align}\mathcal{Q}(\theta , \theta^{old})&=\mathbb{E}_{Z|X,\theta^{old}} \ln p(X,Z|\theta)\\ &=\sum_Z p(Z|X, \theta^{old})\ln p(X,Z|\theta)\end{align}
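As for why the logarithm appears rather than $p(X,Z|\theta)$ itself, one standard way to see it is the lower-bound argument sketched below. The auxiliary distribution $q(Z)$ is introduced here only for illustration and is assumed to be positive wherever $p(X,Z|\theta)$ is. By Jensen's inequality, for any such $q(Z)$:

\begin{align}\ln p(X|\theta)&=\ln\sum_Z q(Z)\,\frac{p(X,Z|\theta)}{q(Z)}\\ &\ge\sum_Z q(Z)\ln\frac{p(X,Z|\theta)}{q(Z)}\end{align}

Choosing $q(Z)=p(Z|X,\theta^{old})$ turns the right-hand side into

\begin{align}\sum_Z p(Z|X,\theta^{old})\ln p(X,Z|\theta)-\sum_Z p(Z|X,\theta^{old})\ln p(Z|X,\theta^{old})=\mathcal{Q}(\theta,\theta^{old})+\text{const}\end{align}

where the constant does not depend on $\theta$. So maximizing $\mathcal{Q}(\theta,\theta^{old})$ over $\theta$ maximizes a lower bound on $\ln p(X|\theta)$, and the bound holds with equality at $\theta=\theta^{old}$, which is why an M step cannot decrease the log likelihood. The logarithm is there because the quantity we ultimately want to maximize is the log likelihood, and moving the log inside the sum over $Z$ is what makes the maximization tractable.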