How to find complete log likelihood for mixture of PPCA

In Appendix C of a paper by Michael E. Tipping and Christopher M. Bishop about mixture models for probabilistic PCA, the probability of a single data vector $\mathbf{t}$ is expressed as a mixture of PCA models (equation 69):

$$ p(\mathbf{t}) = \sum_{i=1}^M\pi_i p(\mathbf{t}|i) $$

where $\pi_i$ is the mixing proportion of the $i^{\rm th}$ component and $p(\mathbf{t}|i)$ is a single probabilistic PCA model.

The model underlying the probabilistic PCA method is (equation 2)

$$ \mathbf{t} = \mathbf{Wx} + \boldsymbol\mu + \boldsymbol\epsilon, $$ where $\mathbf{x}$ is a latent variable and $\boldsymbol\epsilon$ is Gaussian noise. By introducing a new set of indicator variables $z_{ni}$, "labelling which model is responsible for generating each data point $\mathbf{t}_n$", the authors formulate the complete log-likelihood as (equation 70):

$$ \mathcal{L}_C = \sum_{n=1}^N\sum_{i=1}^M z_{ni}\ln\{\pi_i p(\mathbf{t}_n, \mathbf{x}_{ni})\}. $$ I would like to understand how this expression is derived, since the authors don't show the derivation themselves. How is this expression for the complete log-likelihood found?

There is 1 answer below.

Let's concentrate for the time being on the $n^{\rm th}$ data point, $\mathbf t_n$. Suppose this data point is generated by the $i_n^{\rm th}$ model. Then $$ z_{ni} = \begin{cases} 1 & {\rm if\ } i = i_n, \\ 0 & {\rm otherwise.}\end{cases}$$ Thus we have $$\sum_{i=1}^M z_{ni} \ln \left(\pi_i p(\mathbf t_n, \mathbf x_{ni}) \right) = \ln \left( \pi_{i_n} p(\mathbf t_n, \mathbf x_{ni_n})\right).$$ The expression on the right-hand side is the complete log-likelihood for the $n^{\rm th}$ data point. To spell it out:
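This collapsing step is easy to check numerically. A minimal sketch with made-up numbers (the `log_terms` values stand in for $\ln\{\pi_i p(\mathbf t_n, \mathbf x_{ni})\}$, which would come from an actual model):

```python
import numpy as np

# Hypothetical values of ln(pi_i * p(t_n, x_ni)) for M = 3 components.
log_terms = np.log(np.array([0.2, 0.5, 0.3]))

i_n = 1                # index of the model that generated t_n (assumed known)
z_n = np.zeros(3)
z_n[i_n] = 1.0         # z_ni = 1 iff i == i_n (one-hot indicator)

# sum_i z_ni * ln(pi_i * p(t_n, x_ni)) picks out the single term i = i_n
collapsed = float(z_n @ log_terms)
print(np.isclose(collapsed, log_terms[i_n]))  # True
```

Because $z_n$ is one-hot, the weighted sum is exactly the log term of the responsible component.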

  • $\pi_{i_n}$ is the probability that the $n^{\rm th}$ datapoint is generated by the $i_n^{\rm th}$ model.
  • $p(\mathbf t_n, \mathbf x_{ni_n})$ is the joint probability of encountering this particular latent vector $\mathbf x_{ni_n}$ and this particular visible vector $\mathbf t_n$, given that this data point is generated by the $i_n^{\rm th}$ model. [In fact, $$p(\mathbf t_n, \mathbf x_{ni_n}) = \mathcal N(\mathbf x_{ni_n} | \mathbf 0, \mathbf I) \times \mathcal N(\mathbf t_n - \mathbf W_{i_n} \mathbf x_{ni_n} - \boldsymbol\mu_{i_n} |\mathbf 0, \sigma_{i_n}^2 \mathbf I),$$ where $\mathbf W_{i_n}$, $\boldsymbol\mu_{i_n}$ and $\sigma_{i_n}^2$ are the parameters of the $i_n^{\rm th}$ model, assuming that $\boldsymbol\epsilon \sim \mathcal N(\mathbf 0, \sigma_{i_n}^2 \mathbf I)$.]
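The bracketed factorisation can be sketched in code. Everything below (dimensions, parameter values, the `gauss_logpdf` helper) is a made-up example, not the paper's implementation; it just evaluates $\ln p(\mathbf t, \mathbf x) = \ln\mathcal N(\mathbf x|\mathbf 0,\mathbf I) + \ln\mathcal N(\mathbf t - \mathbf W\mathbf x - \boldsymbol\mu|\mathbf 0,\sigma^2\mathbf I)$ for one component:

```python
import numpy as np

def gauss_logpdf(v, mean, var):
    """Log density of an isotropic Gaussian N(v | mean, var * I)."""
    diff = np.asarray(v) - np.asarray(mean)
    return -0.5 * (diff.size * np.log(2.0 * np.pi * var) + diff @ diff / var)

rng = np.random.default_rng(0)
d, q = 4, 2                          # data and latent dimensions (arbitrary)
W = rng.normal(size=(d, q))          # loading matrix of the responsible model
mu = rng.normal(size=d)              # its mean
sigma2 = 0.1                         # its isotropic noise variance

# Generate (x, t) from the PPCA model: t = W x + mu + eps
x = rng.normal(size=q)               # latent vector, x ~ N(0, I)
t = W @ x + mu + rng.normal(scale=np.sqrt(sigma2), size=d)

# ln p(t, x) = ln N(x | 0, I) + ln N(t - W x - mu | 0, sigma2 * I)
log_joint = gauss_logpdf(x, np.zeros(q), 1.0) + gauss_logpdf(t, W @ x + mu, sigma2)
```

Working in log space avoids underflow when $\sigma^2$ is small or the dimensionality is large.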

Since the data points are generated independently, the complete log-likelihood for the entire dataset is the sum of the per-data-point log-likelihoods, giving the desired result.
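Putting the pieces together, equation 70 can be sketched as one function. The array shapes and the `gauss_logpdf` helper are my own choices for illustration, not from the paper:

```python
import numpy as np

def gauss_logpdf(v, mean, var):
    """Log density of an isotropic Gaussian N(v | mean, var * I)."""
    diff = np.asarray(v) - np.asarray(mean)
    return -0.5 * (diff.size * np.log(2.0 * np.pi * var) + diff @ diff / var)

def complete_log_likelihood(T, X, Z, pis, Ws, mus, sigma2s):
    """L_C = sum_n sum_i z_ni * ln( pi_i * p(t_n, x_ni) ).

    T: (N, d) data, X: (N, M, q) latents, Z: (N, M) indicators,
    pis: (M,) mixing proportions, Ws: (M, d, q), mus: (M, d), sigma2s: (M,).
    """
    N, M = Z.shape
    L = 0.0
    for n in range(N):
        for i in range(M):
            if Z[n, i] == 0.0:
                continue  # only the responsible model contributes
            log_joint = (gauss_logpdf(X[n, i], np.zeros_like(X[n, i]), 1.0)
                         + gauss_logpdf(T[n], Ws[i] @ X[n, i] + mus[i], sigma2s[i]))
            L += Z[n, i] * (np.log(pis[i]) + log_joint)
    return L
```

In the EM algorithm of the paper, the unobserved $z_{ni}$ are replaced by their posterior expectations (responsibilities) in the E-step; the function above evaluates $\mathcal L_C$ for any given assignment.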