Compute component probabilities in EM-algorithm with log densities?


I coded up an implementation of the EM-algorithm for Gaussian mixtures. In the E-step I compute, for each row in the data matrix, the probability $p_i$ that it has been drawn from the component $i \in \{1, \dots K\}$. Specifically, for a given row $X_{j, \cdot}$, I compute the densities $d_1, d_2, ..., d_K$ given by each of the mixture components $1, 2, \dots, K$. Then I compute $p_i = d_i / Z$, where $Z=d_1 + \dots + d_K$.

This procedure works for simple examples, but especially for larger numbers of variables the densities $d_i$ often underflow to exactly zero, because floating-point numbers have only finite precision. The natural solution would be to work with $\log d_i$, but I do not see how to write $p_i = d_i / Z$ in terms of $\log d_i$. For example, for $K=2$ we have

$$ \log \left [ \frac{d_1}{d_1 + d_2} \right ] = \log \left [ \frac{d_1}{d_2} \right ] - \log \left [ \frac{d_1 + d_2}{d_2} \right ] $$

and now I am stuck with the second term on the RHS.
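A minimal illustration of the underflow described above (this snippet is mine, not from the post): with many independent dimensions the joint density is a product of many small factors, which rounds to exactly zero in double precision even though its logarithm is perfectly representable.

```python
import numpy as np

# Hypothetical example: 500 dimensions, each contributing a factor of 0.05
# to the joint density. The product underflows; its log does not.
log_factors = np.full(500, np.log(0.05))
density = np.exp(log_factors.sum())   # exp(-1497.87...) -> 0.0 (underflow)
log_density = log_factors.sum()       # about -1497.9, perfectly finite

print(density)       # 0.0
print(log_density)
```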

I assume there must be a standard solution to this problem, since presumably every EM implementation runs into it.

Best answer:

Given a datapoint $\mathbf x$, you compute the log-densities, \begin{align} \log d_i & := \log\left(\pi_i \times \mathcal N\left(\mathbf x \mid \mathbf m_i, \mathbf S_i \right) \right) \\ &= \log \pi_i - \frac 1 2 (\mathbf x - \mathbf m_i)^\top \mathbf S_i^{-1} (\mathbf x - \mathbf m_i) - \frac 1 2 \log\det\mathbf S_i - \frac D 2 \log (2\pi)\end{align} for each $i\in \{ 1, \dots, K \}$, where:

  • $\pi_i$, the size of the $i^{\rm th}$ cluster as a proportion of the population
  • $\mathbf m_i$, the mean of the $i^{\rm th}$ cluster
  • $\mathbf S_i$, the covariance matrix of the $i^{\rm th}$ cluster

are the cluster parameters computed in your previous M-step, and $D$ is the number of dimensions. (If you use my formula for $\log d_i$, your calculation should be numerically stable.)
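The formula above can be sketched in code as follows (a minimal illustration; the function and variable names are mine). Factoring $\mathbf S_i$ with a Cholesky decomposition gives both the quadratic form and $\log\det\mathbf S_i$ without ever forming a raw density:

```python
import numpy as np

def log_density(x, pi_i, m_i, S_i):
    """log(pi_i * N(x | m_i, S_i)), computed entirely in log space."""
    D = x.shape[0]
    L = np.linalg.cholesky(S_i)               # S_i = L @ L.T
    z = np.linalg.solve(L, x - m_i)           # z @ z = (x-m)^T S^{-1} (x-m)
    log_det = 2.0 * np.log(np.diag(L)).sum()  # log det S_i from the factor
    return (np.log(pi_i)
            - 0.5 * z @ z
            - 0.5 * log_det
            - 0.5 * D * np.log(2.0 * np.pi))
```

For instance, with $\mathbf x = \mathbf m_i$, $\pi_i = 1/2$ and $\mathbf S_i = I$ in two dimensions, this returns $\log\frac12 - \log(2\pi)$.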

Next, you identify the cluster $i_\star$ that gives the highest log-density for the datapoint $\mathbf x$: $$ i_\star := {\rm argmax}_{i \in \{1, \dots, K\}} \log d_i,$$

and you define the intermediate quantities $$ \gamma_i := \exp\left(\log d_i - \log d_{i_\star} \right)$$ for each $i \in \{1, \dots, K\}$.

Notice that:

  • Each of the $\gamma_i$'s is a floating-point number between $0$ and $1$.
  • One of the $\gamma_i$'s (namely, $\gamma_{i_\star}$) is exactly equal to one.

You also compute $$ Z := \sum_{i=1}^K \gamma_i,$$ which is guaranteed to be a floating-point number between $1$ and $K$.

Finally, the cluster assignment probabilities are given by $$ p_i = \frac{\gamma_i}{Z}$$ for each $i \in \{1, \dots, K \}$.
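The three steps above (find $i_\star$, form the $\gamma_i$'s, normalise by $Z$) amount to the standard log-sum-exp / softmax trick, and can be sketched as (function name is mine):

```python
import numpy as np

def responsibilities(log_d):
    """Cluster probabilities p_i from log-densities log d_i."""
    log_d = np.asarray(log_d, dtype=float)
    gamma = np.exp(log_d - log_d.max())  # each in (0, 1]; the max term is exactly 1
    return gamma / gamma.sum()           # Z = sum(gamma) lies between 1 and K

# Raw densities exp(-1500) etc. would all underflow to zero,
# yet the probabilities come out fine:
p = responsibilities([-1500.0, -1502.0, -1510.0])
```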

This procedure is numerically stable. A very small $\gamma_i$ may still underflow to zero, but that is harmless: such a term is negligible compared with $\gamma_{i_\star} = 1$, so neither small nor large numbers introduce any significant error.