I coded up an implementation of the EM algorithm for Gaussian mixtures. In the E-step I compute, for each row of the data matrix, the probability $p_i$ that it was drawn from component $i \in \{1, \dots, K\}$. Specifically, for a given row $X_{j, \cdot}$, I compute the densities $d_1, d_2, \dots, d_K$ given by each of the mixture components $1, 2, \dots, K$. Then I compute $p_i = d_i / Z$, where $Z = d_1 + \dots + d_K$.
This procedure works for simple examples, but as the number of variables grows, the densities $d_i$ often underflow to exactly zero, because floating-point numbers have only finite precision. The natural fix would be to work with $\log d_i$, but I do not see how to express $p_i = d_i / Z$ in terms of $\log d_i$. For example, for $K=2$ we have
$$ \log \left [ \frac{d_1}{d_1 + d_2} \right ] = \log \left [ \frac{d_1}{d_2} \right ] - \log \left [ \frac{d_1 + d_2}{d_2} \right ] $$
and now I am stuck with the second term on the RHS.
I assume there must be a standard solution, since presumably every EM implementation runs into this issue.
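To make the underflow concrete, here is a minimal sketch (the dimension `D = 800` and the standard-normal setup are hypothetical, just chosen to trigger the effect):

```python
import numpy as np

# Hypothetical example: the density of a single standard normal in D
# dimensions. For large D it is a product of many factors < 1, so the
# direct computation underflows to exactly 0.0 in double precision.
D = 800
rng = np.random.default_rng(0)
x = rng.normal(size=D)

d = (2 * np.pi) ** (-D / 2) * np.exp(-0.5 * x @ x)
print(d)  # 0.0 -- every p_i = d_i / Z then becomes 0/0
```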
Given a datapoint $\mathbf x$, you compute the log-densities, \begin{align} \log d_i & := \log\left(\pi_i \, \mathcal N\left(\mathbf x \mid \mathbf m_i, \mathbf S_i \right) \right) \\ &= \log \pi_i - \frac 1 2 (\mathbf x - \mathbf m_i)^\top\mathbf S_i^{-1} (\mathbf x - \mathbf m_i) - \frac 1 2 \log\det\mathbf S_i - \frac D 2 \log (2\pi)\end{align} for each $i\in \{ 1, \dots, K \}$, where $\pi_i$, $\mathbf m_i$, and $\mathbf S_i$ are the mixture weights, means, and covariance matrices computed in your previous M-step, and $D$ is the number of dimensions. (If you evaluate $\log d_i$ directly from this formula, rather than taking the log of the computed density, the calculation is numerically stable.)
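In code, the log-density formula might look like this (a sketch under assumed array shapes; the function and parameter names are illustrative):

```python
import numpy as np

def log_densities(x, weights, means, covs):
    """Compute log(pi_i * N(x | m_i, S_i)) for every component i.

    Assumed shapes: x (D,), weights (K,), means (K, D), covs (K, D, D).
    """
    K, D = means.shape
    out = np.empty(K)
    for i in range(K):
        diff = x - means[i]
        # Solve S_i v = diff rather than forming the explicit inverse.
        maha = diff @ np.linalg.solve(covs[i], diff)
        # slogdet returns log|det S_i| without over/underflowing.
        _, logdet = np.linalg.slogdet(covs[i])
        out[i] = (np.log(weights[i])
                  - 0.5 * maha
                  - 0.5 * logdet
                  - 0.5 * D * np.log(2 * np.pi))
    return out
```

Using `solve` and `slogdet` avoids both the explicit matrix inverse and the determinant, which themselves over- or underflow in high dimensions.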
Next, you identify the cluster $i_\star$ that gives the highest log-density for the datapoint $\mathbf x$: $$ i_\star := {\rm argmax}_{i \in \{1, \dots, K\}} \log d_i,$$
and you define the intermediate quantities $$ \gamma_i := \exp\left(\log d_i - \log d_{i_\star} \right)$$ for each $i \in \{1, \dots, K\}$.
Notice that $\gamma_{i_\star} = 1$ exactly, and $0 \le \gamma_i \le 1$ for all $i$, since each exponent $\log d_i - \log d_{i_\star}$ is non-positive. (A $\gamma_i$ that underflows to zero corresponds to a negligible probability anyway, so no accuracy is lost.)
You also compute $$ Z := \sum_{i=1}^K \gamma_i,$$ which is guaranteed to be a floating-point number between $1$ and $K$.
Finally, the cluster assignment probabilities are given by $$ p_i = \frac{\gamma_i}{Z}$$ for each $i \in \{1, \dots, K \}$.
This procedure is numerically stable. There is no risk of incurring errors from handling small numbers (or large numbers, for that matter).
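The whole normalization step can be sketched in a few lines (the function name is illustrative):

```python
import numpy as np

def responsibilities(log_d):
    """Turn per-component log-densities into probabilities p_i, stably."""
    log_d = np.asarray(log_d, dtype=float)
    # Subtracting the maximum makes gamma_{i*} exactly 1.0,
    # so the largest term can never underflow.
    gamma = np.exp(log_d - log_d.max())
    Z = gamma.sum()  # guaranteed to lie between 1 and K
    return gamma / Z
```

For example, `responsibilities([-1000.0, -1001.0])` returns sensible probabilities even though `exp(-1000)` underflows to zero in the naive computation.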