Expectation Maximization Algorithm for Gaussian Mixture Model

Can we use the Expectation Maximization algorithm for estimation of Gaussian Mixture Model with full covariance matrices?

If yes, can you please give me a reference paper? So far, all the machine learning books I have consulted describe only the case of diagonal covariance matrices.

There are 2 solutions below.

Solution 1:

If you are using Python with scikit-learn, all you need to do is pass the parameter covariance_type='full'.

For example,

m = mixture.GMM(n_components=3, covariance_type='full')

Works for me.
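
Note that in current scikit-learn releases the GMM class has been replaced by GaussianMixture, which takes the same covariance_type argument. A minimal sketch of a full-covariance fit (the toy data here is illustrative):

```python
import numpy as np
from sklearn import mixture

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters as toy data.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])

# covariance_type='full' fits one unrestricted d x d covariance per component.
gm = mixture.GaussianMixture(n_components=2, covariance_type='full',
                             random_state=0)
gm.fit(X)
# gm.covariances_ then has shape (n_components, d, d): one full matrix each.
```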

Solution 2:

All you need is to know the formulas for the iterative estimates of your parameters $(\tau, \mu, \Sigma)$. Once you have them, coding the algorithm is just a simple loop.

Assume your Gaussian mixture is a mixture of two Gaussians. Let $\mathbf{x} = (\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n)$ be a sample of $n$ independent observations from a mixture of two multivariate normal distributions of dimension $d$, and let $\mathbf {z} =(z_{1},z_{2},\ldots ,z_{n})$ be the latent variables that determine the component from which each observation originates.

$X_{i}\mid (Z_{i}=1)\sim {\mathcal {N}}_{d}({\boldsymbol {\mu }}_{1},\Sigma _{1})$ and $X_{i}\mid (Z_{i}=2)\sim {\mathcal {N}}_{d}({\boldsymbol {\mu }}_{2},\Sigma _{2})$

where $\operatorname{P} (Z_i = 1 ) = \tau_1$ and $\operatorname {P} (Z_{i}=2)=\tau _{2}=1-\tau _{1}$

Initial Guess for unknown parameters:

Either supply your own intuitive guess of $(\tau, \mu, \Sigma)$, or run the k-means algorithm and use each cluster's sample mean and sample covariance as the starting point, with $\tau = (1/2, 1/2)$.

[E Step] Calculate expectation:

Calculate the expectation using the previously computed parameters $\theta^{(t)}$; for the first iteration this is your initial guess $\theta^{(1)}$:

$$ \begin{aligned} Q(\theta \mid \theta ^{(t)})&=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,\mathbf {\theta } ^{(t)}}[\log L(\theta ;\mathbf {x} ,\mathbf {Z} )]\\ &=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,\mathbf {\theta } ^{(t)}}[\log \prod _{i=1}^{n}L(\theta ;\mathbf {x} _{i},Z_{i})]\\ &=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,\mathbf {\theta } ^{(t)}}[\sum _{i=1}^{n}\log L(\theta ;\mathbf {x} _{i},Z_{i})]\\ &=\sum _{i=1}^{n}\operatorname {E} _{Z_{i}\mid \mathbf {X} ;\mathbf {\theta } ^{(t)}}[\log L(\theta ;\mathbf {x} _{i},Z_{i})]\\ &=\sum _{i=1}^{n}\sum _{j=1}^{2}P(Z_{i}=j\mid X_{i}=\mathbf {x} _{i};\theta ^{(t)})\log L(\theta _{j};\mathbf {x} _{i},j)\\ &=\sum _{i=1}^{n}\sum _{j=1}^{2}T_{j,i}^{(t)}{\big [}\log \tau _{j}-{\tfrac {1}{2}}\log |\Sigma _{j}|-{\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{j})^{\top }\Sigma _{j}^{-1}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{j})-{\tfrac {d}{2}}\log(2\pi ){\big ]}. \end{aligned} $$

where:

$$ T_{j,i}^{(t)}:=\operatorname {P} (Z_{i}=j\mid X_{i}=\mathbf {x} _{i};\theta ^{(t)})={\frac {\tau _{j}^{(t)}\ f(\mathbf {x} _{i};{\boldsymbol {\mu }}_{j}^{(t)},\Sigma _{j}^{(t)})}{\tau _{1}^{(t)}\ f(\mathbf {x} _{i};{\boldsymbol {\mu }}_{1}^{(t)},\Sigma _{1}^{(t)})+\tau _{2}^{(t)}\ f(\mathbf {x} _{i};{\boldsymbol {\mu }}_{2}^{(t)},\Sigma _{2}^{(t)})}} $$
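
The responsibilities $T_{j,i}^{(t)}$ above can be computed directly; here is a minimal NumPy sketch (the function and variable names are my own, not from any library):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of a d-dimensional N(mu, Sigma) at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def e_step(X, tau, mu, Sigma):
    """Responsibilities T[j, i] = P(Z_i = j | X_i = x_i; theta^(t))."""
    k = len(tau)
    T = np.array([[tau[j] * mvn_pdf(x, mu[j], Sigma[j]) for x in X]
                  for j in range(k)])
    return T / T.sum(axis=0)   # each column normalizes over the k components
```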

[M Step] Maximize the expectation to find the $(t+1)$ estimates

Because the $\tau_{j}$ terms and the $(\mu_{j}, \Sigma_{j})$ terms enter $Q$ as separate summands, we can maximize $Q$ by maximizing each group of terms independently.

$\tau$ terms, with the constraint $\tau_2 = 1 - \tau_1$:

$$ \begin{aligned} {\boldsymbol {\tau }}^{(t+1)} &= {\underset {\boldsymbol {\tau }}{\operatorname {arg\,max} }}\ Q(\theta \mid \theta ^{(t)})\\ &={\underset {\boldsymbol {\tau }}{\operatorname {arg\,max} }}\ \left\{\left[\sum _{i=1}^{n}T_{1,i}^{(t)}\right]\log \tau _{1}+\left[\sum _{i=1}^{n}T_{2,i}^{(t)}\right]\log \tau _{2}\right\}. \end{aligned} $$

Taking the derivative with respect to each $\tau_j$ (subject to the constraint) and setting it to zero gives the formula for the $(t+1)$ estimate $\tau _{j}^{(t+1)}$:

$$ \tau _{j}^{(t+1)}={\frac {\sum _{i=1}^{n}T_{j,i}^{(t)}}{\sum _{i=1}^{n}(T_{1,i}^{(t)}+T_{2,i}^{(t)})}}={\frac {1}{n}}\sum _{i=1}^{n}T_{j,i}^{(t)} $$
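
In code, this update is just a row mean of the responsibility matrix; a small NumPy sketch with made-up responsibilities:

```python
import numpy as np

# Hypothetical responsibilities from the E-step: shape (2 components, 5 points);
# each column sums to 1.
T = np.array([[0.9, 0.8, 0.5, 0.1, 0.2],
              [0.1, 0.2, 0.5, 0.9, 0.8]])

tau_next = T.mean(axis=1)  # tau_j^{(t+1)} = (1/n) * sum_i T_{j,i}
# -> array([0.5, 0.5]); the weights always sum to 1 by construction
```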

For the next estimates of $(\mu_1, \Sigma_1)$:

$$ \begin{aligned} ({\boldsymbol {\mu }}_{1}^{(t+1)},\Sigma_{1}^{(t+1)}) &= {\underset { {\boldsymbol{\mu}}_{1},\Sigma_{1} }{\operatorname {arg\,max} }} Q(\theta \mid \theta ^{(t)})\\ &= {\underset { {\boldsymbol{\mu}}_{1},\Sigma_{1} }{\operatorname {arg\,max} }} \sum _{i=1}^{n}T_{1,i}^{(t)}\left\{ -{\tfrac {1}{2}}\log |\Sigma _{1}|-{\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})^{\top }\Sigma_{1}^{-1}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1}) \right\} \end{aligned} $$

To derive the formula for $\mu_{j}^{(t+1)}$, we make use of the identity $\mathbf{ \frac{\partial w^T A w}{\partial w} = 2Aw}$, where $A$ is symmetric.

$$ \begin{aligned} 0 &= \frac{\partial{Q(\theta \mid \theta ^{(t)})}}{\partial{\boldsymbol{\mu}_1}} \\ &= \sum_{i=1}^{n} T_{1,i}^{(t)} \Sigma_{1}^{-1} (\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1}) \end{aligned} $$

$$ \begin{aligned} \boldsymbol{\mu}_1^{(t+1)} = \frac{\sum_{i=1}^n T_{1,i}^{(t)} \mathbf{x}_i}{\sum_{i=1}^n T_{1,i}^{(t)}} \\ \boldsymbol{\mu}_2^{(t+1)} = \frac{\sum_{i=1}^n T_{2,i}^{(t)} \mathbf{x}_i}{\sum_{i=1}^n T_{2,i}^{(t)}} \end{aligned} $$
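
In code, each mean update is a responsibility-weighted average of the sample; a small NumPy sketch with made-up responsibilities:

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])   # toy sample, d = 2
T = np.array([[1.0, 1.0, 0.0],                          # responsibilities (2 x n)
              [0.0, 0.0, 1.0]])

# mu_j^{(t+1)} = sum_i T_{j,i} x_i / sum_i T_{j,i}, vectorized over j
mu_next = (T @ X) / T.sum(axis=1, keepdims=True)
# -> component means [1, 0] and [10, 10]
```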

To derive the formula for $\Sigma_{j}^{(t+1)}$, we make use of the following identities:

  • Invariant under cyclic permutation $tr[ABC] = tr[CAB] = tr[BCA]$

  • Derivative of a trace: $\frac{\partial}{\partial{A}} tr[BA] = B^T$

$$ \begin{aligned} \frac{\partial}{\partial{a_{ij}}} tr[BA] &= \frac{\partial}{\partial{a_{ij}}} \sum_{k} \sum_{l} b_{kl}a_{lk} = b_{ji} \\ \frac{\partial}{\partial{A}} tr[BA] &= B^T \end{aligned} $$

  • Derivative of a log determinant: $\frac{\partial}{\partial{A}}\log{|A|} = A^{-T}$.

    To establish the result, note that $\frac{\partial}{\partial{a_{ij}}}\log{|A|} = \frac{1}{|A|} \frac{\partial}{\partial{a_{ij}}} |A|$, and recall the formula for the matrix inverse: $A^{-1} = \frac{1}{|A|} C^T$, so $A^{-T} = \frac{1}{|A|} C$. All we need to show is $\frac{\partial}{\partial{a_{ij}}} |A| = C_{ij}$. Expanding the determinant along row $i$ gives $|A| = \sum_j{(-1)^{(i+j)}a_{ij}M_{ij}}$, where $M_{ij}$ is the $(i,j)$ minor, so

    $$ \frac{\partial}{\partial{a_{ij}}}|A| = \frac{\partial}{\partial{a_{ij}}}\left\{\sum_j{(-1)^{(i+j)}a_{ij}M_{ij}}\right\} = (-1)^{(i+j)} M_{ij} $$

    which is exactly the $(i,j)$ entry of the cofactor matrix $C$. Therefore, $\frac{\partial}{\partial{A}} |A| = C$.

  • Determinant of an inverse is the inverse of the determinant: $|A|^{-1} = |A^{-1}|$

Now we have all the identities we need to derive $\Sigma_1^{(t+1)}$.

Because $\frac{\partial}{\partial{A}} \mathbf{x}^\top A\mathbf{x} = \frac{\partial}{\partial{A}} \operatorname{tr}[\mathbf{x}^\top A\mathbf{x}] = \frac{\partial}{\partial{A}} \operatorname{tr}[\mathbf{x}\mathbf{x}^\top A] = \mathbf{x}\mathbf{x}^{\top}$, we get:

$$ \begin{aligned} 0 &= \frac{\partial{Q(\theta \mid \theta ^{(t)})}}{\partial{\Sigma_1^{-1}}} \\ &=\sum _{i=1}^{n}T_{1,i}^{(t)}{\bigg [} \frac{\partial}{\partial{\Sigma_1^{-1}}} {\tfrac {1}{2}}\log |\Sigma _{1}^{-1}| - \frac{\partial}{\partial{\Sigma_1^{-1}}} \operatorname{tr} {\big [} {\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})^{\top }\Sigma _{1}^{-1}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1}) {\big ]} {\bigg ]} \\ &=\sum _{i=1}^{n} T_{1,i}^{(t)} {\big [} {\tfrac{1}{2}} \Sigma_1 - {\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})^{\top } {\big ]} \end{aligned} $$

(The $j=2$ terms of $Q$ do not involve $\Sigma_1$ and drop out of the derivative.)

Formulas for the $(t+1)$ estimates $(\Sigma_1^{(t+1)}, \Sigma_2^{(t+1)})$:

$$ \begin{aligned} \Sigma_1^{(t+1)} &= \frac{\sum_{i=1}^n T_{1,i}^{(t)} (\mathbf{x}_i - \boldsymbol{\mu}_1^{(t+1)}) (\mathbf{x}_i - \boldsymbol{\mu}_1^{(t+1)})^\top }{\sum_{i=1}^n T_{1,i}^{(t)}} \\ \Sigma _{2}^{(t+1)} &={\frac {\sum _{i=1}^{n}T_{2,i}^{(t)}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{2}^{(t+1)})(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{2}^{(t+1)})^{\top }}{\sum _{i=1}^{n}T_{2,i}^{(t)}}} \end{aligned} $$
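
Likewise, each covariance update is a responsibility-weighted scatter matrix around the new mean; a small NumPy sketch (update_sigma is an illustrative helper name):

```python
import numpy as np

def update_sigma(X, T_j, mu_j):
    """Sigma_j^{(t+1)}: responsibility-weighted scatter around the new mean."""
    diff = X - mu_j                        # (n, d)
    return (diff * T_j[:, None]).T @ diff / T_j.sum()

X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
T_j = np.ones(4)                           # equal responsibilities, for illustration
mu_j = np.zeros(2)
S = update_sigma(X, T_j, mu_j)
# -> [[0.5, 0.0], [0.0, 2.0]]: the plain sample second moment in this special case
```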

[Stop Step]

Stop the iteration when $|Q(\theta^{(t+1)} \mid \theta^{(t)}) - Q(\theta^{(t)} \mid \theta^{(t)})| \le \epsilon$.

Notes:

A more detailed explanation can be found here: https://youtu.be/GEdAP680w5Y

Pseudo code:

let tau = [1/2, 1/2]
let kmeans = KMeans(x1, x2, ..., xn)
let mu    = [kmeans.cluster[0].sampleMean, kmeans.cluster[1].sampleMean]
let sigma = [kmeans.cluster[0].sampleCov,  kmeans.cluster[1].sampleCov]

repeat {
  T     = compute responsibilities with the T formula   // E step
  tau   = compute tau[t+1] with the tau formula         // M step
  mu    = compute mu[t+1] with the mu formula
  sigma = compute sigma[t+1] with the Sigma formula
} until ( |Q[t+1] - Q[t]| < epsilon )
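
The pseudocode can be fleshed out into a runnable sketch. Everything below is illustrative: the crude initialization (evenly spaced samples instead of k-means) is an assumption for brevity, and the stopping rule monitors the observed log-likelihood rather than $Q$ itself, a common practical substitute.

```python
import numpy as np

def mvn_pdf(X, mu, Sigma):
    """N_d(mu, Sigma) density evaluated at every row of X."""
    d = mu.shape[0]
    diff = X - mu
    quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def em_gmm(X, k=2, max_iter=200, eps=1e-8):
    n, d = X.shape
    tau = np.full(k, 1.0 / k)
    # Crude initial guess: k samples spread across the data set
    # (the text recommends k-means for a better start).
    mu = X[np.linspace(0, n - 1, k).astype(int)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E step: T[j, i] = P(Z_i = j | x_i; theta^(t))
        dens = np.stack([tau[j] * mvn_pdf(X, mu[j], Sigma[j]) for j in range(k)])
        T = dens / dens.sum(axis=0)
        # M step: the closed-form updates derived above
        Nj = T.sum(axis=1)
        tau = Nj / n
        mu = (T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (diff * T[j][:, None]).T @ diff / Nj[j]
        # Stop when the observed log-likelihood stops improving
        ll = np.log(dens.sum(axis=0)).sum()
        if abs(ll - prev_ll) <= eps:
            break
        prev_ll = ll
    return tau, mu, Sigma

# Demo on two well-separated synthetic clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(8.0, 1.0, (200, 2))])
tau, mu, Sigma = em_gmm(X)
```

The fitted means land near the true cluster centers and the full covariance matrices come out symmetric, which is a quick sanity check on the M-step formulas.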