Expectation Maximization Algorithm for Gaussian Mixture Model

Can we use the Expectation Maximization algorithm for estimation of Gaussian Mixture Model with full covariance matrices?

If yes, can you please give me a reference paper? So far, all the machine learning books I have consulted describe only the case of diagonal covariance matrices.

There are 2 solutions below.

Solution 1:

If you are using Python with scikit-learn, all you need to do is pass the parameter covariance_type='full'.

For example,

m = mixture.GMM(n_components=3, covariance_type='full')

Works for me.
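
Note that in current scikit-learn releases the GMM class has been replaced by GaussianMixture, which takes the same covariance_type argument. A minimal sketch of a full-covariance fit (the toy data here is illustrative):

```python
import numpy as np
from sklearn import mixture

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters as toy data.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])

# covariance_type='full' fits one unrestricted d x d covariance per component.
gm = mixture.GaussianMixture(n_components=2, covariance_type='full',
                             random_state=0)
gm.fit(X)
# gm.covariances_ then has shape (n_components, d, d): one full matrix each.
```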

Solution 2:

All you need is to know the formulas for the iterative estimates of your parameters $(\tau, \mu, \Sigma)$. Once you have them, coding the algorithm is just a simple loop.

Assume your Gaussian mixture is a mixture of two Gaussians. Let $\mathbf{x} = (\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n)$ be a sample of $n$ independent observations from a mixture of two multivariate normal distributions of dimension $d$, and let $\mathbf {z} =(z_{1},z_{2},\ldots ,z_{n})$ be the latent variables that determine the component from which each observation originates.

$X_{i}\mid (Z_{i}=1)\sim {\mathcal {N}}_{d}({\boldsymbol {\mu }}_{1},\Sigma _{1})$ and $X_{i}\mid (Z_{i}=2)\sim {\mathcal {N}}_{d}({\boldsymbol {\mu }}_{2},\Sigma _{2})$

where $\operatorname{P} (Z_i = 1 ) = \tau_1$ and $\operatorname {P} (Z_{i}=2)=\tau _{2}=1-\tau _{1}$

Initial Guess for unknown parameters:

Either supply your own intuitive guess of $(\tau, \mu, \Sigma)$, or run the k-means algorithm and use each cluster's sample mean and sample covariance as the starting point, with $\tau = (1/2, 1/2)$.

[E Step] Calculate expectation:

Calculate the expectation using the previously computed parameters $\theta^{(t)}$; for the first iteration this is your initial guess $\theta^{(1)}$:

$$ \begin{aligned} Q(\theta \mid \theta ^{(t)})&=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,\mathbf {\theta } ^{(t)}}[\log L(\theta ;\mathbf {x} ,\mathbf {Z} )]\\ &=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,\mathbf {\theta } ^{(t)}}[\log \prod _{i=1}^{n}L(\theta ;\mathbf {x} _{i},Z_{i})]\\ &=\operatorname {E} _{\mathbf {Z} \mid \mathbf {X} ,\mathbf {\theta } ^{(t)}}[\sum _{i=1}^{n}\log L(\theta ;\mathbf {x} _{i},Z_{i})]\\ &=\sum _{i=1}^{n}\operatorname {E} _{Z_{i}\mid \mathbf {X} ;\mathbf {\theta } ^{(t)}}[\log L(\theta ;\mathbf {x} _{i},Z_{i})]\\ &=\sum _{i=1}^{n}\sum _{j=1}^{2}P(Z_{i}=j\mid X_{i}=\mathbf {x} _{i};\theta ^{(t)})\log L(\theta _{j};\mathbf {x} _{i},j)\\ &=\sum _{i=1}^{n}\sum _{j=1}^{2}T_{j,i}^{(t)}{\big [}\log \tau _{j}-{\tfrac {1}{2}}\log |\Sigma _{j}|-{\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{j})^{\top }\Sigma _{j}^{-1}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{j})-{\tfrac {d}{2}}\log(2\pi ){\big ]}. \end{aligned} $$

where:

$$ T_{j,i}^{(t)}:=\operatorname {P} (Z_{i}=j\mid X_{i}=\mathbf {x} _{i};\theta ^{(t)})={\frac {\tau _{j}^{(t)}\ f(\mathbf {x} _{i};{\boldsymbol {\mu }}_{j}^{(t)},\Sigma _{j}^{(t)})}{\tau _{1}^{(t)}\ f(\mathbf {x} _{i};{\boldsymbol {\mu }}_{1}^{(t)},\Sigma _{1}^{(t)})+\tau _{2}^{(t)}\ f(\mathbf {x} _{i};{\boldsymbol {\mu }}_{2}^{(t)},\Sigma _{2}^{(t)})}} $$
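
The responsibilities $T_{j,i}^{(t)}$ above can be computed directly; here is a minimal NumPy sketch (the function and variable names are my own, not from any library):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of a d-dimensional N(mu, Sigma) at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def e_step(X, tau, mu, Sigma):
    """Responsibilities T[j, i] = P(Z_i = j | X_i = x_i; theta^(t))."""
    k = len(tau)
    T = np.array([[tau[j] * mvn_pdf(x, mu[j], Sigma[j]) for x in X]
                  for j in range(k)])
    return T / T.sum(axis=0)   # each column normalizes over the k components
```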

[M Step] Maximize the expectation to find the $(t+1)$ estimates

Because the $\tau_{j}$ terms and the $(\mu_{j}, \Sigma_{j})$ terms enter $Q$ as separate summands, we can maximize $Q$ by maximizing each group of terms independently.

$\tau$ terms, with the constraint $\tau_2 = 1 - \tau_1$:

$$ \begin{aligned} {\boldsymbol {\tau }}^{(t+1)} &= {\underset {\boldsymbol {\tau }}{\operatorname {arg\,max} }}\ Q(\theta \mid \theta ^{(t)})\\ &={\underset {\boldsymbol {\tau }}{\operatorname {arg\,max} }}\ \left\{\left[\sum _{i=1}^{n}T_{1,i}^{(t)}\right]\log \tau _{1}+\left[\sum _{i=1}^{n}T_{2,i}^{(t)}\right]\log \tau _{2}\right\}. \end{aligned} $$

Taking the derivative with respect to each $\tau_j$ (subject to the constraint) and setting it to zero gives the formula for the $(t+1)$ estimate $\tau _{j}^{(t+1)}$:

$$ \tau _{j}^{(t+1)}={\frac {\sum _{i=1}^{n}T_{j,i}^{(t)}}{\sum _{i=1}^{n}(T_{1,i}^{(t)}+T_{2,i}^{(t)})}}={\frac {1}{n}}\sum _{i=1}^{n}T_{j,i}^{(t)} $$
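
In code, this update is just a row mean of the responsibility matrix; a small NumPy sketch with made-up responsibilities:

```python
import numpy as np

# Hypothetical responsibilities from the E-step: shape (2 components, 5 points);
# each column sums to 1.
T = np.array([[0.9, 0.8, 0.5, 0.1, 0.2],
              [0.1, 0.2, 0.5, 0.9, 0.8]])

tau_next = T.mean(axis=1)  # tau_j^{(t+1)} = (1/n) * sum_i T_{j,i}
# -> array([0.5, 0.5]); the weights always sum to 1 by construction
```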

For the next estimates of $(\mu_1, \Sigma_1)$:

$$ \begin{aligned} ({\boldsymbol {\mu }}_{1}^{(t+1)},\Sigma_{1}^{(t+1)}) &= {\underset { {\boldsymbol{\mu}}_{1},\Sigma_{1} }{\operatorname {arg\,max} }} Q(\theta \mid \theta ^{(t)})\\ &= {\underset { {\boldsymbol{\mu}}_{1},\Sigma_{1} }{\operatorname {arg\,max} }} \sum _{i=1}^{n}T_{1,i}^{(t)}\left\{ -{\tfrac {1}{2}}\log |\Sigma _{1}|-{\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})^{\top }\Sigma_{1}^{-1}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1}) \right\} \end{aligned} $$

To derive the formula for $\mu_{j}^{(t+1)}$, we make use of the identity $\mathbf{ \frac{\partial w^T A w}{\partial w} = 2Aw}$, where $A$ is symmetric.

$$ \begin{aligned} 0 &= \frac{\partial{Q(\theta \mid \theta ^{(t)})}}{\partial{\boldsymbol{\mu}_1}} \\ &= \sum_{i=1}^{n} T_{1,i}^{(t)} \Sigma_{1}^{-1} (\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1}) \end{aligned} $$

$$ \begin{aligned} \boldsymbol{\mu}_1^{(t+1)} = \frac{\sum_{i=1}^n T_{1,i}^{(t)} \mathbf{x}_i}{\sum_{i=1}^n T_{1,i}^{(t)}} \\ \boldsymbol{\mu}_2^{(t+1)} = \frac{\sum_{i=1}^n T_{2,i}^{(t)} \mathbf{x}_i}{\sum_{i=1}^n T_{2,i}^{(t)}} \end{aligned} $$
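
In code, each mean update is a responsibility-weighted average of the sample; a small NumPy sketch with made-up responsibilities:

```python
import numpy as np

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])   # toy sample, d = 2
T = np.array([[1.0, 1.0, 0.0],                          # responsibilities (2 x n)
              [0.0, 0.0, 1.0]])

# mu_j^{(t+1)} = sum_i T_{j,i} x_i / sum_i T_{j,i}, vectorized over j
mu_next = (T @ X) / T.sum(axis=1, keepdims=True)
# -> component means [1, 0] and [10, 10]
```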

To derive the formula for $\Sigma_{j}^{(t+1)}$, we make use of the following identities:

  • Invariant under cyclic permutation $tr[ABC] = tr[CAB] = tr[BCA]$

  • Derivative of a trace: $\frac{\partial}{\partial{A}} tr[BA] = B^T$

$$ \begin{aligned} \frac{\partial}{\partial{a_{ij}}} tr[BA] &= \frac{\partial}{\partial{a_{ij}}} \sum_{k} \sum_{l} b_{kl}a_{lk} = b_{ji} \\ \frac{\partial}{\partial{A}} tr[BA] &= B^T \end{aligned} $$

  • Derivative of a log determinant: $\frac{\partial}{\partial{A}}\log{|A|} = A^{-T}$.

    To establish the result, note that $\frac{\partial}{\partial{a_{ij}}}\log{|A|} = \frac{1}{|A|} \frac{\partial}{\partial{a_{ij}}} |A|$, and recall the formula for the matrix inverse: $A^{-1} = \frac{1}{|A|} C^T$, so $A^{-T} = \frac{1}{|A|} C$. All we need to show is $\frac{\partial}{\partial{a_{ij}}} |A| = C_{ij}$. Expanding the determinant along row $i$ gives $|A| = \sum_j{(-1)^{(i+j)}a_{ij}M_{ij}}$, where $M_{ij}$ is the $(i,j)$ minor, so

    $$ \frac{\partial}{\partial{a_{ij}}}|A| = \frac{\partial}{\partial{a_{ij}}}\left\{\sum_j{(-1)^{(i+j)}a_{ij}M_{ij}}\right\} = (-1)^{(i+j)} M_{ij} $$

    which is exactly the $(i,j)$ entry of the cofactor matrix $C$. Therefore, $\frac{\partial}{\partial{A}} |A| = C$.

  • Determinant of an inverse is the inverse of the determinant: $|A|^{-1} = |A^{-1}|$

Now we have all the identities we need to derive $\Sigma_1^{(t+1)}$.

Because $\frac{\partial}{\partial{A}} \mathbf{x}^\top A\mathbf{x} = \frac{\partial}{\partial{A}} \operatorname{tr}[\mathbf{x}^\top A\mathbf{x}] = \frac{\partial}{\partial{A}} \operatorname{tr}[\mathbf{x}\mathbf{x}^\top A] = \mathbf{x}\mathbf{x}^{\top}$, we get:

$$ \begin{aligned} 0 &= \frac{\partial{Q(\theta \mid \theta ^{(t)})}}{\partial{\Sigma_1^{-1}}} \\ &=\sum _{i=1}^{n}T_{1,i}^{(t)}{\bigg [} \frac{\partial}{\partial{\Sigma_1^{-1}}} {\tfrac {1}{2}}\log |\Sigma _{1}^{-1}| - \frac{\partial}{\partial{\Sigma_1^{-1}}} \operatorname{tr} {\big [} {\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})^{\top }\Sigma _{1}^{-1}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1}) {\big ]} {\bigg ]} \\ &=\sum _{i=1}^{n} T_{1,i}^{(t)} {\big [} {\tfrac{1}{2}} \Sigma_1 - {\tfrac {1}{2}}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{1})^{\top } {\big ]} \end{aligned} $$

(The $j=2$ terms of $Q$ do not involve $\Sigma_1$ and drop out of the derivative.)

Formulas for the $(t+1)$ estimates $(\Sigma_1^{(t+1)}, \Sigma_2^{(t+1)})$:

$$ \begin{aligned} \Sigma_1^{(t+1)} &= \frac{\sum_{i=1}^n T_{1,i}^{(t)} (\mathbf{x}_i - \boldsymbol{\mu}_1^{(t+1)}) (\mathbf{x}_i - \boldsymbol{\mu}_1^{(t+1)})^\top }{\sum_{i=1}^n T_{1,i}^{(t)}} \\ \Sigma _{2}^{(t+1)} &={\frac {\sum _{i=1}^{n}T_{2,i}^{(t)}(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{2}^{(t+1)})(\mathbf {x} _{i}-{\boldsymbol {\mu }}_{2}^{(t+1)})^{\top }}{\sum _{i=1}^{n}T_{2,i}^{(t)}}} \end{aligned} $$
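
Likewise, each covariance update is a responsibility-weighted scatter matrix around the new mean; a small NumPy sketch (update_sigma is an illustrative helper name):

```python
import numpy as np

def update_sigma(X, T_j, mu_j):
    """Sigma_j^{(t+1)}: responsibility-weighted scatter around the new mean."""
    diff = X - mu_j                        # (n, d)
    return (diff * T_j[:, None]).T @ diff / T_j.sum()

X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
T_j = np.ones(4)                           # equal responsibilities, for illustration
mu_j = np.zeros(2)
S = update_sigma(X, T_j, mu_j)
# -> [[0.5, 0.0], [0.0, 2.0]]: the plain sample second moment in this special case
```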

[Stop Step]

Stop the iteration when $|Q(\theta^{(t+1)} \mid \theta^{(t)}) - Q(\theta^{(t)} \mid \theta^{(t)})| \le \epsilon$.

Notes:

A more detailed explanation can be found here: https://youtu.be/GEdAP680w5Y

Pseudo code:

let tau = [1/2, 1/2]
let kmeans = KMeans(x1, x2, ..., xn)
let mu    = [kmeans.cluster[0].sampleMean, kmeans.cluster[1].sampleMean]
let sigma = [kmeans.cluster[0].sampleCov,  kmeans.cluster[1].sampleCov]

repeat {
  T     = compute responsibilities with the T formula   // E step
  tau   = compute tau[t+1] with the tau formula         // M step
  mu    = compute mu[t+1] with the mu formula
  sigma = compute sigma[t+1] with the Sigma formula
} until ( |Q[t+1] - Q[t]| < epsilon )
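
The pseudocode can be fleshed out into a runnable sketch. Everything below is illustrative: the crude initialization (evenly spaced samples instead of k-means) is an assumption for brevity, and the stopping rule monitors the observed log-likelihood rather than $Q$ itself, a common practical substitute.

```python
import numpy as np

def mvn_pdf(X, mu, Sigma):
    """N_d(mu, Sigma) density evaluated at every row of X."""
    d = mu.shape[0]
    diff = X - mu
    quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def em_gmm(X, k=2, max_iter=200, eps=1e-8):
    n, d = X.shape
    tau = np.full(k, 1.0 / k)
    # Crude initial guess: k samples spread across the data set
    # (the text recommends k-means for a better start).
    mu = X[np.linspace(0, n - 1, k).astype(int)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E step: T[j, i] = P(Z_i = j | x_i; theta^(t))
        dens = np.stack([tau[j] * mvn_pdf(X, mu[j], Sigma[j]) for j in range(k)])
        T = dens / dens.sum(axis=0)
        # M step: the closed-form updates derived above
        Nj = T.sum(axis=1)
        tau = Nj / n
        mu = (T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (diff * T[j][:, None]).T @ diff / Nj[j]
        # Stop when the observed log-likelihood stops improving
        ll = np.log(dens.sum(axis=0)).sum()
        if abs(ll - prev_ll) <= eps:
            break
        prev_ll = ll
    return tau, mu, Sigma

# Demo on two well-separated synthetic clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(8.0, 1.0, (200, 2))])
tau, mu, Sigma = em_gmm(X)
```

The fitted means land near the true cluster centers and the full covariance matrices come out symmetric, which is a quick sanity check on the M-step formulas.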