I am studying different machine learning algorithms currently and can't quite see the difference between Gaussian Mixture Modelling and Maximum Likelihood Estimation.
Is a GMM only a special case of MLE using the Gaussian distribution as an estimation function?
GMM is a specific way to model data. It does so by fitting, unsurprisingly, a mixture of Gaussians to the data. Essentially, you learn or fit a parameter set $\Theta$ that describes the data probabilistically (or statistically). Assuming the data live in dimension $n$ and the mixture has $k$ components, $\Theta$ consists of $k$ weights (i.e. $\vec{w}\in\mathbb{R}^k$, one weight per mixture component), together with a mean $\mu_i\in\mathbb{R}^n$ and a covariance matrix $\Sigma_i\in\mathbb{R}^{n\times n}$ for each $i\in[1,k]\subset\mathbb{Z}_+$. Once fitted, you can do things like generate new data that looks like the input, classify/cluster new data, etc.
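To make the "generate new data" use concrete, here is a minimal sketch (with made-up parameter values) of sampling from a fitted mixture: first pick a component according to the weights $\vec{w}$, then draw from that component's Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted parameter set Theta for k=2 components in n=2 dimensions.
w = np.array([0.3, 0.7])                         # mixture weights, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0]])          # means, shape (k, n)
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])   # covariances, shape (k, n, n)

def sample(m):
    """Generate m new points that 'look like' data drawn from the mixture."""
    comps = rng.choice(len(w), size=m, p=w)      # choose a component per point
    return np.array([rng.multivariate_normal(mu[c], Sigma[c]) for c in comps])

X_new = sample(500)
```

The same responsibilities $P(\text{component}\mid x)$ used below in fitting also give you the clustering of a new point for free.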
But how to get $\Theta$? The answer is to use expectation maximization (EM). Essentially, the idea is to iteratively alternate between:
(1) Given $\Theta_t$, compute the likelihood of the data $X$ (with latent component assignments $Y$) via $\mathcal{L}(\Theta|X)=P(X|\Theta)$ and the expected log-likelihood $\mathcal{E}_t(\Theta)=\mathbb{E}_{Y|X,\Theta_t}[\ln\mathcal{L}(\Theta|X,Y)]$.
(2) Find a better model by maximizing over the parameters, $\Theta_{t+1}=\arg\max_\Theta \mathcal{E}_t(\Theta)$, and go to (1).
Notice that this algorithm is very general and can be used to fit many different models (e.g. GMMs). But EM itself is not a model.
Summary: GMM is a specific model; EM is an algorithm. To connect this back to your question: MLE is the general *principle* of choosing parameters that maximize the likelihood of the data. A GMM is not a special case of MLE; rather, fitting a GMM is one instance of MLE, and EM is the algorithm that carries out that maximization when latent variables (the component assignments) are involved.