I am currently studying the EM algorithm and have been through a few articles; they all say it is for missing data. I believe there is some implication in the term "missing data".
I wonder if EM is designed only for missing data. Can somebody give me a clear definition of the EM algorithm?
The EM algorithm is designed for missing data, but the term "missing data" also covers the concept of a "latent variable", which can be interpreted as an observation that is always missing. The goal of the EM algorithm is to maximize the likelihood of the observed data $y_1, \ldots, y_n$ with respect to the parameters of your probability model. This means that we need to "integrate" over the unobservable data components $z_1, \ldots, z_n$, and this integration is typically computationally challenging.

Let $p(y,z|\theta)$ specify the probability model, where $y$ is observed and $z$ is a latent discrete variable. The goal is to minimize the averaged negative log-likelihood $\ell_n(\theta) = -(1/n) \sum_{i=1}^n \log \sum_z p(y_i,z|\theta)$.

The EM algorithm works by first guessing a value for $\theta$, call it $\theta_t$. The E-step computes the expectation $\ell_{n,t}(\theta) = -(1/n) \sum_{i=1}^n \sum_z \left(\log p(y_i,z | \theta) \right) p(z|y_i, \theta_t)$. The M-step fully minimizes $\ell_{n,t}$ over $\theta$ to obtain $\theta_{t+1}$. We then increment $t$ with $t = t + 1$ and go back to the E-step.

If we only partially minimize $\ell_{n,t}$, using (for example) a single gradient descent step, then this is called a GEM (Generalized EM) algorithm. Variants where the expectation is only estimated by sampling from the density $p(z|y_i,\theta_t)$ are called MCEM (Monte Carlo EM) or SAEM (Stochastic Approximation EM).
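To make the E- and M-steps concrete, here is a minimal sketch of EM for a two-component Gaussian mixture with unit variances. All specifics here (the simulated data, the initial guess, the fixed variances) are illustrative assumptions for the example, not part of the general algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data: latent z_i picks a component, observed y_i is Gaussian around its mean.
n = 1000
true_means = np.array([-2.0, 3.0])
z = rng.integers(0, 2, size=n)          # latent -- "always missing" in a real problem
y = rng.normal(true_means[z], 1.0)      # observed

def normal_pdf(x, m):
    # Standard normal density centered at m (unit variance).
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

# Initial guess theta_t = (mixing weight, component means); variances fixed at 1.
mix, mu = 0.5, np.array([-1.0, 1.0])

for t in range(100):
    # E-step: responsibilities p(z | y_i, theta_t) for each component.
    w = np.stack([(1.0 - mix) * normal_pdf(y, mu[0]),
                  mix * normal_pdf(y, mu[1])])
    r = w / w.sum(axis=0)
    # M-step: closed-form minimizers of the expected negative log-likelihood l_{n,t}.
    mix = r[1].mean()
    mu = (r @ y) / r.sum(axis=1)

print(np.sort(mu))  # estimated component means
```

Replacing the full M-step with a single gradient step on $\ell_{n,t}$ would turn this into a GEM variant; replacing the exact responsibilities with draws of $z$ sampled from $p(z|y_i,\theta_t)$ would give an MCEM-style update.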