I think I understand the gist of the Expectation-Maximization algorithm and its alternating nature, but I am puzzled by the notation. Consider the following examples:
- In the Stanford notes, the E-step is simply stated as the posterior probability of the latent variable $z$:
$$Q_i(z^{(i)}) := p(z^{(i)}|x^{(i)};\theta)$$
where $z^{(i)}$ is a latent variable sample, $x^{(i)}$ is the observed data, and $\theta$ are the parameters maximized in the M-step.
- In the original paper from 1977, the E-step looks as follows:
$$t^{(p)} = E\big[ t(x)|y,\Theta^{(p)} \big]$$
where I believe $y$ is the observed variable, $x$ is the latent variable, and $\Theta^{(p)}$ are the model parameters used in the M-step. To me, this looks like:
$$E_{x|y,\theta}\big[ p(x|y,\Theta)\big]$$
where $x$, $y$, and $\Theta$ have the same meaning as above.
I apologize for introducing two notations, one in point 1 and another in point 2, but I am trying to stay consistent with the linked sources.
Question
As I understand it, the point of the E-step is to obtain the values of the latent variables that maximize the probability of the complete data, given the current model parameters $\theta$ or $\Theta^{(p)}$. My question is then: how do I formally get these values from the E-steps presented above?
I mean, what exactly do I calculate in $Q_i(z^{(i)}) := p(z^{(i)}|x^{(i)};\theta)$? It is just the definition of a posterior distribution; there is no maximization and no operation to be carried out.
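To make this concrete, here is a minimal sketch of how I would evaluate that posterior numerically. It assumes a two-component 1-D Gaussian mixture; the model, the data, the parameter values, and the function name `e_step` are my own assumptions and do not come from either source. It only applies Bayes' rule under the current $\theta$, and as far as I can tell no maximization appears anywhere in it:

```python
import numpy as np

def e_step(x, weights, means, stds):
    """Return Q[i, k] = p(z = k | x_i; theta) for a 1-D Gaussian mixture."""
    x = np.asarray(x, dtype=float)[:, None]      # shape (n, 1)
    weights = np.asarray(weights, dtype=float)   # mixing proportions, shape (K,)
    means = np.asarray(means, dtype=float)       # component means, shape (K,)
    stds = np.asarray(stds, dtype=float)         # component std devs, shape (K,)
    # Gaussian density of every point under every component, shape (n, K)
    dens = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    joint = weights * dens                       # p(x_i, z = k; theta)
    # Bayes' rule: divide by p(x_i; theta) = sum over k of the joint
    return joint / joint.sum(axis=1, keepdims=True)

# Hypothetical observations and a hypothetical current guess for theta
x = [-2.0, 0.0, 2.0]
Q = e_step(x, weights=[0.5, 0.5], means=[-2.0, 2.0], stds=[1.0, 1.0])
print(Q)
```

Each row of `Q` is a probability distribution over the latent component for one observation, not a single "best" value of $z^{(i)}$, which is exactly what confuses me.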
The second one, $E_{x|y,\theta}\big[ p(x|y,\Theta)\big]$, is a bit more intuitive, because I am calculating an expectation of a distribution (I think $t(x)$ is the distribution of the latent variable $x$). That is, I am looking for the values of $x$ that are expected, i.e. those with the maximum probability of occurring.
Can someone formally show (and explain in layman's terms) how to obtain the values of the latent variables from the E-step equations?