I know that in the EM algorithm's M-step, we solve $$ \operatorname{argmax}_{\theta} Q(\theta, \theta^{\text{old}}), \quad \text{where } Q(\theta, \theta^{\text{old}}) = \sum_z p(z \mid x; \theta^{\text{old}}) \log p(x, z; \theta). $$
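For reference, this $Q$ is exactly the expectation of the complete-data log-likelihood under the posterior of the latent variable at the old parameters: $$ Q(\theta, \theta^{\text{old}}) = \mathbb{E}_{z \sim p(z \mid x;\, \theta^{\text{old}})} \left[ \log p(x, z; \theta) \right]. $$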
I also understand that the data log-likelihood in the pLSA model is (before introducing the latent variable): $$ \sum_i \sum_j \log p(d_i, w_j)^{n(d_i, w_j)} $$
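As a quick numeric sanity check, here is a minimal NumPy sketch of this log-likelihood (the array names `n` and `p_dw` and the toy values are my own, hypothetical choices):

```python
import numpy as np

# Hypothetical toy data: n[i, j] = n(d_i, w_j) word counts,
# p_dw[i, j] = P(d_i, w_j) model joint probabilities.
n = np.array([[2., 0., 1.],
              [1., 3., 0.]])
p_dw = np.array([[0.20, 0.10, 0.10],
                 [0.15, 0.30, 0.15]])

# sum_i sum_j log P(d_i, w_j)^{n(d_i, w_j)}
#   = sum_i sum_j n(d_i, w_j) * log P(d_i, w_j)
log_lik = (n * np.log(p_dw)).sum()
print(log_lik)
```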
But why, when the EM algorithm is applied to the pLSA model, is the M-step $$ \operatorname{argmax}_{p(w_j|z_k),\; p(z_k|d_i)} \sum_i \sum_j \sum_k p(z_k \mid d_i, w_j) \log p(d_i, w_j, z_k)^{n(d_i, w_j)} $$
In other words, why does the expectation of the complete-data log-likelihood take the above form?
I think I know how to derive it now: not by simply taking the expectation (which is correct but not intuitive enough; in particular, where do the sums over $i$ and $k$ come from?), but by deriving it from the beginning, the same way EM derives its ELBO.
In the pLSA model, \begin{aligned} \text{data log-likelihood} &= \log P(X;\theta) \\ &= \sum_i\sum_j \log P(d_i, w_j)^{n(d_i, w_j)} \\ &= \sum_i\sum_j \log \left[ \frac{P(d_i, w_j, z_k)}{P(z_k \mid d_i, w_j)} \right]^{n(d_i, w_j)} \\ &\text{(introduce a distribution } q(z_k) \text{ over topics and split into two parts)} \\ &= \sum_i\sum_j \log \frac{P(d_i, w_j, z_k)^{n(d_i, w_j)}}{q(z_k)} - \sum_i\sum_j \log \frac{P(z_k \mid d_i, w_j)^{n(d_i, w_j)}}{q(z_k)} \end{aligned}
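To be explicit about the third line above: for every fixed topic index $k$, $$ P(d_i, w_j) = \frac{P(d_i, w_j, z_k)}{P(z_k \mid d_i, w_j)}, $$ so the right-hand side is constant in $k$ and can therefore be averaged against any distribution $q(z_k)$, which is exactly what the next step does.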
Because $\sum_k q(z_k) \log P(X;\theta) = \log P(X;\theta) \sum_k q(z_k) = \log P(X;\theta) $, we can apply $\sum_k q(z_k)$ to both sides without breaking the equality: \begin{aligned} \log P(X;\theta) &= \underbrace{ \sum_k\sum_i\sum_j q(z_k) \log \frac{P(d_i, w_j, z_k)^{n(d_i, w_j)}}{q(z_k)} }_{\text{evidence lower bound (ELBO)}} \quad \underbrace{ - \sum_k\sum_i\sum_j q(z_k) \log \frac{P(z_k \mid d_i, w_j)^{n(d_i, w_j)}}{q(z_k)} }_{\text{KL-divergence} \ \ge\ 0} \\ \end{aligned}
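This decomposition is easy to verify numerically. Below is a small NumPy check (all names and toy sizes are mine; the exponent $n(d_i, w_j)$ is pulled down as a multiplicative factor inside the logs):

```python
import numpy as np

rng = np.random.default_rng(0)
D, W, K = 2, 3, 4  # hypothetical numbers of documents, words, topics

# Random joint P(d_i, w_j, z_k), its marginal, and its posterior.
P_dwz = rng.random((D, W, K))
P_dwz /= P_dwz.sum()
P_dw = P_dwz.sum(axis=2)
P_z_dw = P_dwz / P_dw[:, :, None]

n = rng.integers(1, 5, size=(D, W)).astype(float)  # counts n(d_i, w_j)
q = rng.random(K)
q /= q.sum()                                       # arbitrary q(z_k)

log_lik = (n * np.log(P_dw)).sum()
# ELBO:    sum_kij q(z_k) [ n_ij log P(d_i,w_j,z_k) - log q(z_k) ]
elbo = (q * (n[:, :, None] * np.log(P_dwz) - np.log(q))).sum()
# KL term: sum_kij q(z_k) [ log q(z_k) - n_ij log P(z_k|d_i,w_j) ]
kl = (q * (np.log(q) - n[:, :, None] * np.log(P_z_dw))).sum()

print(np.isclose(log_lik, elbo + kl))  # True
```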
Using the same idea as in the EM algorithm, set $q(z_k) = P(z_k \mid d_i, w_j)^{\text{old}}$ (strictly, a separate $q_{ij}(z_k)$ for each $(d_i, w_j)$ pair, which the decomposition above allows since it holds term by term) and optimize only the ELBO: \begin{aligned} & \operatorname{argmax}_\theta \sum_k\sum_i\sum_j P(z_k \mid d_i, w_j)^{\text{old}} \log \frac{P(d_i, w_j, z_k)^{n(d_i, w_j)}}{P(z_k \mid d_i, w_j)^{\text{old}}} \\ &= \operatorname{argmax}_\theta \sum_k\sum_i\sum_j P(z_k \mid d_i, w_j)^{\text{old}} \log P(d_i, w_j, z_k)^{n(d_i, w_j)} \\ &\quad - \sum_k\sum_i\sum_j P(z_k \mid d_i, w_j)^{\text{old}} \log P(z_k \mid d_i, w_j)^{\text{old}}\\ &\text{(the second term does not depend on } \theta \text{, so drop it)}\\ &= \operatorname{argmax}_\theta \sum_k\sum_i\sum_j P(z_k \mid d_i, w_j)^{\text{old}} \log P(d_i, w_j, z_k)^{n(d_i, w_j)} \\ &\text{(pull the exponent down to the front and rearrange the order of summation)}\\ &= \operatorname{argmax}_\theta \sum_i\sum_j n(d_i, w_j) \sum_k P(z_k \mid d_i, w_j)^{\text{old}} \log P(d_i, w_j, z_k) \end{aligned} Now it is in the expectation form.
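For completeness, here is a minimal NumPy sketch of the full EM iteration this objective leads to, assuming the usual asymmetric pLSA parameterization $P(d_i, w_j, z_k) = P(d_i)\,P(z_k \mid d_i)\,P(w_j \mid z_k)$; the function name and array layout are my own. Maximizing the final expression above under the normalization constraints gives the closed-form updates below, and the $P(d_i)$ factor separates out of them:

```python
import numpy as np

def plsa_em_step(n, p_w_given_z, p_z_given_d):
    """One EM iteration for pLSA.

    n           : (D, W) count matrix, n[i, j] = n(d_i, w_j)
    p_w_given_z : (K, W) current P(w_j | z_k)
    p_z_given_d : (D, K) current P(z_k | d_i)
    """
    # E-step: posterior P(z_k | d_i, w_j) ∝ P(z_k | d_i) P(w_j | z_k).
    post = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]  # (D, K, W)
    post /= post.sum(axis=1, keepdims=True)

    # M-step: argmax of sum_ij n_ij sum_k post[i,k,j] log P(d_i, w_j, z_k)
    # under the normalization constraints (via Lagrange multipliers).
    weighted = n[:, None, :] * post                           # (D, K, W)

    new_p_w_given_z = weighted.sum(axis=0)                    # (K, W)
    new_p_w_given_z /= new_p_w_given_z.sum(axis=1, keepdims=True)

    new_p_z_given_d = weighted.sum(axis=2)                    # (D, K)
    new_p_z_given_d /= new_p_z_given_d.sum(axis=1, keepdims=True)

    return new_p_w_given_z, new_p_z_given_d
```

Iterating this step monotonically increases the data log-likelihood from the first equation, which is the usual EM guarantee.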