I am trying to understand the EM algorithm for Hidden Markov Models (HMMs). However, I don't see how the complete log-likelihood is calculated.
Let $x_{1:T}$ be a sequence of observed random variables $X_1,...,X_T$ and $z_{1:T}$ be the corresponding sequence of hidden, discrete-valued random variables $Z_1, Z_2,...,Z_T$.
The joint distribution has the following form: $$p(z_{1:T}, x_{1:T}) = p(x_{1:T}|z_{1:T})p(z_{1:T}) = \prod_{t=1}^{T}p(x_t|z_t)\,p(z_1)\prod_{t=2}^{T}p(z_{t}|z_{t-1})$$
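Explicitly, taking logs of this factorisation, the complete log-likelihood I am referring to is $$\log p(z_{1:T}, x_{1:T}) = \log p(z_1) + \sum_{t=2}^{T}\log p(z_t|z_{t-1}) + \sum_{t=1}^{T}\log p(x_t|z_t)\ .$$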
I don't see how one can compute either of these quantities. A lot of related machinery is discussed in the HMM context, such as $p(z_t = i|x_{1:t})$ (forward algorithm) and $p(z_{t}=i|x_{1:T})$ (forward-backward algorithm), but I don't see how these can be used to carry out EM.
The best-known version of the EM algorithm applied to a Hidden Markov Model is the Baum-Welch algorithm. The Wikipedia article to which I have just given a link seems to me to contain a reasonably accessible exposition of it, although I think some of the notation isn't particularly well-chosen (I'd prefer the simpler $\ b_{ji}\ $ instead of the unnecessarily verbose $\ b_j\big(y_i\big)\ $, for instance).
In your notation, starting from your current estimates for the initial state probabilities, $\ \pi_i=p\big(Z_1=i\big)\ $ for all $\ i\ $, transition matrix entries, $\ a_{ij}=p\big(Z_{t+1}=j\,\big|\,Z_t=i\big)\ $ for all $\ i\ $ and $\ j\ $, and observation output probabilities, $\ b_{ik}=p\big(X_t=k\,\big|\,Z_t=i\big)\ $ for all $\ i\ $ and $\ k\ $, you need to calculate the quantities set out below.
I think the nicest expositions of the Baum-Welch algorithm are those which express the calculations in terms of matrix multiplication. If $\ n\ $ is the number of hidden states, define the following matrices and column vectors: the $\ n\times n\ $ transition matrix $\ A\ $ with entries $\ a_{ij}\ $; the $\ n\times n\ $ diagonal matrix $\ B(x)\ $ whose $\ i^\text{th}\ $ diagonal entry is $\ b_{ix}\ $; the product $\ C(x)=B(x)\,A\ $; the column vector $\ \pi\ $ with entries $\ \pi_i\ $; the $\ i^\text{th}\ $ standard basis column vector $\ \epsilon(i)\ $, with a one in position $\ i\ $ and zeros elsewhere; and the all-ones column vector $\ \mathbf{1}\ $.
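A small NumPy sketch may make these objects concrete; it is purely illustrative, using a hypothetical two-state, three-symbol HMM whose numerical values are arbitrary placeholders:

```python
import numpy as np

# Toy two-state HMM with three observation symbols; all numbers are placeholders.
pi = np.array([0.6, 0.4])                          # pi_i = p(Z_1 = i)
A  = np.array([[0.7, 0.3], [0.2, 0.8]])            # a_ij = p(Z_{t+1} = j | Z_t = i)
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # b_ik = p(X_t = k | Z_t = i)
n  = len(pi)

def C(x):
    """C(x) = diag(b_{1x}, ..., b_{nx}) @ A."""
    return np.diag(B[:, x]) @ A

def eps(i):
    """epsilon(i): the i-th standard basis vector."""
    e = np.zeros(n)
    e[i] = 1.0
    return e

ones = np.ones(n)  # the all-ones vector
```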
In terms of these matrices and column vectors, the quantities you need to calculate are given by: \begin{align} &p\big(X_{1:T}=x_{1:T}\big)=\pi^TC\big(x_1\big)C\big(x_2\big)\dots C\big(x_T\big)\mathbf{1}\ ,\\ &p\big(Z_t=i,X_{1:T}=x_{1:T}\big)=\\ &\hspace{7em}\cases{\pi_i\epsilon(i)^TC\big(x_1\big)C\big(x_2\big)\dots C\big(x_T\big)\mathbf{1}&if $\ t=1$\\ \pi^T\prod\limits_{s=1}^{t-1}C\big(x_s\big)\epsilon(i)\epsilon(i)^T\prod\limits_{s=t}^TC\big(x_s\big)\mathbf{1}&if $\ 1<t\le T$ }\\ &p\big(Z_t=i,Z_{t+1}=j,X_{1:T}=x_{1:T}\big)=\\ &\hspace{3em}\cases{\pi_ia_{ij}b_{ix_1}\epsilon(j)^TC\big(x_2\big)C\big(x_3\big)\dots C\big(x_T\big)\mathbf{1}&if $\ t=1$\\ \pi^T\prod\limits_{s=1}^{t-1}C\big(x_s\big)\epsilon(i)a_{ij}b_{ix_t}\epsilon(j)^T\prod\limits_{s=t+1}^TC\big(x_s\big)\mathbf{1}&if $\ 1<t\le T-1\ $. }\\ &p\big(Z_t=i, X_t=k,X_{1:T}=x_{1:T}\big)=\\ &\hspace{11em}\cases{p\big(Z_t=i,X_{1:T}=x_{1:T}\big)&if $\ k= x_t$\\ 0&otherwise} \end{align} From these quantities, you can now obtain \begin{align} &p\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big)\ ,\\ &p\big(Z_t=i, Z_{t+1}=j\,\big|\,X_{1:T}=x_{1:T}\big)\ \text{, and}\\ &p\big(Z_t=i, X_t=k\,\big|\,X_{1:T}=x_{1:T}\big)\ , \end{align} and calculate the conditional expectations, given the observations, of the number of times the chain starts in state $\ i\ $, the number of transitions from state $\ i\ $ to state $\ j\ $, and the number of times symbol $\ k\ $ is emitted while the chain is in state $\ i\ $. Normalising these expected counts gives the updated estimates of $\ \pi_i\ $, $\ a_{ij}\ $ and $\ b_{ik}\ $ for the next EM iteration.
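If it helps to see the formulas above in executable form, here is a rough NumPy sketch (my own illustration, not taken from the linked article) that evaluates the matrix products literally, forms the conditional probabilities, and normalises the resulting expected counts into re-estimated parameters. It works with plain probabilities, no scaling or log-space tricks, so it is only suitable for short sequences; the parameters and observation sequence are the same arbitrary placeholders as in the sketch above.

```python
import numpy as np

# Toy parameters and an observation sequence -- all numbers are made-up placeholders.
pi = np.array([0.6, 0.4])                          # pi_i = p(Z_1 = i)
A  = np.array([[0.7, 0.3], [0.2, 0.8]])            # a_ij = p(Z_{t+1} = j | Z_t = i)
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # b_ik = p(X_t = k | Z_t = i)
x  = np.array([0, 2, 1, 0])                        # x_1, ..., x_T as 0-based symbol indices
n, T = len(pi), len(x)

def C(t):
    """C(x_t) = diag(b_{1 x_t}, ..., b_{n x_t}) @ A   (t is 0-based in the code)."""
    return np.diag(B[:, x[t]]) @ A

ones = np.ones(n)

# Prefix products: pre[t] = pi^T C(x_1) ... C(x_t), with pre[0] = pi^T.
pre = [pi.copy()]
for t in range(T):
    pre.append(pre[-1] @ C(t))

# Suffix products: suf[t] = C(x_{t+1}) ... C(x_T) 1, with suf[T] = 1.
suf = [None] * (T + 1)
suf[T] = ones.copy()
for t in reversed(range(T)):
    suf[t] = C(t) @ suf[t + 1]

# Likelihood of the observations: p(X_{1:T} = x_{1:T}) = pi^T C(x_1) ... C(x_T) 1.
px = pre[T] @ ones

# gamma[t, i] = p(Z_{t+1} = i | x_{1:T})       (code index t <-> math index t+1)
gamma = np.array([pre[t] * suf[t] for t in range(T)]) / px

# xi[t, i, j] = p(Z_{t+1} = i, Z_{t+2} = j | x_{1:T})
xi = np.array([np.outer(pre[t] * B[:, x[t]], suf[t + 1]) * A
               for t in range(T - 1)]) / px

# M-step: normalise the expected counts to get updated parameter estimates.
new_pi = gamma[0]
new_A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
new_B  = np.vstack([gamma[x == k].sum(axis=0) for k in range(B.shape[1])]).T
new_B /= gamma.sum(axis=0)[:, None]

print(px)                  # p(x_{1:T}) under the current parameters
print(gamma.sum(axis=1))   # sanity check: each row of gamma should sum to 1
```

For sequences of realistic length one would instead use scaled forward/backward variables or work in log space to avoid underflow; the literal products above are only meant to mirror the matrix formulas.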