Hidden Markov Model: Expectation maximization


I am trying to understand the EM algorithm for Hidden Markov Model (HMM). However, I don't see how the complete log-likelihood is calculated.

Let $x_{1:T}$ be a sequence of observed random variables $X_1,...,X_T$ and $z_{1:T}$ be the sequence of hidden discrete valued random variables $Z_1, Z_2,...,Z_T$.

The joint distribution has the following form: $$p(z_{1:T}, x_{1:T}) = p(x_{1:T}|z_{1:T})p(z_{1:T}) = \prod_{t=1}^{T}p(x_t|z_t)\,p(z_1)\prod_{t=2}^{T}p(z_{t}|z_{t-1})$$
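For concreteness, here is how I would evaluate this factorization for a fixed, fully observed path. This is only a toy sketch: all parameter values are made up for illustration.

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 3 output symbols.
pi = np.array([0.6, 0.4])                  # p(z_1 = i)
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])                 # A[i, j] = p(z_{t+1}=j | z_t=i)
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])            # B[i, k] = p(x_t=k | z_t=i)

def complete_log_likelihood(z, x):
    """log p(z_{1:T}, x_{1:T}) for a fully observed path, using
    p(z_1) * prod_t p(z_t | z_{t-1}) * prod_t p(x_t | z_t)."""
    ll = np.log(pi[z[0]])
    for t in range(1, len(z)):             # transition terms
        ll += np.log(A[z[t - 1], z[t]])
    for t in range(len(x)):                # emission terms
        ll += np.log(B[z[t], x[t]])
    return ll

print(complete_log_likelihood([0, 0, 1], [0, 1, 2]))
```

Of course, in EM the path $z_{1:T}$ is not observed, which is exactly why the expectations below are needed.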

I don't see how one can compute either of these quantities. A lot of machinery is discussed in the HMM context, such as $p(z_t = i|x_{1:t})$ (the forward algorithm) and $p(z_{t}=i|x_{1:T})$ (the forward-backward algorithm), but I don't see how these are used to carry out the EM steps.


The best-known version of the EM algorithm applied to a Hidden Markov Model is the Baum-Welch algorithm. The Wikipedia article to which I have just given a link seems to me to contain a reasonably accessible exposition of it, although I think some of the notation isn't particularly well-chosen (I'd prefer the simpler $\ b_{ji}\ $ instead of the unnecessarily verbose $\ b_j\big(y_i\big)\ $, for instance).

In your notation, starting from your current estimates for the initial state probabilities, $\ \pi_i=p\big(Z_1=i\big)\ $ for all $\ i\ $, transition matrix entries, $\ a_{ij}=p\big(Z_{t+1}=j\,\big|\,Z_t=i\big)\ $ for all $\ i\ $ and $\ j\ $, and observation output probabilities, $\ b_{ik}=p\big(X_t=k\,\big|\,Z_t=i\big)\ $ for all $\ i\ $ and $\ k\ $, you need to calculate:

  • $\ p\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big)=\frac{p\big(Z_t=i,X_{1:T}=x_{1:T}\big)}{p\big(X_{1:T}=x_{1:T}\big)}\ $ for all $\ i\ $ and $\ t\ $;
  • $\ p\big(Z_t=i, Z_{t+1}=j\,\big|\,X_{1:T}=x_{1:T}\big)=\frac{p\big(Z_t=i, Z_{t+1}=j,X_{1:T}=x_{1:T}\big)}{p\big(X_{1:T}=x_{1:T}\big)}\ $ for all $\ i, j\ $ and $\ t\ $;
  • $\ p\big(Z_t=i, X_t=k\,\big|\,X_{1:T}=x_{1:T}\big)=\frac{p\big(Z_t=i, X_t=k,X_{1:T}=x_{1:T}\big)}{p\big(X_{1:T}=x_{1:T}\big)}\ $ for all $\ i, k\ $ and $\ t\ $.

I think the nicest expositions of the Baum-Welch algorithm are those which express the calculations in terms of matrix multiplication. If $\ n\ $ is the number of hidden states, define the following matrices and column vectors:

  • $\ C(k)\ $, for each possible observed output $\ k\ $, is the $\ n\times n\ $ matrix whose entry in row $\ i\ $ and column $\ j\ $ is $\ a_{ij}b_{ik}\ $;
  • $\ \pi\ $ is the $\ n\times1\ $ column vector whose $\ i^\text{th}\ $ entry is $\ \pi_i\ $;
  • $\ \epsilon(i)\ $, for each hidden state $\ i\ $, is the $\ n\times1\ $ column vector whose $\ i^\text{th}\ $ entry is $1$ and whose other entries are all $0$;
  • $\ \mathbf{1}\ $ is the $\ n\times1\ $ column vector whose entries are all $1$.
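These objects are straightforward to build in code. A minimal NumPy sketch, with toy parameter values that are purely assumptions for illustration (note that $C(k)$ pairs the emission of $k$ from the *source* state $i$ with the transition $i\to j$):

```python
import numpy as np

n = 2                                       # number of hidden states (toy value)
A = np.array([[0.7, 0.3], [0.2, 0.8]])      # a_{ij}
B = np.array([[0.5, 0.5], [0.1, 0.9]])      # b_{ik}, two output symbols
pi = np.array([0.6, 0.4])                   # pi_i (stored as a 1-D array)

def C(k):
    # C(k)[i, j] = a_{ij} * b_{ik}: emit k from state i, then move to j.
    return A * B[:, [k]]                    # B[:, [k]] broadcasts over rows of A

def eps(i):
    # epsilon(i): indicator vector for hidden state i.
    e = np.zeros(n)
    e[i] = 1.0
    return e

ones = np.ones(n)                           # the all-ones vector
```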

In terms of these matrices and column vectors, the quantities you need to calculate are given by:
\begin{align}
p\big(X_{1:T}=x_{1:T}\big)&=\pi^TC\big(x_1\big)C\big(x_2\big)\dots C\big(x_T\big)\mathbf{1}\ ,\\
p\big(Z_t=i,X_{1:T}=x_{1:T}\big)&=
\begin{cases}
\pi_i\,\epsilon(i)^TC\big(x_1\big)C\big(x_2\big)\dots C\big(x_T\big)\mathbf{1}&\text{if }t=1\\[1ex]
\pi^T\prod\limits_{s=1}^{t-1}C\big(x_s\big)\,\epsilon(i)\epsilon(i)^T\prod\limits_{s=t}^TC\big(x_s\big)\mathbf{1}&\text{if }1<t\le T\ ,
\end{cases}\\
p\big(Z_t=i,Z_{t+1}=j,X_{1:T}=x_{1:T}\big)&=
\begin{cases}
\pi_ia_{ij}b_{ix_1}\,\epsilon(j)^TC\big(x_2\big)C\big(x_3\big)\dots C\big(x_T\big)\mathbf{1}&\text{if }t=1\\[1ex]
\pi^T\prod\limits_{s=1}^{t-1}C\big(x_s\big)\,\epsilon(i)\,a_{ij}b_{ix_t}\,\epsilon(j)^T\prod\limits_{s=t+1}^TC\big(x_s\big)\mathbf{1}&\text{if }1<t\le T-1\ ,
\end{cases}\\
p\big(Z_t=i,X_t=k,X_{1:T}=x_{1:T}\big)&=
\begin{cases}
p\big(Z_t=i,X_{1:T}=x_{1:T}\big)&\text{if }k=x_t\\
0&\text{otherwise.}
\end{cases}
\end{align}
From these quantities, you can now obtain
\begin{align}
&p\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big)\ ,\\
&p\big(Z_t=i, Z_{t+1}=j\,\big|\,X_{1:T}=x_{1:T}\big)\ \text{, and}\\
&p\big(Z_t=i, X_t=k\,\big|\,X_{1:T}=x_{1:T}\big)\ ,
\end{align}
and calculate the conditional expectations, given the observations, of:
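These matrix-product formulas translate quite directly into code. The following is a minimal sketch with assumed toy parameters; it computes the products naively and does no rescaling, so it is only suitable for short sequences (real implementations scale or work in log space to avoid underflow):

```python
import numpy as np
from functools import reduce

n = 2                                       # hidden states (toy values below)
A = np.array([[0.7, 0.3], [0.2, 0.8]])      # a_{ij}
B = np.array([[0.5, 0.5], [0.1, 0.9]])      # b_{ik}
pi = np.array([0.6, 0.4])

def C(k):
    return A * B[:, [k]]                    # C(k)[i, j] = a_{ij} b_{ik}

def prod(ms):
    # Ordered matrix product; the empty product is the identity.
    return reduce(np.matmul, ms, np.eye(n))

x = [0, 1, 1, 0]                            # an assumed observed sequence
T = len(x)

# p(X_{1:T} = x_{1:T}) = pi^T C(x_1) ... C(x_T) 1
lik = pi @ prod([C(k) for k in x]) @ np.ones(n)

def joint_state(t, i):
    """p(Z_t = i, X_{1:T} = x_{1:T}) by splitting the product at t
    (t is 1-based, matching the formulas)."""
    left = pi @ prod([C(x[s]) for s in range(t - 1)])
    right = prod([C(x[s]) for s in range(t - 1, T)]) @ np.ones(n)
    return left[i] * right[i]

# Posterior p(Z_t = i | x_{1:T}), row t-1 of gamma:
gamma = np.array([[joint_state(t, i) / lik for i in range(n)]
                  for t in range(1, T + 1)])
```

The split at $t$ is the whole trick: everything to the left of $\epsilon(i)\epsilon(i)^T$ plays the role of the forward pass, and everything to the right plays the role of the backward pass.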

  • The number of times each hidden state $\ i\ $ was visited during the instants $\ t=1,2,\dots, T-1\ $, $$\sum_{t=1}^{T-1}p\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big) $$ and the number of times it was visited during the instants $\ t=1,2,\dots, T\ $, $$\sum_{t=1}^Tp\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big) $$
  • The number of times the hidden Markov chain made a transition from state $\ i\ $ to state $\ j\ $ during the instants $\ t=1,2,\dots, T-1\ $, $$ \sum_{t=1}^{T-1}p\big(Z_t=i, Z_{t+1}=j\,\big|\,X_{1:T}=x_{1:T}\big)$$
  • The number of times during the instants $\ t=1,2,\dots, T\ $ that $\ X_t=k\ $ and the hidden state was $\ i\ $, $$ \sum_{t=1}^Tp\big(Z_t=i, X_t=k\,\big|\,X_{1:T}=x_{1:T}\big)\ .$$

This completes the E (Expectation) stage of the EM algorithm. You now perform the M (Maximization) stage by replacing the quantities $\ \pi_i\ $, $\ a_{ij}\ $ and $\ b_{ik}\ $ with their new estimates, \begin{align} \hat{\pi}_i&=p\big(Z_1=i\,\big|\,X_{1:T}=x_{1:T}\big)\ ,\\ \hat{a}_{ij}&=\frac{\sum\limits_{t=1}^{T-1}p\big(Z_t=i, Z_{t+1}=j\,\big|\,X_{1:T}=x_{1:T}\big)}{\sum\limits_{t=1}^{T-1}p\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big)}\ \text{, and}\\ \hat{b}_{ik}&=\frac{\sum\limits_{t=1}^Tp\big(Z_t=i, X_t=k\,\big|\,X_{1:T}=x_{1:T}\big)}{\sum\limits_{t=1}^Tp\big(Z_t=i\,\big|\,X_{1:T}=x_{1:T}\big)}\ . \end{align}
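Putting the E and M stages together, one full Baum-Welch iteration can be sketched as follows. Again this is only an illustrative sketch with assumed toy parameters and no rescaling (so it is restricted to short sequences); the partial products below are exactly the forward and backward quantities:

```python
import numpy as np

# Assumed toy model: 2 hidden states, 2 output symbols.
A = np.array([[0.7, 0.3], [0.2, 0.8]])      # a_{ij}
B = np.array([[0.5, 0.5], [0.1, 0.9]])      # b_{ik}
pi = np.array([0.6, 0.4])
x = [0, 1, 1, 0, 0]                         # an assumed observed sequence
n, T = 2, len(x)

def C(k):
    return A * B[:, [k]]                    # C(k)[i, j] = a_{ij} b_{ik}

# Forward partial products: alpha[t] = pi^T C(x_1)...C(x_t)   (0-based t).
alpha = [pi]
for t in range(1, T):
    alpha.append(alpha[-1] @ C(x[t - 1]))
# Backward partial products: beta[t] = C(x_{t+1})...C(x_T) 1.
beta = [np.ones(n)]
for t in range(T - 1, -1, -1):
    beta.append(C(x[t]) @ beta[-1])
beta = beta[::-1]

lik = pi @ beta[0]                          # p(X_{1:T} = x_{1:T})

# E stage: gamma[t, i] = p(Z_{t+1}=i | x), xi[t, i, j] = p(Z_{t+1}=i, Z_{t+2}=j | x).
gamma = np.array([alpha[t] * beta[t] / lik for t in range(T)])
xi = np.array([np.outer(alpha[t], beta[t + 1]) * C(x[t]) / lik
               for t in range(T - 1)])

# M stage: re-estimate pi, a_{ij}, b_{ik} from the expected counts.
pi_hat = gamma[0]
a_hat = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
b_hat = np.zeros_like(B)
for t in range(T):
    b_hat[:, x[t]] += gamma[t]              # expected emission counts
b_hat /= gamma.sum(axis=0)[:, None]
```

Iterating these two stages (with `A, B, pi` replaced by `a_hat, b_hat, pi_hat` each time) is the Baum-Welch algorithm; each iteration is guaranteed not to decrease the likelihood of the observations.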