Expectations in Hidden Markov Models


Background:

Suppose I have an HMM characterized by the parameters $\theta = (\mathbf{A}, \mathbf{B}, \mathbf{\pi})$, where $A,B,\pi$ are the hidden state transition probability matrix, the emission probability matrix, and the initial distribution of the hidden states, respectively.

We know that the joint distribution of the hidden states and observed data can be expressed as:

$$ p(\mathbf{Z}, \mathbf{X}) = p(\mathbf{z}_{1} | \pi) p(\mathbf{x}_{1} | \mathbf{z}_{1}, \mathbf{B}) \prod_{n=2}^{N} p(\mathbf{z}_{n} | \mathbf{z}_{n-1}, \mathbf{A}) p(\mathbf{x}_{n} | \mathbf{z}_{n}, \mathbf{B}) $$

where $\mathbf{Z} = \{ \mathbf{z}_{1}, \mathbf{z}_{2}, \ldots \mathbf{z}_{N} \}$, $\mathbf{X} = \{\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{N} \}$, $\mathbf{z}_{n}$ is the vector of hidden states at time $n$, and $\mathbf{x}_{n}$ is the vector of observed states at time $n$.

We assume that $\mathbf{z}_{n}$ can take $K$ possible states and $\mathbf{x}_{n}$ can take $P$ possible values, and we therefore model them as one-hot-encoded vectors. So the pmfs become:

$$ p(\mathbf{z}_{1} | \pi) = \prod_{j=1}^{K} \pi_{j}^{z_{1,j}} $$ $$ p(\mathbf{z}_{n} | \mathbf{z}_{n-1}, \mathbf{A}) = \prod_{j=1}^{K} \prod_{i=1}^{K} a_{j,i}^{z_{n,i} z_{n-1,j}} $$ $$ p(\mathbf{x}_{n} | \mathbf{z}_{n}, \mathbf{B}) = \prod_{j=1}^{K} \prod_{m=1}^{P} b_{j,m}^{z_{n,j} x_{n,m}} $$
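To convince myself these product forms are right, here is a small numeric sketch (all the numbers are made up) checking that, with one-hot vectors, each product collapses to a single table lookup:

```python
import numpy as np

# Hypothetical small example: K = 2 hidden states, P = 3 observed symbols.
K, P = 2, 3
pi = np.array([0.6, 0.4])                         # initial distribution
A = np.array([[0.7, 0.3], [0.2, 0.8]])            # A[j, i] = p(z_n = i | z_{n-1} = j)
B = np.array([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])  # B[j, m] = p(x_n = m | z_n = j)

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

z1 = one_hot(0, K)   # z_1 = state 0
z2 = one_hot(1, K)   # z_2 = state 1
x2 = one_hot(2, P)   # x_2 = symbol 2

# Every factor with exponent 0 equals 1, so each product is one lookup:
p_z1 = np.prod(pi ** z1)                          # pi_j^{z_{1,j}}          -> pi[0]
p_z2_given_z1 = np.prod(A ** np.outer(z1, z2))    # a_{j,i}^{z_{1,j} z_{2,i}} -> A[0, 1]
p_x2_given_z2 = np.prod(B ** np.outer(z2, x2))    # b_{j,m}^{z_{2,j} x_{2,m}} -> B[1, 2]

assert np.isclose(p_z1, pi[0])
assert np.isclose(p_z2_given_z1, A[0, 1])
assert np.isclose(p_x2_given_z2, B[1, 2])
```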

So taking the log and plugging in, we get:

$$ \ln p(\mathbf{Z}, \mathbf{X}) = \sum_{j=1}^{K} z_{1,j} \ln \pi_{j} + \sum_{n=1}^{N} \sum_{j=1}^{K} \sum_{m=1}^{P} z_{n,j} x_{n,m} \ln b_{j,m} + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{i=1}^{K} z_{n,i} z_{n-1,j} \ln a_{j,i} $$

Now since the $z_{n,j}$ are random variables, to maximize this with respect to our parameters we can take the expectation with respect to the posterior distribution $p(\mathbf{Z}|\mathbf{X})$.


Question:

Let's suppose we do this; then we have:

$$ E_{\mathbf{Z}|\mathbf{X}}[\ln p(\mathbf{Z}, \mathbf{X})] = \sum_{j=1}^{K} E_{\mathbf{Z}|\mathbf{X}}[z_{1,j}] \ln \pi_{j} + \sum_{n=1}^{N} \sum_{j=1}^{K} \sum_{m=1}^{P} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] \ln b_{j,m} + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{i=1}^{K} E_{\mathbf{Z}|\mathbf{X}}[z_{n,i} z_{n-1,j}] \ln a_{j,i} $$
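For concreteness, here is how I would evaluate this expected log-likelihood numerically, assuming the posterior marginals `gamma[n, j]` $= E[z_{n,j}|\mathbf{X}]$ and pairwise marginals `xi[t, j, i]` $= E[z_{t,j} z_{t+1,i}|\mathbf{X}]$ are already available (e.g. from a forward-backward pass); the names are my own. Since the $\mathbf{x}_n$ are observed, the cross term factorizes as $E[z_{n,j} x_{n,m}] = x_{n,m} E[z_{n,j}]$:

```python
import numpy as np

def expected_complete_loglik(pi, A, B, gamma, xi, X_onehot):
    """Expected complete-data log-likelihood E_{Z|X}[ln p(Z, X)].

    gamma    : (N, K)       posterior marginals E[z_{n,j} | X]
    xi       : (N-1, K, K)  pairwise marginals E[z_{t,j} z_{t+1,i} | X]
    X_onehot : (N, P)       one-hot observations, x_{n,m}
    """
    term_pi = np.sum(gamma[0] * np.log(pi))
    # E[z_{n,j} x_{n,m}] = gamma[n, j] * x_{n,m} because the x_n are observed.
    term_B = np.sum(gamma[:, :, None] * X_onehot[:, None, :] * np.log(B)[None, :, :])
    term_A = np.sum(xi * np.log(A)[None, :, :])
    return term_pi + term_B + term_A
```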

Now in Beal's thesis, he claims:

$$ b_{j,m} = \frac{ \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] }{ \sum_{n=1}^{N} \sum_{m=1}^{P} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] } = \frac{ \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] }{ \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j}] } $$

which is easily found by optimizing with respect to $b_{j,m}$. However, I am confused:

  1. What is $E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}]$? I think it is just $E_{\mathbf{Z}|\mathbf{x}_{n} = \mathbf{e}_{m}}[z_{n,j}]$, where $\mathbf{e}_{m}$ is a $P$-vector of all zeros except for a one in the $m$th entry. Is that correct? And if so, how would I compute this quantity?

  2. How is $\sum_{n=1}^{N} \sum_{m=1}^{P} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] = \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j}]$? I think it has to do with the one-hot encoding, but I am not sure.
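For what it's worth, here is a small numeric check of my hunch in (2), using made-up posterior marginals `gamma[n, j]` $= E[z_{n,j}|\mathbf{X}]$. Because each $\mathbf{x}_{n}$ is one-hot, $\sum_{m} x_{n,m} = 1$, so summing the cross-expectations over $m$ seems to recover $E[z_{n,j}]$:

```python
import numpy as np

N, K, P = 4, 2, 3
rng = np.random.default_rng(1)
gamma = rng.dirichlet(np.ones(K), size=N)          # made-up E[z_{n,j} | X]; rows sum to 1
X_onehot = np.eye(P)[rng.integers(0, P, size=N)]   # random one-hot observations

# E[z_{n,j} x_{n,m}] = gamma[n, j] * x_{n,m} since the x_n are observed.
cross = gamma[:, :, None] * X_onehot[:, None, :]

# Summing over m collapses the one-hot x_n and recovers gamma.
assert np.allclose(cross.sum(axis=2), gamma)
```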

Thanks!