Expectations in Hidden Markov Models


Background:

Suppose I have an HMM characterized by the parameters $\theta = (\mathbf{A}, \mathbf{B}, \mathbf{\pi})$, where $A,B,\pi$ are the hidden state transition probability matrix, the emission probability matrix, and the initial distribution of the hidden states, respectively.

We know that the joint distribution of the hidden states and observed data can be expressed as:

$$ p(\mathbf{Z}, \mathbf{X}) = p(\mathbf{z}_{1} | \pi) p(\mathbf{x}_{1} | \mathbf{z}_{1}, \mathbf{B}) \prod_{n=2}^{N} p(\mathbf{z}_{n} | \mathbf{z}_{n-1}, \mathbf{A}) p(\mathbf{x}_{n} | \mathbf{z}_{n}, \mathbf{B}) $$

where $\mathbf{Z} = \{ \mathbf{z}_{1}, \mathbf{z}_{2}, \ldots \mathbf{z}_{N} \}$, $\mathbf{X} = \{\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{N} \}$, $\mathbf{z}_{n}$ is the vector of hidden states at time $n$, and $\mathbf{x}_{n}$ is the vector of observed states at time $n$.

We assume that $\mathbf{z}_{n}$ can take $K$ possible states and $\mathbf{x}_{n}$ can take $P$ possible values, and we therefore model them as one-hot-encoded vectors. So the pmfs become:

$$ p(\mathbf{z}_{1} | \pi) = \prod_{j=1}^{K} \pi_{j}^{z_{1,j}} $$ $$ p(\mathbf{z}_{n} | \mathbf{z}_{n-1}, \mathbf{A}) = \prod_{j=1}^{K} \prod_{i=1}^{K} a_{j,i}^{z_{n,i} z_{n-1,j}} $$ $$ p(\mathbf{x}_{n} | \mathbf{z}_{n}, \mathbf{B}) = \prod_{j=1}^{K} \prod_{m=1}^{P} b_{j,m}^{z_{n,j} x_{n,m}} $$
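To convince myself these product forms are right, here is a small numeric sketch (all the numbers are made up) checking that, with one-hot vectors, each product collapses to a single table lookup:

```python
import numpy as np

# Hypothetical small example: K = 2 hidden states, P = 3 observed symbols.
K, P = 2, 3
pi = np.array([0.6, 0.4])                         # initial distribution
A = np.array([[0.7, 0.3], [0.2, 0.8]])            # A[j, i] = p(z_n = i | z_{n-1} = j)
B = np.array([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])  # B[j, m] = p(x_n = m | z_n = j)

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

z1 = one_hot(0, K)   # z_1 = state 0
z2 = one_hot(1, K)   # z_2 = state 1
x2 = one_hot(2, P)   # x_2 = symbol 2

# Every factor with exponent 0 equals 1, so each product is one lookup:
p_z1 = np.prod(pi ** z1)                          # pi_j^{z_{1,j}}          -> pi[0]
p_z2_given_z1 = np.prod(A ** np.outer(z1, z2))    # a_{j,i}^{z_{1,j} z_{2,i}} -> A[0, 1]
p_x2_given_z2 = np.prod(B ** np.outer(z2, x2))    # b_{j,m}^{z_{2,j} x_{2,m}} -> B[1, 2]

assert np.isclose(p_z1, pi[0])
assert np.isclose(p_z2_given_z1, A[0, 1])
assert np.isclose(p_x2_given_z2, B[1, 2])
```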

So taking the log and plugging in, we get:

$$ \ln p(\mathbf{Z}, \mathbf{X}) = \sum_{j=1}^{K} z_{1,j} \ln \pi_{j} + \sum_{n=1}^{N} \sum_{j=1}^{K} \sum_{m=1}^{P} z_{n,j} x_{n,m} \ln b_{j,m} + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{i=1}^{K} z_{n,i} z_{n-1,j} \ln a_{j,i} $$

Now since the $z_{n,j}$ are random variables, to maximize this with respect to our parameters we can take the expectation with respect to the posterior distribution $p(\mathbf{Z}|\mathbf{X})$.


Question:

Let's suppose we do this; then we have:

$$ E_{\mathbf{Z}|\mathbf{X}}[\ln p(\mathbf{Z}, \mathbf{X})] = \sum_{j=1}^{K} E_{\mathbf{Z}|\mathbf{X}}[z_{1,j}] \ln \pi_{j} + \sum_{n=1}^{N} \sum_{j=1}^{K} \sum_{m=1}^{P} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] \ln b_{j,m} + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{i=1}^{K} E_{\mathbf{Z}|\mathbf{X}}[z_{n,i} z_{n-1,j}] \ln a_{j,i} $$
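For concreteness, here is how I would evaluate this expected log-likelihood numerically, assuming the posterior marginals `gamma[n, j]` $= E[z_{n,j}|\mathbf{X}]$ and pairwise marginals `xi[t, j, i]` $= E[z_{t,j} z_{t+1,i}|\mathbf{X}]$ are already available (e.g. from a forward-backward pass); the names are my own. Since the $\mathbf{x}_n$ are observed, the cross term factorizes as $E[z_{n,j} x_{n,m}] = x_{n,m} E[z_{n,j}]$:

```python
import numpy as np

def expected_complete_loglik(pi, A, B, gamma, xi, X_onehot):
    """Expected complete-data log-likelihood E_{Z|X}[ln p(Z, X)].

    gamma    : (N, K)       posterior marginals E[z_{n,j} | X]
    xi       : (N-1, K, K)  pairwise marginals E[z_{t,j} z_{t+1,i} | X]
    X_onehot : (N, P)       one-hot observations, x_{n,m}
    """
    term_pi = np.sum(gamma[0] * np.log(pi))
    # E[z_{n,j} x_{n,m}] = gamma[n, j] * x_{n,m} because the x_n are observed.
    term_B = np.sum(gamma[:, :, None] * X_onehot[:, None, :] * np.log(B)[None, :, :])
    term_A = np.sum(xi * np.log(A)[None, :, :])
    return term_pi + term_B + term_A
```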

Now in Beal's thesis, he claims:

$$ b_{j,m} = \frac{ \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] }{ \sum_{n=1}^{N} \sum_{m=1}^{P} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] } = \frac{ \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] }{ \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j}] } $$

which is easily found by optimizing with respect to $b_{j,m}$. However, I am confused:

  1. What is $E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}]$? I think it is just $E_{\mathbf{Z}|\mathbf{x}_{n} = \mathbf{e}_{m}}[z_{n,j}]$, where $\mathbf{e}_{m}$ is a $P$-vector of all zeros except for a one in the $m$th entry. Is that correct? And if so, how would I compute this quantity?

  2. How is $\sum_{n=1}^{N} \sum_{m=1}^{P} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j} x_{n,m}] = \sum_{n=1}^{N} E_{\mathbf{Z}|\mathbf{X}}[z_{n,j}]$? I think it has to do with the one-hot encoding, but I am not sure.
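For what it's worth, here is a small numeric check of my hunch in (2), using made-up posterior marginals `gamma[n, j]` $= E[z_{n,j}|\mathbf{X}]$. Because each $\mathbf{x}_{n}$ is one-hot, $\sum_{m} x_{n,m} = 1$, so summing the cross-expectations over $m$ seems to recover $E[z_{n,j}]$:

```python
import numpy as np

N, K, P = 4, 2, 3
rng = np.random.default_rng(1)
gamma = rng.dirichlet(np.ones(K), size=N)          # made-up E[z_{n,j} | X]; rows sum to 1
X_onehot = np.eye(P)[rng.integers(0, P, size=N)]   # random one-hot observations

# E[z_{n,j} x_{n,m}] = gamma[n, j] * x_{n,m} since the x_n are observed.
cross = gamma[:, :, None] * X_onehot[:, None, :]

# Summing over m collapses the one-hot x_n and recovers gamma.
assert np.allclose(cross.sum(axis=2), gamma)
```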

Thanks!