I would like to use gradient descent to fit the parameters of a simple 2-state HMM. This paper
- Levinson, S. E., Rabiner, L. R. and Sondhi, M. M. (1983), An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition. Bell System Technical Journal, 62: 1035–1074. doi: 10.1002/j.1538-7305.1983.tb03114.x
shows derivations of the partial derivatives required for gradient descent, but I am having trouble following the steps. Specifically, the paper starts by stating that:
$$P(\mathbf{O}|\mathbf{M})=\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{t}(i)a_{ij}b_{j}(O_{t+1})\beta_{t+1}(j)$$
for any $t$ such that $1 \leq t \leq T-1$, where $\mathbf{O} = O_1 \dots O_T$ is the observation sequence and $\mathbf{M}$ represents the transition and emission matrices. $\alpha$ is the forward probability and $\beta$ is the backward probability, $a_{ij}$ is the probability of transitioning from state $i$ to state $j$, and $b_j(O_{t+1})$ is the probability of observing $O_{t+1}$ given state $j$. Finally, $N$ is the number of states.
To run gradient descent, one needs to calculate the partial derivatives with respect to model parameters. The paper derives:
$$\frac{\partial{P}}{\partial{a_{ij}}}=\sum_{t=1}^{T-1}\alpha_t(i)b_j(O_{t+1})\beta_{t+1}(j)$$
But the exact steps for reaching this formula are not shown. Can anyone elaborate on how this result was obtained?
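In case it helps to make the question concrete, here is a small NumPy sketch (the 2-state transition/emission matrices and the observation sequence are made up for illustration) that verifies both the quoted identity and the derivative formula numerically, the latter by central finite differences with each $a_{ij}$ treated as a free, unconstrained parameter. So the formula does check out numerically; what I am missing is the algebraic route to it.

```python
import numpy as np

# Toy 2-state HMM with discrete emissions. All parameter values and
# the observation sequence are made up purely for illustration.
N = 2                                    # number of states
A = np.array([[0.7, 0.3],                # a_ij: transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],                # b_j(k): emission probabilities
              [0.1, 0.9]])
pi = np.array([0.6, 0.4])                # initial state distribution
O = [0, 1, 1, 0, 1]                      # observations O_1..O_T, here T = 5
T = len(O)

def forward_backward(A):
    """Standard forward/backward recursions; returns (alpha, beta)."""
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return alpha, beta

def likelihood(A):
    """P(O|M) = sum_i alpha_T(i)."""
    alpha, _ = forward_backward(A)
    return alpha[-1].sum()

alpha, beta = forward_backward(A)
P = likelihood(A)

# The quoted identity holds for every t = 1..T-1 (0-based: 0..T-2):
# P = sum_ij alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
for t in range(T - 1):
    assert np.isclose(P, np.sum(np.outer(alpha[t], B[:, O[t + 1]] * beta[t + 1]) * A))

# Analytic gradient from the paper's formula:
# dP/da_ij = sum_{t=1}^{T-1} alpha_t(i) * b_j(O_{t+1}) * beta_{t+1}(j)
dP = np.zeros((N, N))
for t in range(T - 1):
    dP += np.outer(alpha[t], B[:, O[t + 1]] * beta[t + 1])

# Central finite differences, treating each a_ij as unconstrained
eps, fd = 1e-6, np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        fd[i, j] = (likelihood(Ap) - likelihood(Am)) / (2 * eps)

print(np.max(np.abs(dP - fd)))           # only finite-difference noise remains
```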
Thank you,