Proving HMM forward algorithm via induction instead of "trellis diagram reasoning".


Here are two example papers that use the trellis reasoning approach, which doesn't really click with my brain:

In fact, every HMM exposition I've come across seems to duplicate the same work regarding the forward algorithm. Here I want to prove the recursive formula via induction, because it's a perfect application for it!

But I will need some help.

Let $T$ be the length of our observation sequence $O = o_1 \cdots o_T$; we're trying to prove $P(O|\lambda) = \sum\limits_{i=1}^N \alpha_T(i)$. That is, $P(O|\lambda)$ is the probability of seeing the observation sequence $O$ given our model $\lambda = (A, B, \pi)$, where $A$ is the $N \times N$ state transition probability matrix, $B$ is the $N \times M$ matrix of observation probabilities at each state, and $\pi$ is the initial state distribution.

Note that for consecutive states of an arbitrary state sequence $Q = q_1 q_2 \cdots$ (so $q_t \in \{1, \dots, N\}$ for all $t$), we define $P(q_{t+1} | q_t) \equiv a_{q_t q_{t+1}}$. Similarly, $P(o_t | q_t) \equiv b_{q_t o_t} \equiv b_{q_t}(o_t)$, by definition.


Take $T = 1$ as our inductive base case. In that case $P(O| \lambda) \equiv \sum\limits_{|Q| = 1} P(O|Q) P(Q) = \sum\limits_{i=1}^N \pi_{i} b_{i}(o_1) = \sum\limits_{i=1}^N \alpha_1(i)$, since $\alpha_1(i) \equiv P(o_1 ; q_1 = i | \lambda)$ by definition. The base case is proved.
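As a quick numerical sanity check of the base case, here is a sketch with assumed values (a hypothetical $N = 2$ model; $\pi$, $B$, and the observation are not from the original):

```python
import numpy as np

# Hypothetical model pieces, assumed for illustration only.
pi = np.array([0.6, 0.4])            # initial distribution over N = 2 states
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # B[i, o] = b_i(o) = P(o | q = i)
o1 = 1                               # the single observation, so T = 1

# alpha_1(i) = pi_i * b_i(o_1), elementwise over states
alpha_1 = pi * B[:, o1]

# P(O | lambda) for T = 1 is the sum over states
print(alpha_1.sum())                 # ≈ 0.38 for these assumed numbers
```

Summing $\alpha_1(i)$ over the states marginalizes out the single hidden state, which is exactly what the base case asserts.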

Now suppose that $P(O'| \lambda) = \sum\limits_{i=1}^N \alpha_{T-1}(i)$ for an observation sequence $O'$ of length $T-1$. Our job is then to prove that, when we append another observation to form $O = O'\cdot o_T$, we get $P(O | \lambda) = \sum\limits_{i=1}^N \alpha_{T}(i)$.

Suppose the model $\lambda$ has passed through the hidden state sequence $Q'$ while emitting the observation sequence $O'$, with $|O'| = |Q'| = T-1$. Then if we multiply the joint probability $P(O', Q')$ by the probability of transitioning to $q_T$ and observing $o_T$, we just need to sum over all such products to get what we're after. Namely,

$P(O | \lambda) = \sum\limits_{i=1}^N P(O', Q') P(q_T | q_{T-1}) P(o_T | q_T)$.

Here is where I am stuck: we need $q_{T-1}$ to be the last state of an arbitrary hidden state sequence, but that is not reflected in the summation limits.

Intuitively, I know this method will work out; I just need some help over this symbolic hurdle.

Best answer:

$$\begin{aligned}
P(o_1 o_2 \cdots o_{T};\ q_{T} = j \mid \lambda)
&= \Bigl[\sum\limits_{j' = 1}^N P(o_1 o_2 \cdots o_{T-1};\ q_{T-1} = j' \mid \lambda) \cdot P(q_{T}=j \mid q_{T-1} = j')\Bigr] \cdot P(o_{T} \mid q_{T} = j) \\
&= \Bigl[\sum\limits_{j'=1}^N \alpha_{T-1}(j')\, a_{j'j}\Bigr] b_{j}(o_T) = \alpha_T (j)
\end{aligned}$$

Now all you do is sum over $j \in \{1, \dots, N\}$. Thus a direct proof is sufficient; I'm not sure how you would apply induction here.