HMMs: Difference between the joint and conditional probabilities


I am having trouble interpreting the joint and conditional probabilities over the observations and states of HMMs in Appendix A of Speech and Language Processing by Jurafsky and Martin. More specifically, the forward trellis is defined as follows on page 6 of the appendix:

The forward algorithm trellis $\alpha_t(j)$ represents the probability of being in state $j$ after seeing the first $t$ observations.

Formally,

$\alpha_t(j) = P(o_1, \ldots, o_t, q_t = j \mid \lambda)$
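To make the joint reading concrete, here is a minimal sketch of the forward recursion on a toy two-state HMM. The parameter values (`pi`, `A`, `B`) and the observation sequence `obs` are made up for illustration and do not come from the book; the sketch also assumes an explicit initial state distribution `pi`.

```python
import numpy as np

# Toy HMM; every number below is an illustrative assumption.
pi = np.array([0.6, 0.4])          # pi[j]   = P(q_1 = j)
A = np.array([[0.7, 0.3],          # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[j, k] = P(o_t = k | q_t = j)
              [0.1, 0.3, 0.6]])
obs = [0, 2, 1]                    # observation symbols o_1, o_2, o_3


def forward(pi, A, B, obs):
    """Return alpha, where row t (0-based) holds the joint probability
    P(o_1, ..., o_{t+1}, q_{t+1} = j | lambda) for each state j."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        # sum over previous states, then emit the current observation
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


alpha = forward(pi, A, B, obs)
likelihood = alpha[-1].sum()               # P(O | lambda)
posterior_last = alpha[-1] / likelihood    # P(q_T = j | O, lambda)
```

In this sketch the rows of `alpha` do not sum to 1, because each entry still carries the probability of the observations seen so far. Dividing the last row by `likelihood` is what turns the joint into a conditional distribution over states.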

On the other hand, on page 13, $\xi_t(i, j)$ is defined as:

... the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$.

Also formally,

$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$
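For comparison, here is a rough sketch of how this quantity could be computed from forward and backward probabilities on a toy two-state HMM. The numbers are made up, and `beta` follows the usual backward-probability definition, $\beta_t(i) = P(o_{t+1}, \ldots, o_T \mid q_t = i, \lambda)$, which I am assuming matches the book's.

```python
import numpy as np

# Toy HMM; every number below is an illustrative assumption.
pi = np.array([0.6, 0.4])          # pi[j]   = P(q_1 = j)
A = np.array([[0.7, 0.3],          # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[j, k] = P(o_t = k | q_t = j)
              [0.1, 0.3, 0.6]])
obs = [0, 2, 1]                    # observation symbols o_1, o_2, o_3


def forward_backward(pi, A, B, obs):
    """Return (alpha, beta) as T x N arrays, with 0-based time index."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))                 # beta at the last step is 1
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta


alpha, beta = forward_backward(pi, A, B, obs)
likelihood = alpha[-1].sum()               # P(O | lambda)


def xi(t):
    """Return (joint, conditional) transition tables for step t (0-based)."""
    # joint[i, j] = P(q_t = i, q_{t+1} = j, O | lambda)
    joint = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    # dividing by P(O | lambda) gives P(q_t = i, q_{t+1} = j | O, lambda)
    return joint, joint / likelihood


joint0, xi0 = xi(0)
```

Here the unnormalized `joint0` sums to $P(O \mid \lambda)$ over all state pairs, while `xi0` sums to 1, so the two tables differ only by that normalizing constant.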

I understand why, in the case of the trellis, the observations might be treated as part of the joint probability. But why do we condition the states on the observations when computing $\xi_t(i, j)$?