I am reading a technical report on the expectation-maximization (EM) algorithm (http://melodi.ee.washington.edu/people/bilmes/mypapers/em.pdf) and I am confused about something.
For HMMs, it defines $b_j(o_t)=P(O_t=o_t\mid Q_t=j)$ ($O$ the observation, $Q$ the state). However, for the case of using Gaussian mixture models for the observation distribution, it defines $b_j(o_t)=\sum_{l=1}^M c_{jl}\,N(o_t\mid \mu_{jl},\Sigma_{jl})$, where $N$ denotes the Gaussian/normal PDF, I presume. The problem is, the latter definition of $b_j(\cdot)$ looks to me like a PDF, not a PMF, whereas the original definition implies that it is a PMF.
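To make my confusion concrete, here is how I understand the mixture version of $b_j$ would be evaluated (a sketch of my own, not from the report; the parameter values are made up, and I use scipy's multivariate normal for $N$):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up mixture parameters for one state j (M = 2 components, 2-D observations).
c = np.array([0.4, 0.6])                           # weights c_{j1}, c_{j2}, summing to 1
mu = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]  # means mu_{jl}
Sigma = [np.eye(2), 0.5 * np.eye(2)]               # covariances Sigma_{jl}

def b_j(o_t):
    """b_j(o_t) = sum_l c_{jl} * N(o_t | mu_{jl}, Sigma_{jl})."""
    return sum(w * multivariate_normal.pdf(o_t, mean=m, cov=S)
               for w, m, S in zip(c, mu, Sigma))

print(b_j(np.array([0.5, 0.5])))  # a density value, not a probability
```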
The latter definition is indeed a PDF: it is the continuous version of the observation distribution $b_j$ for state $j$. The discussion at the bottom of page 7 (which continues at the top of page 8) addresses exactly this point (note the typo "probably" there :).
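If it helps, here is a quick numerical sanity check (my own sketch, with made-up parameters, not from the report) showing that a Gaussian mixture $b_j$ integrates to 1 over the observation space, as a PDF must, while its point values are densities rather than probabilities:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Made-up 1-D, two-component mixture for some state j.
c = [0.3, 0.7]        # weights sum to 1
mu = [-1.0, 2.0]
sigma = [0.5, 1.5]

def b_j(o):
    return sum(w * norm.pdf(o, loc=m, scale=s) for w, m, s in zip(c, mu, sigma))

total, _ = quad(b_j, -np.inf, np.inf)
print(total)       # ~1.0: b_j integrates to one, as a PDF must
print(b_j(-1.0))   # ~0.26 here, though a density can exceed 1 for small sigma
```

In the forward-backward and EM recursions, these density values simply take the place of the discrete emission probabilities $b_j(o_t)$; the algebra goes through unchanged.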