I am trying to derive the Forward-Backward algorithm used in Hidden Markov Models to compute the likelihood $P(x | \theta)$ that a sample $x = (x_1, ..., x_n)$ comes from an HMM defined by a set of parameters $\theta$, which includes:
- a $k \times k$ transition matrix $A$ such that $A_{i,j}$ gives the probability of a transition from hidden state $Z = i$ to hidden state $Z = j$,
- the initial probabilities $\pi_1 = P(Z_1 = 1), ..., \pi_k = P(Z_1 = k)$,
- the parameters of the distribution of $X$ for each hidden state; e.g., if we assume Gaussian emissions, then $\theta$ includes $(\mu_1, ..., \mu_k)$ and $(\sigma_1, ..., \sigma_k)$.
I wrote down that
$$P(x|\theta) = P(x_1 | \theta) \cdot P(x_2 | x_1; \theta) \cdot ... \cdot P(x_n | x_1, ..., x_{n-1}; \theta)$$

which is also how all the video explanations and articles I have seen start.

Then I realized that when the distribution of $X$ is continuous, this doesn't make sense: for a continuous random variable, $P(X = x) = 0$ for any single value $x$. Yet all sources say at the beginning something like "we assume that the matrix $A$, the vector $\pi$, and the probabilities $P(X = x \mid Z = k)$ are known", somehow going from the parameters of a (in many cases continuous) distribution to discrete probabilities. How is this done?
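To make the setup concrete, here is a minimal sketch of the forward recursion I am trying to derive, with made-up toy numbers. In place of the discrete emission probabilities $P(X = x_t \mid Z = j)$ I have substituted the Gaussian density $\mathcal{N}(x_t; \mu_j, \sigma_j)$, which is exactly the step I am unsure is justified:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Gaussian density evaluated at x, vectorized over (mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def forward_likelihood(x, A, pi, mu, sigma):
    """Forward algorithm: alpha[t, j] accumulates the joint density of
    x_1..x_t and Z_t = j; summing the last row gives the likelihood."""
    n, k = len(x), len(pi)
    alpha = np.zeros((n, k))
    # Initialization: pi_j times the emission density of x_1 in state j.
    alpha[0] = pi * gaussian_pdf(x[0], mu, sigma)
    for t in range(1, n):
        # alpha[t-1] @ A sums over the previous state i with A_{i,j};
        # the Gaussian density stands in for P(X = x_t | Z = j).
        alpha[t] = (alpha[t - 1] @ A) * gaussian_pdf(x[t], mu, sigma)
    return alpha[-1].sum()

# Toy 2-state HMM (hypothetical numbers, just to run the recursion).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
mu = np.array([0.0, 3.0])
sigma = np.array([1.0, 1.0])
x = np.array([0.1, 0.2, 2.9, 3.1])
print(forward_likelihood(x, A, pi, mu, sigma))
```

Note that with densities substituted in, the returned value is a likelihood (a density value), not a probability, and nothing forces it to stay below 1, which only adds to my confusion about how the sources treat $P(X = x \mid Z = k)$ as a known probability.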