I am currently studying Hidden Markov Models (HMM). We denote the hidden quantities as $(X_0, \dots, X_n) \in \mathcal{X}^{n+1}$ and the observed quantities $(Y_1, \dots, Y_n) \in \mathcal{Y}^n$. The usual assumption for HMMs is that the joint density of these random variables simplifies in the following way
$$ \begin{aligned} &p(X_0 = x_0, \dots, X_n = x_n, Y_1 = y_1, \dots, Y_n = y_n) \\ &= \underbrace{p(X_0 = x_0)}_{\text{distribution of initial states}} \cdot \prod_{i=1}^n \underbrace{p(Y_i = y_i \,|\, X_i = x_i)}_{\text{emission probabilities}} \cdot \underbrace{p(X_i = x_i \,|\, X_{i-1} = x_{i-1})}_{\text{transition probabilities}} \end{aligned}. $$ This main assumption can be derived from the following assumptions.
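To make the factorization concrete, here is a small sketch that evaluates the joint density of a toy HMM exactly as written above. The parameter values (initial distribution, transition matrix, emission matrix) are purely illustrative, not taken from anything in the question.

```python
import numpy as np
from itertools import product

# A toy 2-state, 2-symbol HMM with made-up (illustrative) parameters.
pi = np.array([0.6, 0.4])            # pi[a]    = p(X_0 = a)
A = np.array([[0.7, 0.3],            # A[a, b]  = p(X_i = b | X_{i-1} = a)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],            # B[a, y]  = p(Y_i = y | X_i = a)
              [0.3, 0.7]])

def joint_prob(xs, ys):
    """p(X_0 = xs[0], ..., X_n = xs[n], Y_1 = ys[0], ..., Y_n = ys[n-1])
    computed via the HMM factorization displayed above."""
    p = pi[xs[0]]
    for i in range(1, len(xs)):
        p *= A[xs[i - 1], xs[i]] * B[xs[i], ys[i - 1]]
    return p

# Sanity check: summing the factorized joint over all hidden paths and
# all observation sequences gives 1, so it is a proper distribution.
n = 3
total = sum(joint_prob(xs, ys)
            for xs in product(range(2), repeat=n + 1)
            for ys in product(range(2), repeat=n))
print(total)  # ≈ 1.0
```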
- A1. $(X_0, \dots, X_n)$ is a Markov chain, i.e. $p(X_i = x_i\,|\, X_{i-1} = x_{i-1}, \dots, X_{0} = x_0) = p(X_i = x_i\,|\, X_{i-1} = x_{i-1})$ for all $i = 1, \dots, n$.
- A2. $p(Y_1 = y_1, \dots, Y_n = y_n \,|\, X_n = x_n, \dots, X_0 = x_0) = \prod_{i=1}^n p(Y_i = y_i \,|\, X_i = x_i)$.
In turn, assumption A2. can clearly be replaced by the following two assumptions.
- A2.1. The random variables $Y_1, \dots, Y_n$ are conditionally independent given $X_0, \dots, X_n$, i.e. $$p(Y_1 = y_1, \dots, Y_n = y_n \,|\, X_n = x_n, \dots, X_0 = x_0) = \prod_{i=1}^n p(Y_i = y_i \,|\, X_n = x_n, \dots, X_0 = x_0)$$
- A2.2. $p(Y_i = y_i \,|\, X_n = x_n, \dots, X_0 = x_0) = p(Y_i = y_i \,|\, X_i = x_i)$.
My question is now about the relation between assumption A2. and the pair A2.1./A2.2.. It is clear to me that the former is implied by the latter two, but I can't see whether the converse holds as well. More specifically:
- Generally, does A2. imply A2.1./A2.2.?
- Does A2. imply A2.1./A2.2. when A1. holds?
- (Minor) Are there specific names for assumptions A2. and A2.2.? I provided names for A1. and A2.1., but I don't recognise A2. or A2.2. as known properties.
The answer to the first question is "yes", so, a fortiori, so is the answer to the second.
Note that \begin{align} p(Y_j = y_j\,|\, &X_n = x_n, \dots, X_0 = x_0)=\\ &\sum_{\substack{y_1,\dots,y_{j-1}\\ y_{j+1},\dots,y_n}}p(Y_1 = y_1, \dots, Y_n = y_n \,|\, X_n = x_n, \dots, X_0 = x_0) \end{align} and $$ p(Y_j = y_j \,|\, X_j = x_j)=\sum_{\substack{y_1,\dots,y_{j-1}\\ y_{j+1},\dots,y_n}}\prod_{i=1}^n p(Y_i = y_i \,|\, X_i = x_i)\ , $$ since each factor with $i \neq j$ sums to $1$. Therefore, if \begin{align} p(Y_1 = y_1, \dots, Y_n = y_n \,|\, X_n = x_n, \dots, &X_0 = x_0) =\\ &\prod_{i=1}^n p(Y_i = y_i \,|\, X_i = x_i) \end{align} (i.e. A2. holds), then by summing both sides of this equation over $\ y_1,\dots,$$\,y_{j-1},y_{j+1},\dots,y_n\ $, we get $$ p(Y_j = y_j\,|\, X_n = x_n, \dots, X_0 = x_0)=p(Y_j = y_j \,|\, X_j = x_j)\ , $$ which is A2.2.. Now substituting $\ p(Y_i = y_i\,|\, X_n = x_n, \dots, X_0 = x_0)\ $ for $\ p(Y_i = y_i \,|\, X_i = x_i)\ $ in the product on the right side of A2. gives \begin{align} p(Y_1 = y_1, \dots, Y_n = y_n \,|\, &X_n = x_n, \dots, X_0 = x_0) =\\ &\prod_{i=1}^n p(Y_i = y_i\,|\, X_n = x_n, \dots, X_0 = x_0)\ , \end{align} which is A2.1..
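The marginalization argument above can also be checked numerically. The sketch below fixes an arbitrary hidden path, takes a conditional joint of the product form assumed in A2. (with an illustrative emission matrix), and verifies that summing out all observations except $y_j$ recovers the single-step emission probability (A2.2.), from which the factorization into conditional marginals (A2.1.) follows.

```python
import numpy as np
from itertools import product

# Illustrative emission matrix: B[a, y] = p(Y_i = y | X_i = a).
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])

n = 3
xs = (0, 1, 1)  # an arbitrary fixed hidden path x_1, ..., x_n

# A2.: the conditional joint of Y_1, ..., Y_n is a product of emissions.
def cond_joint(ys):
    return np.prod([B[x, y] for x, y in zip(xs, ys)])

# Marginalize out every y_i except y_j, as in the derivation above.
def marginal(j, yj):
    return sum(cond_joint(ys)
               for ys in product(range(2), repeat=n) if ys[j] == yj)

# A2.2.: the conditional marginal collapses to the one-step emission ...
for j in range(n):
    for yj in range(2):
        assert np.isclose(marginal(j, yj), B[xs[j], yj])

# ... and hence A2.1.: the conditional joint equals the product of the
# conditional marginals.
for ys in product(range(2), repeat=n):
    assert np.isclose(cond_joint(ys),
                      np.prod([marginal(j, ys[j]) for j in range(n)]))

print("A2. implies A2.1. and A2.2. on this example")
```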
In early papers on Hidden Markov Models, their output $\ Y\ $ was referred to as "a probabilistic function" of the underlying hidden Markov chain $\ X\ $, and I believe it's the property A2. that constitutes the definition of what it means to be "a probabilistic function". Apart from that possibility (of which I'm not entirely certain), I know of no other names for either A2. or A2.2..