Does anyone know how the joint distribution of the HMM in page 281 of these notes is derived?
Specifically the formula is: $$p(\textbf{y},\textbf{x})=\prod_{t=1}^{T}p(y_t|y_{t-1})p(x_t|y_t)$$
Where $t$ indexes the time step, $\textbf{x}$ is the observation sequence, and $\textbf{y}$ is the state sequence.
I know that: $$f_{X,Y}(x,y)=f_{Y|X}(y|x)f_{X}(x)$$
So I'm not sure why both $p$'s are conditional in the HMM formula. Can someone explain this or link me to an explanation?
With the understanding that $p(y_1\mid y_0):=p(y_1)$, what they appear to be saying is:
$\begin{align}p(\mathbf y,\mathbf x) ~=~& p(\mathbf y)~p(\mathbf x\mid \mathbf y) \\[1ex] ~=~& p(y_1,..,y_T)~p(\mathbf x\mid \mathbf y) \\[1ex] ~=~& \big(~p(y_1)~p(y_2\mid y_1)\cdots p(y_T\mid y_{T-1})~\big)~ p(\mathbf x\mid \mathbf y) \tag 1 \\[1ex] ~=~& \Big(\prod_{t=1}^T p(y_t\mid y_{t-1})\Big)~ p(x_1,..,x_T\mid \mathbf y) \\[1ex] ~=~& \Big(\prod_{t=1}^T p(y_t\mid y_{t-1})\Big)~ \big(p(x_1\mid \mathbf y)\cdots p(x_T\mid \mathbf y)\big) \tag 2 \\[1ex] ~=~& \Big(\prod_{t=1}^T p(y_t\mid y_{t-1})\Big)~ \big(p(x_1\mid y_1)\cdots p(x_T\mid y_T)\big) \tag 3 \\[1ex] ~=~& \Big(\prod_{t=1}^T p(y_t\mid y_{t-1})\Big)~\Big(\prod_{t=1}^T p(x_t\mid y_t) \Big) \\[1ex] ~=~& \Big(\prod_{t=1}^T p(y_t\mid y_{t-1})~p(x_t\mid y_t) \Big) \end{align}$
$(1)$ is true only when $p(y_t\mid y_{t-1},..,y_1)=p(y_t\mid y_{t-1})$, i.e. each member of $\mathbf y$ depends only on the value of the immediately preceding member rather than on the entire preceding history (the Markov assumption).
$(2)$ is true only when the members of $\mathbf x$ are conditionally independent given $\mathbf y$.
$(3)$ is true only when $p(x_t\mid y_1, .. , y_T) = p(x_t\mid y_t)$, that is, each member of $\mathbf x$ depends only on the corresponding member of $\mathbf y$ rather than on the entire state sequence. (This is what justifies step $(2)$ as well.)
$\Box$
These do appear to match the two independence assumptions made on page 281, in the paragraph immediately preceding equation (210).
$\blacksquare$
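As a numerical sanity check, the factorization can be verified on a tiny discrete HMM: if the formula $\prod_t p(y_t\mid y_{t-1})\,p(x_t\mid y_t)$ really is the joint distribution, it must sum to $1$ over all state/observation sequence pairs. A minimal sketch (the transition matrix `A`, emission matrix `B`, and initial distribution `pi` below are made-up illustrative values, not from the notes):

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all values are illustrative.
pi = np.array([0.6, 0.4])            # p(y_1)
A = np.array([[0.7, 0.3],            # A[i, j] = p(y_t = j | y_{t-1} = i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],            # B[i, k] = p(x_t = k | y_t = i)
              [0.5, 0.5]])

def joint(y, x):
    """p(y, x) = prod_t p(y_t | y_{t-1}) p(x_t | y_t), with p(y_1 | y_0) := p(y_1)."""
    p = pi[y[0]] * B[y[0], x[0]]
    for t in range(1, len(y)):
        p *= A[y[t-1], y[t]] * B[y[t], x[t]]
    return p

# Summing over every possible (y, x) pair of length T must give 1.
T = 3
total = sum(joint(y, x)
            for y in itertools.product([0, 1], repeat=T)
            for x in itertools.product([0, 1], repeat=T))
print(round(total, 10))  # → 1.0
```

The check works because, marginalizing $\mathbf x$ first, $\sum_{\mathbf x}\prod_t p(x_t\mid y_t)=1$ for every fixed $\mathbf y$, leaving $\sum_{\mathbf y} p(\mathbf y)=1$, exactly mirroring steps $(2)$ and $(1)$ of the derivation above.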