I'm not sure if this could be asked here, or in math overflow...
In the following paper
- Cho, Jin Seo, and Halbert White. "Testing for regime switching." Econometrica 75.6 (2007): 1671-1720. doi: 10.1111/j.1468-0262.2007.00809.x
on page 1675, there is the following formula: $$ \log(P(X_1,\dots,X_n))= \log\left(\boldsymbol{\pi}'\cdot \prod^n_{i=1} P F_i(\theta)\cdot \mathbf{1}\right)$$
The $\lbrace X_i\rbrace$ is a hidden Markov chain, where the $\lbrace S_i\rbrace$ is the Markov chain associated with it.
How do I prove it?
$$\displaystyle P(X_2|X_1)=E(P(X_2|X_1,S_1,S_2))=E(E(P(X_2|X_1,S_1,S_2)|S_1))$$ $$=E(F(X_2|X_1,S_1,\theta_1)\cdot p_{i1}+F(X_2|X_1,S_1,\theta_2)\cdot p_{i2})$$ $$\displaystyle =\sum^2_{i=1}\left(F(X_2|X_1,S_1,\theta_1)\cdot p_{i1}+F(X_2|X_1,S_1,\theta_2)\cdot p_{i2} \right)\cdot P(S_1=i)=\boldsymbol{\pi}'\cdot \boldsymbol{PF}_2(\theta)\cdot \mathbf{1}$$
where $(P(S_1=1)P(S_1=2))= \boldsymbol{\pi}'$, $\boldsymbol{P}$ is the 2x2 transition matrix as in the paper. \
Using a analogous reasoning we get: $$P(X_n|X_{n-1},\dots,X_1)=E(E(P(X_n|X_{n-1},\dots,X_1,S_{n-1},S_n)|S_{n-1}))$$ $$=(P(S_{n-1}=1)P(S_{n-1}=2))'\cdot \boldsymbol{PF}_n(\theta)\cdot \mathbf{1}=\boldsymbol{\pi}'\cdot \boldsymbol{P}^{n-1} \boldsymbol{F}_n(\theta)\cdot \mathbf{1}$$
Also, we have: $$P(X_2,\dots,X_n|X_1)=P(X_n|X_{n-1},\dots,X_1)\cdot P(X_{n-1}|X_{n-2},\dots,X_1)\cdot \dots \cdot P(X_2|X_1) $$
$$\displaystyle Log(P(X_2,\dots,X_n|X_1)= Log( \prod^{n}_{i=1}\boldsymbol{\pi}'\cdot \boldsymbol{P}^{i-1} \boldsymbol{F}_i(\theta)\cdot \mathbf{1})$$
Which is,however, different from what is on the paper:
$$\displaystyle Log(P(X_2,\dots,X_n|X_1)= Log(\boldsymbol{\pi}'\cdot \prod^n_{i=1} \boldsymbol{P F}_i\cdot \mathbf{1})$$
So, where have I gone wrong?
Copying my answer from MO... There is an improper marginalization in the equation that defines your overall strategy: $$P(X_2,\dots,X_n|X_1)=P(X_n|X_{n-1},\dots,X_1)\cdot P(X_{n-1}|X_{n-2},\dots,X_1)\cdot \dots \cdot P(X_2|X_1)?$$ For simplicity, take $n=3$: $$P(X_2,X_3|X_1)=P(X_3|X_2,X_1) \cdot P(X_2|X_1)?$$ Now remember that each $P$ above is really a marginal over the $S_i$: Now remember that each $P$ above is really a marginal over the $S_i$: $$\sum_{S_1,S_2,S_3}P(X_2,X_3,S_1,S_2,S_3|X_1)=\sum_{S_1,S_2,S_3}P(X_3, S_1,S_2,S_3|X_2,X_1) \cdot \sum_{S_1',S_2',S_3'}P(X_2,S_1',S_2',S_3'|X_1)?$$ But that's not right. Applying the definition of $P(X_2,X_3|X_1)$ gives us only a single summation on the right side: $$\sum_{S_1,S_2,S_3}P(X_2,X_3,S_1,S_2,S_3|X_1)=\sum_{S_1,S_2,S_3}P(X_3, S_1,S_2,S_3|X_2,X_1) \cdot P(X_2,S_1,S_2,S_3|X_1).$$ More to the point, we can apply similar reasoning to $P(X_1,X_2,X_3)$, eventually getting: $$P(X_1,X_2,X_3)=\sum_{S_1,S_2,S_3}P(X_3|S_3)P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1).$$
CAVEAT: In the above equation and what follows, I assume that the $X_t$ are conditionally independent given $S_t$. This assumption makes the formulas more compact, but it isn't essential. To revoke the assumption, just replace $P(X_2|S_2)$ with $P(X_2|X_1,S_2)$ and so on.
And that's the matrix product given in the article! To build up the product manually, work from right to left: $$P(S_1) =\left(\begin{array}{c}P(S_1=1)\\P(S_1=2)\end{array}\right) =\boldsymbol{\pi} =\boldsymbol{P}'\boldsymbol{\pi}$$ $$P(X_1|S_1)P(S_1)=\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1}P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1}P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1,S_2}P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1,S_2}P(X_3|S_3)P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{F}_3\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1,S_2,S_3}P(X_3|S_3)P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol\iota'\boldsymbol{F}_3\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ I think it makes more sense to keep the product in that order, although you can take the transpose of the whole thing to get the article's expression. (Recall that $\boldsymbol{F}_t=\boldsymbol{F}_t'$.) Such minutiae aren't really too important. The big-picture takeaway is that this matrix product is not a product of scalar likelihoods for each $t$. It is sensitive to long-term correlations between the $S_t$, which supposedly can and should be ignored for the authors' purposes.