Log-likelihood function

I'm not sure whether this should be asked here or on MathOverflow...

In the following paper

on page 1675, there is the following formula: $$ \log(P(X_1,\dots,X_n))= \log\left(\boldsymbol{\pi}'\cdot \prod^n_{i=1} \boldsymbol{P}\,\boldsymbol{F}_i(\theta)\cdot \mathbf{1}\right)$$

Here $\lbrace X_i\rbrace$ is a hidden Markov model, and $\lbrace S_i\rbrace$ is the underlying (hidden) Markov chain associated with it.

How do I prove it? Here is my attempt:

$$P(X_2|X_1)=E\bigl(P(X_2|X_1,S_1,S_2)\bigr)=E\bigl(E\bigl(P(X_2|X_1,S_1,S_2)\mid S_1\bigr)\bigr)$$ $$=E\bigl(F(X_2|X_1;\theta_1)\cdot p_{S_1 1}+F(X_2|X_1;\theta_2)\cdot p_{S_1 2}\bigr)$$ $$=\sum^2_{i=1}\bigl(F(X_2|X_1;\theta_1)\cdot p_{i1}+F(X_2|X_1;\theta_2)\cdot p_{i2}\bigr)\cdot P(S_1=i)=\boldsymbol{\pi}'\cdot \boldsymbol{P}\,\boldsymbol{F}_2(\theta)\cdot \mathbf{1}$$

where $\boldsymbol{\pi}'=(P(S_1=1),\,P(S_1=2))$ and $\boldsymbol{P}$ is the $2\times 2$ transition matrix as in the paper.

By analogous reasoning we get: $$P(X_n|X_{n-1},\dots,X_1)=E\bigl(E\bigl(P(X_n|X_{n-1},\dots,X_1,S_{n-1},S_n)\mid S_{n-1}\bigr)\bigr)$$ $$=(P(S_{n-1}=1),\,P(S_{n-1}=2))\cdot \boldsymbol{P}\,\boldsymbol{F}_n(\theta)\cdot \mathbf{1}=\boldsymbol{\pi}'\cdot \boldsymbol{P}^{n-1}\, \boldsymbol{F}_n(\theta)\cdot \mathbf{1}$$

Also, we have: $$P(X_2,\dots,X_n|X_1)=P(X_n|X_{n-1},\dots,X_1)\cdot P(X_{n-1}|X_{n-2},\dots,X_1)\cdot \dots \cdot P(X_2|X_1) $$

Multiplying by $P(X_1)=\boldsymbol{\pi}'\,\boldsymbol{F}_1(\theta)\,\mathbf{1}$ and taking logs: $$\log P(X_1,\dots,X_n)= \log\left( \prod^{n}_{i=1}\boldsymbol{\pi}'\cdot \boldsymbol{P}^{i-1}\, \boldsymbol{F}_i(\theta)\cdot \mathbf{1}\right)$$

This is, however, different from what is in the paper:

$$\log P(X_1,\dots,X_n)= \log\left(\boldsymbol{\pi}'\cdot \prod^n_{i=1} \boldsymbol{P}\,\boldsymbol{F}_i(\theta)\cdot \mathbf{1}\right)$$

So, where have I gone wrong?
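For concreteness, the two expressions can be compared numerically. The sketch below uses a made-up 2-state example (the transition matrix, the stationary $\boldsymbol{\pi}$, and the emission densities are illustrative assumptions, not values from the paper): it evaluates the product of scalars $\prod_i \boldsymbol{\pi}'\boldsymbol{P}^{i-1}\boldsymbol{F}_i\mathbf{1}$ and the paper's single matrix product $\boldsymbol{\pi}'\left(\prod_i \boldsymbol{P}\boldsymbol{F}_i\right)\mathbf{1}$.

```python
# Toy numerical comparison (made-up 2-state example; P, pi, and the emission
# densities are assumptions for illustration, not values from the paper).

P = [[0.9, 0.1], [0.2, 0.8]]               # transition matrix
pi = [2 / 3, 1 / 3]                        # stationary distribution (pi' P = pi')
f = [[0.5, 0.1], [0.3, 0.7], [0.4, 0.2]]   # f[i][s]: density of X_{i+1} in state s+1

def row_times_mat(v, M):
    # row vector v times 2x2 matrix M
    return [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]

def mat_times_mat(A, B):
    # 2x2 matrix product A @ B
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def F(i):
    # diagonal matrix of emission densities for observation i+1
    return [[f[i][0], 0.0], [0.0, f[i][1]]]

# Product of scalars:  prod_i  pi' P^{i-1} F_i 1
Ppow = [[1.0, 0.0], [0.0, 1.0]]            # P^0 = I
product_of_scalars = 1.0
for i in range(3):
    row = row_times_mat(row_times_mat(pi, Ppow), F(i))
    product_of_scalars *= sum(row)
    Ppow = mat_times_mat(Ppow, P)

# Single matrix product:  pi' (P F_1)(P F_2)(P F_3) 1
row = pi[:]
for i in range(3):
    row = row_times_mat(row_times_mat(row, P), F(i))
single_product = sum(row)

# With these numbers: product_of_scalars ~ 0.05296, single_product ~ 0.04504
```

With these (arbitrary) numbers the product of scalars comes out to about $0.0530$ while the single matrix product gives about $0.0450$, so the two expressions really are different.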

Best answer:

Copying my answer from MO... There is an improper marginalization in the equation that defines your overall strategy: $$P(X_2,\dots,X_n|X_1)=P(X_n|X_{n-1},\dots,X_1)\cdot P(X_{n-1}|X_{n-2},\dots,X_1)\cdot \dots \cdot P(X_2|X_1)?$$ For simplicity, take $n=3$: $$P(X_2,X_3|X_1)=P(X_3|X_2,X_1) \cdot P(X_2|X_1)?$$

Now remember that each $P$ above is really a marginal over the $S_i$: $$\sum_{S_1,S_2,S_3}P(X_2,X_3,S_1,S_2,S_3|X_1)=\sum_{S_1,S_2,S_3}P(X_3, S_1,S_2,S_3|X_2,X_1) \cdot \sum_{S_1',S_2',S_3'}P(X_2,S_1',S_2',S_3'|X_1)?$$ But that's not right. Applying the definition of $P(X_2,X_3|X_1)$ gives us only a single summation on the right side: $$\sum_{S_1,S_2,S_3}P(X_2,X_3,S_1,S_2,S_3|X_1)=\sum_{S_1,S_2,S_3}P(X_3, S_1,S_2,S_3|X_2,X_1) \cdot P(X_2,S_1,S_2,S_3|X_1).$$

More to the point, we can apply similar reasoning to $P(X_1,X_2,X_3)$, eventually getting: $$P(X_1,X_2,X_3)=\sum_{S_1,S_2,S_3}P(X_3|S_3)P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1).$$

CAVEAT: In the above equation and what follows, I assume that the $X_t$ are conditionally independent given $S_t$. This assumption makes the formulas more compact, but it isn't essential. To revoke the assumption, just replace $P(X_2|S_2)$ with $P(X_2|X_1,S_2)$ and so on.

And that's the matrix product given in the article! To build up the product manually, work from right to left (here $\boldsymbol{\iota}$ denotes the all-ones vector $\mathbf{1}$): $$P(S_1) =\left(\begin{array}{c}P(S_1=1)\\P(S_1=2)\end{array}\right) =\boldsymbol{\pi} =\boldsymbol{P}'\boldsymbol{\pi}$$ $$P(X_1|S_1)P(S_1)=\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1}P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1}P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1,S_2}P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1,S_2}P(X_3|S_3)P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol{F}_3\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$ $$\sum_{S_1,S_2,S_3}P(X_3|S_3)P(S_3|S_2)P(X_2|S_2)P(S_2|S_1)P(X_1|S_1)P(S_1)=\boldsymbol\iota'\boldsymbol{F}_3\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$$

I think it makes more sense to keep the product in that order, although you can take the transpose of the whole thing to get the article's expression. (Recall that $\boldsymbol{F}_t=\boldsymbol{F}_t'$, since each $\boldsymbol{F}_t$ is diagonal.) Such minutiae aren't really too important. The big-picture takeaway is that this matrix product is not a product of scalar likelihoods for each $t$. It is sensitive to long-term correlations between the $S_t$, which supposedly can and should be ignored for the authors' purposes.
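The right-to-left build-up can be checked numerically. This is a sketch with a made-up 2-state example (all numbers below are illustrative assumptions, and $\boldsymbol{\pi}$ is chosen to be the stationary distribution of $\boldsymbol{P}$ so that $\boldsymbol{P}'\boldsymbol{\pi}=\boldsymbol{\pi}$): it forms $\boldsymbol\iota'\boldsymbol{F}_3\boldsymbol{P}'\boldsymbol{F}_2\boldsymbol{P}'\boldsymbol{F}_1\boldsymbol{P}'\boldsymbol{\pi}$ step by step and compares it with the brute-force sum over all eight hidden-state paths.

```python
# Sanity check of the right-to-left matrix product against brute-force
# marginalization over hidden-state paths. All numbers are made up.

P = [[0.9, 0.1], [0.2, 0.8]]               # P[i][j] = P(S_{t+1}=j+1 | S_t=i+1)
pi = [2 / 3, 1 / 3]                        # stationary distribution, so P' pi = pi
f = [[0.5, 0.1], [0.3, 0.7], [0.4, 0.2]]   # f[t][s]: density of X_{t+1} in state s+1

def Pt(v):
    # multiply by P' (marginalize over the source state)
    return [sum(P[i][j] * v[i] for i in range(2)) for j in range(2)]

def F(t, v):
    # multiply by the diagonal matrix F_{t+1}
    return [f[t][s] * v[s] for s in range(2)]

# iota' F3 P' F2 P' F1 P' pi, built from right to left
v = Pt(pi)                                  # P' pi  (= pi, since pi is stationary)
v = F(0, v)                                 # F1 P' pi
v = Pt(v)                                   # P' F1 P' pi
v = F(1, v)                                 # F2 P' F1 P' pi
v = Pt(v)                                   # P' F2 P' F1 P' pi
v = F(2, v)                                 # F3 P' F2 P' F1 P' pi
likelihood = sum(v)                         # iota' (...)

# Brute force: sum over all 8 hidden-state paths (S1, S2, S3)
brute = sum(pi[s1] * f[0][s1] * P[s1][s2] * f[1][s2] * P[s2][s3] * f[2][s3]
            for s1 in range(2) for s2 in range(2) for s3 in range(2))
```

The two quantities agree to floating-point precision, which is exactly the claim: the single matrix product carries the full marginalization over paths, not a product of per-$t$ scalar likelihoods.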