Advantage of using Hidden Markov Model over Markov Chain

Many problems can be modeled using either a Markov chain or a hidden Markov model (HMM). Can anyone please explain mathematically why an HMM should be preferred over a Markov chain? Also, please mention any available references (books, publications). Thank you.

Best answer

Here is one example in which the process could be modeled using either method, but the HMM should be preferred because it explains the data better and gives better predictions.

Suppose that we have a hidden Markov process $X_1, X_2, \dots$ with each $X_{t} \in \{A, B\}$. Let $$\mathbb{P}(X_{t + 1} = A \mid X_{t} = A) = 0.9 = \mathbb{P}(X_{t + 1} = B \mid X_{t} = B).$$ So the hidden process is very "sticky" and is likely to stay in the same state with each transition.
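To make "sticky" concrete, here is a minimal Python sketch that encodes the transition probabilities above, iterates the chain towards its stationary distribution, and computes the expected number of consecutive steps spent in one state. The names `P`, `stationary`, and `expected_run_length` are illustrative and do not come from the question.

```python
# Hidden-state transition probabilities from the example above.
# State order: index 0 is A, index 1 is B.
P = [[0.9, 0.1],
     [0.1, 0.9]]

def stationary(P, iterations=1000):
    """Approximate the stationary distribution by repeatedly applying P."""
    dist = [1.0, 0.0]  # start in A; for this chain the limit does not depend on the start
    for _ in range(iterations):
        dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    return dist

def expected_run_length(stay_prob):
    """Mean of a geometric run: expected consecutive steps spent in one state."""
    return 1.0 / (1.0 - stay_prob)

pi = stationary(P)
run = expected_run_length(P[0][0])
print(pi, run)
```

By symmetry the chain spends half of its time in each state, but in runs that average about ten steps. That persistence is what allows a stretch of recent observations to carry information about the current hidden state.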

Suppose that $Y_1, Y_2, \dots$ is the observed random process, with each $Y_t \in \{0, 1\}$ and $$\mathbb{P}(Y_t = 0 \mid X_t = A) = 0.8 = \mathbb{P}(Y_t = 1 \mid X_t = B).$$ If you model $Y_1, Y_2, \dots$ as a simple Markov chain with a transition probability matrix that doesn't change over time, then, given enough data, you would estimate transition probabilities $$\mathbb{P}(Y_{t + 1} = 0 \mid Y_t = 0) \approx 0.644 \approx \mathbb{P}(Y_{t + 1} = 1 \mid Y_t = 1).$$

Suppose you have just seen the following sequence: $$001001000.$$ If you use the true distribution, the HMM, to model the data, then you can infer that the current hidden state $X_t$ is very likely to be $A$. Your estimate of $\mathbb{P}(Y_{t + 1} = 0 \mid Y_t = 0, Y_{t - 1} = 0, \dots)$ will then be close to $0.74$, because $$\begin{aligned} \mathbb{P}(Y_{t + 1} = 0 \mid X_t = A) &= \mathbb{P}(Y_{t + 1} = 0 \mid X_t = A, X_{t + 1} = A)\,\mathbb{P}(X_{t + 1} = A \mid X_t = A) \\ &\quad + \mathbb{P}(Y_{t + 1} = 0 \mid X_t = A, X_{t + 1} = B)\,\mathbb{P}(X_{t + 1} = B \mid X_t = A) \\ &= 0.8\cdot0.9 + 0.2\cdot0.1 = 0.74. \end{aligned}$$ Strictly, $0.74$ is the value you would get if $X_t = A$ were known with certainty. Exact filtering on this sequence leaves a small probability that $X_t = B$, which brings the prediction down slightly, to about $0.72$.

However, if you use the simple Markov chain model, you will not be making use of the information from the recent sequence of mostly $0$s, and you will underestimate the probability that the next observation will be $0$. You will estimate $\mathbb{P}(Y_{t + 1} = 0 \mid Y_t = 0, Y_{t - 1} = 0, \dots) = \mathbb{P}(Y_{t + 1} = 0 \mid Y_{t} = 0)$ to be $0.644$, less than the value the HMM gives. (The value $0.644$ is $0.8\cdot0.74 + 0.2\cdot0.26$: knowing only that $Y_t = 0$, the hidden state is $A$ with probability $0.8$, and by symmetry $\mathbb{P}(Y_{t + 1} = 0 \mid X_t = B) = 0.26$.)
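The comparison can be checked numerically. The following is a minimal Python sketch, not a library implementation. It computes the lag-one probability that a first-order Markov chain fitted to $Y$ would converge to, then runs the standard forward (filtering) recursion on the observed sequence. It assumes a uniform prior on the first hidden state, which is the stationary distribution of this chain. The function names `markov_chain_prediction`, `forward_filter`, and `hmm_prediction` are hypothetical and introduced only for illustration.

```python
# State order: index 0 is A, index 1 is B. Observations are 0 or 1.
TRANS = [[0.9, 0.1],   # P(X_{t+1} = A | X_t = A), P(X_{t+1} = B | X_t = A)
         [0.1, 0.9]]   # P(X_{t+1} = A | X_t = B), P(X_{t+1} = B | X_t = B)
EMIT = [[0.8, 0.2],    # P(Y_t = 0 | X_t = A), P(Y_t = 1 | X_t = A)
        [0.2, 0.8]]    # P(Y_t = 0 | X_t = B), P(Y_t = 1 | X_t = B)
PRIOR = [0.5, 0.5]     # assumption: uniform (stationary) prior on the first state

def markov_chain_prediction(y_now, y_next):
    """P(Y_{t+1} = y_next | Y_t = y_now) when the hidden chain is stationary."""
    joint = sum(PRIOR[i] * EMIT[i][y_now] * TRANS[i][j] * EMIT[j][y_next]
                for i in range(2) for j in range(2))
    marginal = sum(PRIOR[i] * EMIT[i][y_now] for i in range(2))
    return joint / marginal

def forward_filter(observations):
    """Return P(X_t | Y_1, ..., Y_t) at the last time step."""
    belief = PRIOR[:]
    for t, y in enumerate(observations):
        if t > 0:
            # Predict step: push the belief through the transition matrix.
            belief = [sum(belief[i] * TRANS[i][j] for i in range(2))
                      for j in range(2)]
        # Update step: weight by the emission likelihood and renormalize.
        weighted = [belief[i] * EMIT[i][y] for i in range(2)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    return belief

def hmm_prediction(observations, y_next):
    """P(Y_{t+1} = y_next | Y_1, ..., Y_t) under the HMM."""
    belief = forward_filter(observations)
    predicted = [sum(belief[i] * TRANS[i][j] for i in range(2))
                 for j in range(2)]
    return sum(predicted[j] * EMIT[j][y_next] for j in range(2))

seq = [int(c) for c in "001001000"]
mc = markov_chain_prediction(0, 0)
posterior_a = forward_filter(seq)[0]
hmm = hmm_prediction(seq, 0)
print(round(mc, 3), round(posterior_a, 3), round(hmm, 3))
```

Under these assumptions the Markov chain prediction comes out at $0.644$, while the filtered probability that $X_t = A$ is roughly $0.96$ and the HMM prediction is roughly $0.72$. The HMM prediction sits just below $0.74$ because the filter is not fully certain that the hidden state is $A$.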