I have started reading about Hidden Markov Models, and have some (more or less) minor questions about things I am not sure I understood correctly. I hope asking here is fine:
1. Assumption about the conditional independence of observations: One of the independence assumptions is that an observation is independent of previous observations. I read that mathematically, this can be formulated as:
$$ p(\textbf{O} | q_1,q_2,q_3,\ldots,q_T) = \prod_{t=1}^T p(o_t | q_t) ,$$ with $\textbf{O}$ a chain of observations consisting of the individual observations $o_t$, and $q_t$ the state at time point $t$.
My question here: I understand that if we assume conditional independence of the observations from each other, by the definition of conditional independence we arrive at:
$ p(\textbf{O} | q_1,q_2,\ldots,q_T) = \prod_{t=1}^T p(o_t | q_1,q_2,\ldots,q_T) $ - but don't we need the additional assumption that the current observation $o_i$ depends only on the state $q_i$ to arrive at $\prod_{t=1}^T p(o_t | q_t)$?
2. Homogeneity HMM: Hidden Markov Models are time homogeneous - why exactly is this assumption made?
I see that the assumptions of conditional independence are useful for solving some problems regarding HMMs quickly, but what use does the assumption of homogeneity have?
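To make the premise of this question concrete, one practical use of homogeneity is that it keeps the number of transition parameters independent of the sequence length. A minimal sketch (all numbers here are made up for illustration):

```python
import numpy as np

K, T = 4, 100  # hypothetical: 4 hidden states, sequence of length 100

# Time-homogeneous HMM: one K x K transition matrix shared by every time step.
A = np.full((K, K), 1.0 / K)
homogeneous_params = A.size            # K*K = 16, regardless of T

# Without homogeneity, each step t -> t+1 would need its own matrix,
# so the parameter count would grow linearly with the sequence length.
inhomogeneous_params = (T - 1) * K * K  # 99 * 16 = 1584
```

With homogeneity, the same transition matrix can also be estimated from transitions pooled across all time steps, which matters when each individual time step is observed only once per sequence.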
3. Again about the independence of observations: Even though an observation $o_i$ is not dependent on any previous observation, it is conditionally dependent on the state $q_i$.
In turn, because of the Markov property, $q_i$ depends only on $q_{i-1}$ - so how come $o_i$ is not dependent on $q_{i-1}$?
Thanks!
Have you looked at a hidden Markov model in graphical form? I suspect a lot of things would become clearer if you did.
For (1): Yes, you also need the assumption that each observation depends only on its own state to show that $p(\mathbf{O}|q_1, q_2,\ldots, q_T) = \prod_t p(o_t|q_t)$. Otherwise, you would only have the more general expression $\prod_t p(o_t|q_1,\ldots,q_T)$.
Let me just demonstrate with a simple example. Imagine we have only 3 states and the corresponding observations. We want to find $p(o_1, o_2, o_3 | q_1, q_2, q_3)$.
Assume we have the following undirected model (the colors and shading don't mean anything):
Based on this graph, you should be able to convince yourself of the following conditional independencies. If not, then it's a good idea to study probabilistic graphical models.
Given $q_1$, $o_1$ is independent of the rest of the graph, i.e. $q_2, o_2, q_3, o_3$. From this we can infer: $$p(o_1, o_2, o_3 | q_1, q_2, q_3) = p(o_1 | q_1, q_2, q_3) p(o_2, o_3 | q_1, q_2, q_3) = p(o_1 | q_1) p(o_2, o_3 | q_1, q_2, q_3)$$
Given $q_2$, $o_2$ is independent of the rest of the graph. From this, we can infer: $$p(o_2, o_3 | q_1, q_2, q_3) = p(o_2 | q_2) p(o_3 | q_1, q_2, q_3)$$
Given $q_3$, $o_3$ is independent of the rest of the graph. Thus, $p(o_3 | q_1, q_2, q_3) = p(o_3 | q_3)$.
As a result, we have: $$p(o_1, o_2, o_3 | q_1, q_2, q_3) = p(o_1 | q_1)p(o_2 | q_2)p(o_3 | q_3)$$
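The factorization above can be checked numerically by brute-force enumeration. The sketch below uses a hypothetical 2-state, 2-symbol HMM (all numbers are made up): it builds the joint $p(\mathbf{O}, \mathbf{Q})$ from the standard HMM parameters, obtains $p(\mathbf{O}|\mathbf{Q})$ by marginalizing and conditioning, and confirms it matches $\prod_t p(o_t|q_t)$ for every observation sequence.

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # A[i, j] = p(q_{t+1} = j | q_t = i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                 # B[i, k] = p(o_t = k | q_t = i)
              [0.3, 0.7]])
T = 3

def p_joint(obs, states):
    """p(O = obs, Q = states) under the standard HMM factorization."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, T):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

states = (1, 0, 1)
# p(Q = states), obtained by summing the joint over ALL observation sequences
p_q = sum(p_joint(obs, states) for obs in itertools.product(range(2), repeat=T))

# p(O | Q) computed from the joint equals prod_t p(o_t | q_t) in every case
for obs in itertools.product(range(2), repeat=T):
    lhs = p_joint(obs, states) / p_q
    rhs = np.prod([B[q, o] for q, o in zip(states, obs)])
    assert np.isclose(lhs, rhs)
```

Note that the emission probabilities cancel out when summing over all observation sequences, so $p(\mathbf{Q})$ reduces to $\pi_{q_1}\prod_t A_{q_{t-1} q_t}$, exactly as the chain-rule derivation above predicts.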
For (3), I think you are confusing dependence with conditional independence. Observation $o_i$ is (marginally) dependent on $o_k$ AND $q_k$ for any $k \not = i$. Nonetheless, GIVEN $q_i$, $o_i$ is independent of $o_k$ and $q_k$ for any $k \not = i$.
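This distinction can also be verified by enumeration on a length-2 chain. The sketch below (same made-up 2-state, 2-symbol parameters as assumed above) shows that $o_2$ is marginally dependent on $o_1$, yet conditionally independent of $o_1$ given $q_2$:

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Full joint p(q1, o1, q2, o2) over a length-2 chain, by enumeration.
joint = np.zeros((2, 2, 2, 2))
for q1, o1, q2, o2 in itertools.product(range(2), repeat=4):
    joint[q1, o1, q2, o2] = pi[q1] * B[q1, o1] * A[q1, q2] * B[q2, o2]

# Marginally, o2 DOES depend on o1: p(o2=0 | o1=0) != p(o2=0)
p_o1o2 = joint.sum(axis=(0, 2))                      # p(o1, o2)
p_o2_given_o1 = p_o1o2[0, 0] / p_o1o2[0].sum()
p_o2 = p_o1o2[:, 0].sum()
assert not np.isclose(p_o2_given_o1, p_o2)

# But GIVEN q2, o2 is independent of o1: p(o2 | o1, q2) = p(o2 | q2) = B[q2, o2]
p_o1q2o2 = joint.sum(axis=0)                         # p(o1, q2, o2)
for o1, q2 in itertools.product(range(2), repeat=2):
    p_cond = p_o1q2o2[o1, q2, 0] / p_o1q2o2[o1, q2].sum()
    assert np.isclose(p_cond, B[q2, 0])
```

Intuitively, $o_1$ carries information about $q_1$, which carries information about $q_2$ and hence about $o_2$; conditioning on $q_2$ blocks that path.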
I would recommend the lecture videos from Daphne Koller's Coursera course to review flow of influence in Bayesian networks and Markov networks.