Likelihood computation for hidden Markov models


Suppose we have a $2$-state hidden Markov model (i.e. the simplest non-trivial example) and some observation data $\mathcal{O}$ generated by the algorithm for generating observations. Is it then always the case that the parameter $\theta_i$ of a given family $\Theta = \{\theta_i\}_{i \in I}$ of parameters (where $I$ is some index set) which maximizes the log-likelihood is the parameter $\theta_{\ell}$ that was used to generate $\mathcal{O}$, given that $\theta_{\ell}$ is in $\Theta$?

I.e., is it always the case that $\operatorname{argmax}_{\theta_i \in \Theta}\Big(\log P(\mathcal{O}\;|\;\theta_i)\Big) = \theta_{\ell}?$

This does not seem to always be the case. I have tried this with hmmtrain in MATLAB (which implements the Baum-Welch algorithm), setting MaxIterations = 1 and investigating the likelihood output for different initial values given to the algorithm, and found examples where the equality does not hold.

Edit: I believe one partial answer is that this depends on the observation sequence $\mathcal{O}$, among other factors. Focusing on just this aspect: we could happen to draw an unlikely observation sequence, so that $\operatorname{argmax}_{\theta_i \in \Theta}\Big(\log P(\mathcal{O}\;|\;\theta_i)\Big) = \theta_{\ell}$ need not hold. But if one takes a large enough sample, I believe one should observe that, on average, $\operatorname{argmax}_{\theta_i \in \Theta}\Big(\log P(\mathcal{O}\;|\;\theta_i)\Big) = \theta_{\ell}$.
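One way to check this numerically, without Baum-Welch at all, is to evaluate $\log P(\mathcal{O}\,|\,\theta_i)$ directly with the forward algorithm for each candidate in a small family $\Theta$ that contains $\theta_{\ell}$. Below is a minimal Python sketch (rather than MATLAB); the particular matrices and the candidate family are made up for illustration, with only the transition matrix varying across candidates:

```python
import math
import random

def hmm_loglik(obs, pi, A, B):
    """log P(obs | pi, A, B) for a discrete HMM, via the scaled forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(1, len(obs)):
        s = sum(alpha)                     # scale to avoid underflow on long sequences
        loglik += math.log(s)
        alpha = [a / s for a in alpha]
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                 for i in range(n)]
    return loglik + math.log(sum(alpha))

def sample(T, pi, A, B, rng):
    """Generate T observations from the HMM (the observation-generating algorithm)."""
    state = rng.choices(range(len(pi)), weights=pi)[0]
    obs = []
    for _ in range(T):
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return obs

rng = random.Random(0)
pi = [0.5, 0.5]
A_true = [[0.9, 0.1], [0.2, 0.8]]   # theta_ell: the generating transition matrix
B = [[0.8, 0.2], [0.3, 0.7]]        # emission probabilities, held fixed here
obs = sample(50, pi, A_true, B, rng)

# A small candidate family Theta, containing theta_ell.
candidates = {
    "theta_ell": A_true,
    "theta_1":   [[0.5, 0.5], [0.5, 0.5]],
    "theta_2":   [[0.95, 0.05], [0.05, 0.95]],
}
for name, A in candidates.items():
    print(name, round(hmm_loglik(obs, pi, A, B), 3))
```

For short sequences one can find seeds where a candidate other than theta_ell attains the highest log-likelihood, which is exactly the phenomenon asked about.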


No, it is not true.

Consider an analogous case. Suppose you have a coin, with unknown heads probability $p$. You flip the coin 10 times, and count how many times it came up heads, say $h$. You compute the maximum likelihood estimate $\hat{p}$ for the heads probability -- i.e., your best guess at $p$, given the results of those 10 coin flips. Is it always the case that $\hat{p}=p$? No, certainly not. We have $\hat{p}=h/10$. There is no guarantee that $p$ has the form $m/10$ for some integer $m$, and even if it does, randomness and variability mean that the number of heads will often be a bit larger or a bit smaller than $10p$, so $\hat{p}$ will often be a bit larger or a bit smaller than $p$.
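To make the analogy concrete, here is a minimal Python sketch (the true $p = 0.35$ is an arbitrary choice). Since $\hat{p}$ is always a multiple of $1/10$, it cannot equal $0.35$ no matter how the flips come out:

```python
import random

random.seed(1)                                     # fixed seed for reproducibility
p = 0.35                                           # true heads probability (arbitrary)
h = sum(random.random() < p for _ in range(10))    # number of heads in 10 flips
p_hat = h / 10                                     # maximum likelihood estimate
print(h, p_hat)
# p_hat is a multiple of 0.1, so it can never equal p = 0.35 exactly
```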

It is sometimes true that in the limit, as the number of observations goes to infinity, the maximum-likelihood estimate converges to the true parameter. This is not always the case, but sometimes it is. And that is the most we can hope for. Given a finite number of observations, we normally cannot expect the maximum likelihood estimate to be exactly equal to the true parameter. If you're lucky, it might be close (and hopefully, the more observations you have, the closer it will be), but you shouldn't expect it to be exactly equal.
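Continuing the coin sketch (again with an arbitrary true $p$), this limiting behaviour is easy to see: the estimate $h/n$ typically drifts toward $p$ as $n$ grows, even though it never has to hit $p$ exactly at any finite sample size:

```python
import random

random.seed(0)
p = 0.35
for n in (10, 1_000, 100_000):
    h = sum(random.random() < p for _ in range(n))
    print(n, h / n)    # the estimate h/n tends toward p as n grows
```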