How long does it take two identical hidden Markov models run on the same observations to forget their initial distributions (if ever)?


Let $H_1$ and $H_2$ be two instances of a finite Hidden Markov Model (HMM) $H$. That is, $H_1$ and $H_2$ have identical state spaces $Q$ as well as identical transition $A$ and emission probabilities $B$. However, they have different initial distributions $\pi_1$ and $\pi_2$, respectively. Let $(O_0, O_1, O_2\ldots)$ be a sequence of observations applied to both $H_1$ and $H_2$.

Has anyone considered the question of how long it takes $H_1$ and $H_2$ to forget their initial distributions? That is, how long until their belief states $b_1: Q \rightarrow [0, 1]$ and $b_2: Q \rightarrow [0, 1]$ become arbitrarily close to each other?

By "arbitrarily close" I mean: given any $\epsilon > 0$, what is $N(\epsilon) > 0$ such that $dist(b_1, b_2) < \epsilon$ for all $n > N(\epsilon)$, for some distance function $dist$?
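The question is easy to probe numerically. The sketch below (with a made-up 3-state, 2-symbol HMM; all matrices are hypothetical) runs the standard forward-filter recursion $b \mapsto \text{normalize}\big((b A) \odot B_{\cdot\,O}\big)$ from two different priors on one shared observation sequence and reports the total-variation distance between the two belief states:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state HMM: transition matrix A (rows sum to 1) and
# emission matrix B (B[j, k] = P(observe symbol k | state j)).
A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])

def filter_step(b, o):
    """One forward-filter update: predict with A, reweight by column o of B."""
    b = (b @ A) * B[:, o]
    return b / b.sum()

# Two filters with very different initial distributions.
b1 = np.array([1.0, 0.0, 0.0])
b2 = np.array([0.0, 0.0, 1.0])

# Simulate one observation sequence and feed it to both filters.
state = 0
for t in range(50):
    state = rng.choice(3, p=A[state])
    obs = rng.choice(2, p=B[state])
    b1, b2 = filter_step(b1, obs), filter_step(b2, obs)

# Total-variation distance between the two belief states after 50 steps.
print(0.5 * np.abs(b1 - b2).sum())
```

For this particular $A$ (primitive) and $B$ (all entries positive) the distance shrinks rapidly, which is the "forgetting" behavior the question asks about; the answer below looks for conditions under which this is guaranteed.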


I stumbled on two papers that might hold clues:

J. Wolfowitz "Products of Indecomposable, Aperiodic, Stochastic Matrices," AMS, 1962.

J. Hajnal, "The ergodic properties of non-homogeneous finite Markov chains," 1956.

(I don't have access to Hajnal's paper)

Wolfowitz's main theorem is:

Let $A_1,\ldots,A_k$ be square stochastic matrices of the same order such that any word in the $A$'s is SIA (i.e., stochastic, indecomposable, aperiodic). For any $\epsilon > 0$, there exists an integer $\nu(\epsilon)$ such that any word $B$ in the $A$'s of length $n \geq \nu(\epsilon)$ satisfies $\delta(B) < \epsilon$.

(Here $\delta(B) = \max_j \left(\max_i B_{ij} - \min_i B_{ij}\right)$, the largest spread within any column of $B$; $\delta(B) = 0$ exactly when all rows of $B$ coincide, i.e., when $B$ is rank one.)
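A small numerical illustration of the theorem (with two made-up $2 \times 2$ stochastic matrices, both SIA since all entries are positive): $\delta$ of a randomly chosen word in the two matrices shrinks toward zero as the word gets longer.

```python
import numpy as np

def delta(M):
    """Wolfowitz's delta: largest spread within any column of M.
    delta(M) == 0 iff all rows of M are identical (rank-one stochastic)."""
    return (M.max(axis=0) - M.min(axis=0)).max()

# Two hypothetical SIA matrices (all entries positive, rows sum to 1).
A1 = np.array([[0.6, 0.4],
               [0.3, 0.7]])
A2 = np.array([[0.5, 0.5],
               [0.2, 0.8]])

# delta of a random word of length 20 in {A1, A2}.
rng = np.random.default_rng(1)
W = np.eye(2)
for n in range(20):
    W = W @ (A1 if rng.random() < 0.5 else A2)
print(delta(W))  # tiny: both rows of W have nearly merged
```

Here $\delta(A_1) = \delta(A_2) = 0.3$ and $\delta$ is submultiplicative over products of stochastic matrices, so $\delta$ of a length-20 word is at most $0.3^{20}$, regardless of which word is chosen.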

We may apply this to HMMs as follows: Let $\mathbf{A}$ be the HMM transition matrix (assumed indecomposable and aperiodic). Let $\mathbf{A}^{(k)}$ be $\mathbf{A}$ with its columns scaled by the $k^{th}$ column of $\mathbf{B}$, the observation matrix. That is: $\mathbf{A}^{(k)}_{ij} = \mathbf{A}_{ij} \times \mathbf{B}_{jk}$

Each $\mathbf{A}^{(k)}$ is then normalized to obtain a row-stochastic matrix (what conditions must $\mathbf{B}$ satisfy so that the normalized $\mathbf{A}^{(k)}$'s are SIA?). This way, given any observation sequence $O_0, O_1, O_2, \ldots$, the final belief state is given by:

$\mathbf{b}_f = \mathbf{b}_0 \times \mathbf{A}^{(O_1)} \times \mathbf{A}^{(O_2)} \times \ldots$

which must converge to the unique row of the rank-one limiting matrix $\prod_{i} \mathbf{A}^{(O_i)}$, independently of $\mathbf{b}_0$. Note that $\mathbf{b}_0$ is the initial distribution reweighted by the emission probabilities of the initial observation $O_0$ and normalized.
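The product formula above is easy to verify numerically. Since the filter's per-step normalization is just a scalar, one can multiply the unnormalized $\mathbf{A}^{(k)}$'s together and normalize once at the end; the result matches step-by-step filtering exactly. A quick check (with made-up $\mathbf{A}$, $\mathbf{B}$, prior, and observation sequence):

```python
import numpy as np

# Hypothetical HMM: transition matrix A, emission matrix B.
A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])

# A^{(k)}_{ij} = A_{ij} * B_{jk}: transition to j, then emit symbol k.
Ak = [A * B[:, k] for k in range(B.shape[1])]  # broadcast over columns

obs = [0, 1, 1, 0, 1]            # O_0, O_1, ..., O_4 (made up)
pi = np.array([0.5, 0.3, 0.2])   # some prior

# b_0: prior reweighted by the likelihood of O_0, then normalized.
b0 = pi * B[:, obs[0]]
b0 /= b0.sum()

# Step-by-step filtering, normalizing after each observation ...
b_steps = b0.copy()
for o in obs[1:]:
    b_steps = b_steps @ Ak[o]
    b_steps /= b_steps.sum()

# ... equals b_0 times the unnormalized product, normalized once at the end.
M = np.linalg.multi_dot([Ak[o] for o in obs[1:]])
b_prod = b0 @ M
b_prod /= b_prod.sum()

print(np.allclose(b_steps, b_prod))  # True
```

If the (suitably normalized) $\mathbf{A}^{(k)}$'s are SIA, Wolfowitz's theorem then forces the rows of the growing product to merge, which is exactly the forgetting of $\mathbf{b}_0$.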

So it seems inhomogeneous Markov chains hold a clue. Perhaps someone with deeper knowledge of these can supply the missing conditions on $\mathbf{B}$, or point out flaws in this reasoning, if there are any.