How long does it take two identical hidden Markov models run on same observations to forget their initial distributions (if ever)?

Question

How long does it take two identical hidden Markov models run on same observations to forget their initial distributions (if ever)?

78 Views Asked by Bumbble Comm At 26 Mar 2026 - 4:31

Let $H_1$ and $H_2$ be two instances of a finite Hidden Markov Model (HMM) $H$. That is, $H_1$ and $H_2$ have identical state spaces $Q$ as well as identical transition $A$ and emission probabilities $B$. However, they have different initial distributions $\pi_1$ and $\pi_2$, respectively. Let $(O_0, O_1, O_2\ldots)$ be a sequence of observations applied to both $H_1$ and $H_2$.

Has anyone considered the question how long does it take both $H_1$ and $H_2$ to forget their initial distributions? That means their belief states $b_1: Q \rightarrow [0, 1]$ and $b_2: Q \rightarrow [0, 1]$ become arbitrarily close to each other?

By "arbitrarily close" here I mean "Given any $\epsilon > 0$, what is $N(\epsilon) > 0$ such that, for all observations $O_n, n > N(\epsilon):\ dist(b_1, b_2) < \epsilon$ for some distance function $dist$?"

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2015-02-05 00:35:37

I stumbled on two papers that might hold clues:

J. Wolfowitz "Products of Indecomposable, Aperiodic, Stochastic Matrices," AMS, 1962.

J. Hajnal, "The ergodic properties of non-homogeneous finite Markov chains," 1956.

(I don't have access to Hajnal's paper)

The main theorem is that:

Let $A_1,\ldots,A_k$ be square stochastic matrices of the same order such that any word in the $A$'s is SIA (i.e., stochastic, indecomposable, aperiodic). For any $\epsilon > 0$, there exists an integer $\nu(\epsilon)$ such that any word $B$ in the $A$'s of length $n \geq \nu(\epsilon)$ satisfies $\delta(B) < \epsilon$.

We may apply this to HMMs as follows: Let $\mathbf{A}$ be the HMM transition matrix (assumed indecomposable and aperiodic). Let $\mathbf{A}^{(k)}$ be the product of $\mathbf{A}$ with the $k^{th}$ column of $\mathbf{B}$, the observation matrix. That is: $\mathbf{A}^{(k)}_{ij} = \mathbf{A}_{ij} \times \mathbf{B}_{jk}$

Then all $\mathbf{A}^{(k)}$'s are normalized to obtain row stochastic matrices (What conditions must $\mathbf{B}$ satisfy so that the normalized $\mathbf{A}^{(k)}$'s are SIA?). This way, given any observation sequence $O_0, O_1, O_2, \ldots$, we obtain that the final belief state is given by:

$\mathbf{b}_f = \mathbf{b}_0 \times \mathbf{A}^{(O_1)} \times \mathbf{A}^{(O_2)} \times \ldots$

which must converge to the unique row of the rank-one matrix $\prod_{i} \mathbf{A}^{(O_i)}$. Note that $\mathbf{b_0}$ is the initial belief state multiplied by probabilities of the initial observation $O_0$ and normalized.

So it seems inhomogeneous Markov chains hold a clue. Maybe someone with a deeper knowledge on these can elaborate the missing conditions on $\mathbf{B}$ or point out flaws of this reasoning, if there is any.

How long does it take two identical hidden Markov models run on same observations to forget their initial distributions (if ever)?

There are 1 best solutions below

Related Questions in STOCHASTIC-PROCESSES

Related Questions in MARKOV-CHAINS

Related Questions in MARKOV-PROCESS

Related Questions in ERGODIC-THEORY

Related Questions in BAYESIAN-NETWORK

Trending Questions

Popular # Hahtags

Popular Questions