I'm running into a practical problem, and figured the best way to solve it would be to ask around, as I feel I've made a conceptual mistake somewhere that eludes me.
My apologies in advance: to make things perfectly clear to the reader I'll include as much detail as possible, which might make for too dry (and maybe too basic) a read.
Let $\vec{X}_1, \vec{X}_2 \in \mathbb{R}^D$ be two consecutive samples of a discrete hidden Markov model.
The HMM is defined through a Markov chain $(H_t)_{t \in \{1,2\}}$ with transition matrix $T \in \mathbb{R}^{K \times K}$ and stationary distribution $\vec{\pi} \in \mathbb{R}^K$. The emission of the HMM depends only on the current hidden state, $\vec{X}_t = f(H_t)$; for my purpose it is multivariate normal: $\vec{X}_t \mid H_t = i \sim \mathcal{N}(\vec{\mu}_i, \Sigma_i)$.
(For simplicity, I assume here that the Markov chain is irreducible and aperiodic, so that $\vec{\pi}$ exists and is unique.)
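To make the setting concrete, here is a minimal sketch of a sampler for such pairs. All parameter values ($T$, $\vec{\mu}_i$, $\Sigma_i$, $K = 2$, $D = 2$) are hypothetical toy choices, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy parameters (K = 2 states, D = 2 dimensions), for illustration only.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])                        # transition matrix
mus = np.array([[0.0, 0.0],
                [3.0, -1.0]])                      # row i = mu_i
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])    # Sigma_i

# Stationary distribution: left eigenvector of T for eigenvalue 1, normalised.
w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

def sample_pairs(N):
    """Draw N independent pairs (X1, X2): H1 ~ pi, H2 | H1 ~ T, Gaussian emissions."""
    h1 = rng.choice(len(pi), size=N, p=pi)
    # Inverse-CDF sampling of H2 given H1, vectorised over the N pairs.
    h2 = (rng.random(N)[:, None] > np.cumsum(T[h1], axis=1)).sum(axis=1)
    # Gaussian emissions via per-state Cholesky factors.
    Ls = np.linalg.cholesky(Sigmas)
    X1 = mus[h1] + np.einsum('nij,nj->ni', Ls[h1], rng.standard_normal((N, 2)))
    X2 = mus[h2] + np.einsum('nij,nj->ni', Ls[h2], rng.standard_normal((N, 2)))
    return X1, X2
```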
Constructing $N$ independent pairs $(\vec{X}_1^{(i)}, \vec{X}_2^{(i)})$, $i \in \{1, \dots, N\}$, I'm interested in the computation of $E[\vec{X}_1 \vec{X}_2^T]$, and in the estimator $\hat{S} = \frac{1}{N} \sum_{i=1}^N \vec{X}_1^{(i)} \bigl(\vec{X}_2^{(i)}\bigr)^T$.
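The estimator itself is a one-liner once the pairs are stacked as $N \times D$ arrays; a sketch (the function name `outer_mean` is my own):

```python
import numpy as np

def outer_mean(X1, X2):
    """hat_S = (1/N) sum_i X1_i X2_i^T, vectorised over the N pairs."""
    return np.einsum('nd,ne->de', X1, X2) / len(X1)
```

The `einsum` contraction over the sample index `n` is equivalent to averaging the $N$ outer products explicitly, without materialising the $N \times D \times D$ intermediate.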
The property I would like to prove (or disprove) is the consistency of the above estimator, i.e. convergence in probability: $\hat{S} \xrightarrow{P} E[\vec{X}_1 \vec{X}_2^T]$.
I find the following theoretical value:
\begin{align*} E[\vec{X}_1 \vec{X}_2^T] &= \iint X_1 X_2^T \, \mathbb{P}(X_1, X_2) \, dX_1 \, dX_2 \\ &= \iint \sum_{h_1, h_2} X_1 X_2^T \, \mathbb{P}(X_1, X_2, h_1, h_2) \, dX_1 \, dX_2 && \text{(marginalisation over hidden states)} \\ &= \iint \sum_{h_1, h_2} X_1 X_2^T \, \mathbb{P}(X_1, X_2 \mid h_1, h_2) \, \mathbb{P}(h_1, h_2) \, dX_1 \, dX_2 && \text{(chain rule)} \\ &= \iint \sum_{h_1, h_2} X_1 X_2^T \, \mathbb{P}(X_1 \mid h_1, h_2, X_2) \, \mathbb{P}(X_2 \mid h_1, h_2) \, \mathbb{P}(h_1, h_2) \, dX_1 \, dX_2 && \text{(chain rule again)} \\ &= \sum_{h_1, h_2} \iint X_1 X_2^T \, \mathbb{P}(X_1 \mid h_1) \, \mathbb{P}(X_2 \mid h_2) \, \mathbb{P}(h_1, h_2) \, dX_1 \, dX_2 && \text{(emissions depend only on the current hidden state)} \\ &= \sum_{h_1, h_2} \left( \int X_1 \, \mathbb{P}(X_1 \mid h_1) \, dX_1 \right) \left( \int X_2 \, \mathbb{P}(X_2 \mid h_2) \, dX_2 \right)^{\!T} \mathbb{P}(h_1, h_2) && \text{(Fubini)} \\ &= \sum_{h_1, h_2} E[X_1 \mid h_1] \, E[X_2 \mid h_2]^T \, \mathbb{P}(h_1, h_2) && \text{(definition of conditional expectation)} \\ &= \sum_{h_1, h_2} \mu_{h_1} \mu_{h_2}^T \, \mathbb{P}(h_1, h_2) && \text{(Gaussian emission means)} \\ &= \sum_{h_1, h_2} \mu_{h_1} \mu_{h_2}^T \, \mathbb{P}(h_2 \mid h_1) \, \mathbb{P}(h_1) && \text{(chain rule)} \\ &= \sum_{h_1, h_2} \mu_{h_1} \mu_{h_2}^T \, T_{h_1, h_2} \, \pi_{h_1} && \text{(definition of the HMM quantities)} \end{align*}
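The last line can be computed in closed form: stacking the means into a $K \times D$ matrix $M$ whose $i$-th row is $\vec{\mu}_i^T$, the double sum is exactly $M^T \operatorname{diag}(\vec{\pi}) \, T \, M$. A minimal sketch (function name `expected_outer` is my own):

```python
import numpy as np

def expected_outer(mus, T, pi):
    """E[X1 X2^T] = sum_{h1,h2} pi_{h1} T_{h1,h2} mu_{h1} mu_{h2}^T
                  = M^T diag(pi) T M, with M the K x D matrix whose rows are mu_i."""
    return mus.T @ np.diag(pi) @ T @ mus
```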
I've run numerical experiments to assess convergence. In particular, I quantify the difference between the ground-truth value and the estimator (both of which are matrices, owing to the outer product $X_1 X_2^T \in \mathbb{R}^{D \times D}$) through the Frobenius norm $\|E[X_1 X_2^T] - \hat{S}\|_F$, and observe a nice decrease of the error as $N$ grows.
This decrease seems to be of the form $f(N) = \exp(-I N)$ for some rate $I > 0$; in other words, it does appear to follow a large deviation principle. To confirm this observation, I fit this functional form to the measured error. While these results are encouraging, I'd be much happier with a definitive theoretical argument towards consistency (or the lack thereof!).
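For reference, here is a self-contained sketch of the experiment (same hypothetical toy parameters as above). One caveat worth keeping in mind when fitting: for an i.i.d. average the *typical* Frobenius error scales like $N^{-1/2}$ by the CLT, while large-deviation theory controls the *probability* of a large error, which decays exponentially in $N$; the two are easy to conflate on a single error curve.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical toy model as above (K = 2, D = 2).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
mus = np.array([[0.0, 0.0],
                [3.0, -1.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])
pi = np.array([0.75, 0.25])                      # stationary: pi @ T == pi

S_true = mus.T @ np.diag(pi) @ T @ mus           # closed-form E[X1 X2^T]
Ls = np.linalg.cholesky(Sigmas)

def frob_error(N):
    """Frobenius-norm error of hat_S on N freshly sampled pairs."""
    h1 = rng.choice(2, size=N, p=pi)
    h2 = (rng.random(N)[:, None] > np.cumsum(T[h1], axis=1)).sum(axis=1)
    X1 = mus[h1] + np.einsum('nij,nj->ni', Ls[h1], rng.standard_normal((N, 2)))
    X2 = mus[h2] + np.einsum('nij,nj->ni', Ls[h2], rng.standard_normal((N, 2)))
    S_hat = np.einsum('nd,ne->de', X1, X2) / N
    return np.linalg.norm(S_hat - S_true)

for N in (100, 1_000, 10_000, 100_000):
    print(N, frob_error(N))
```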
Thus, to my question: in this particular setting (multivariate Gaussian emissions), how can I prove or disprove the consistency of the above estimator?
Thank you for your patience!