Consider an irreducible but possibly periodic Markov chain on a finite state space with transition matrix $P$. We know there exists a unique stationary distribution $\pi$. If the Markov chain were aperiodic, we would have $P^n_{ij} \to \pi(j)$ as $n \to \infty$. This fails if the chain is periodic, but we still have convergence of the Cesàro averages: $$\frac{1}{n}\sum_{k=1}^n P^k_{ij} \to \pi(j) \text{ as } n \to \infty.$$ Can anyone point me to a reference that states this fact? Every reference I've seen either considers convergence only for aperiodic chains, or "fixes" periodicity by passing to a lazy version of the chain. Alternatively, is there a simple way to derive this result from the result for aperiodic chains?
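For concreteness, both behaviors are easy to check numerically. Here is a quick sketch with numpy (the deterministic 3-cycle is just my own toy example): $P^n$ cycles forever, while the Cesàro averages converge to the uniform stationary distribution.

```python
import numpy as np

# Deterministic 3-cycle: irreducible with period 3,
# stationary distribution pi = (1/3, 1/3, 1/3).
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])

n = 3000
avg = np.zeros_like(P)
Pk = np.eye(3)
for _ in range(n):
    Pk = Pk @ P        # Pk = P^k
    avg += Pk
avg /= n               # (1/n) * sum_{k=1}^n P^k

print(avg)                               # every entry is 1/3
print(np.linalg.matrix_power(P, n),      # P^n itself keeps cycling
      np.linalg.matrix_power(P, n + 1))  # through permutation matrices
```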
Reference request: convergence for periodic Markov chains
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 answers below.
The result follows immediately from applying the elementary renewal theorem to delayed renewal processes.
Here's a more elementary algebraic proof, using telescoping.
(problem 16, page 468 of Grinstead and Snell's free book https://math.dartmouth.edu/~prob/prob/prob.pdf )
For a row-stochastic $m \times m$ matrix $P$ with stationary distribution $\mathbf \pi$, we have
$\mathbf \pi^T P = \mathbf \pi^T$
and $ P\mathbf 1 = \mathbf 1$.
Define $W:= \mathbf 1 \mathbf \pi^T$; note $\text{trace}\big(W\big) = \mathbf \pi^T \mathbf 1 = 1$.
Consider the following telescoping identity:
$\Big(I+P+P^2+\cdots+ P^{n-1}\Big)\Big(I-P+W\Big) = I -P^n +nW$
thus
$\frac{1}{n}\Big(I+P+P^2+\cdots+ P^{n-1}\Big)$
$= \frac{1}{n}\big(I -P^n +nW\big)\Big(I-P+W\Big)^{-1} $
$= \frac{1}{n}\Big\{\Big(I-P+W\Big)^{-1}\Big\} -\frac{1}{n}\Big\{P^n\Big(I-P+W\Big)^{-1}\Big\} +\frac{1}{n}\Big\{nW\Big(I-P+W\Big)^{-1}\Big\} $
$= \frac{1}{n}\Big(I-P+W\Big)^{-1} -\frac{1}{n}P^n\Big(I-P+W\Big)^{-1} +W$
Now pass to the limit:
$\lim_{n\to\infty}\Big\{ \frac{1}{n}\Big(I-P+W\Big)^{-1} -\frac{1}{n}P^n\Big(I-P+W\Big)^{-1} +W\Big\}$
$= \Big\{\lim_{n\to\infty}\frac{1}{n}\Big(I-P+W\Big)^{-1}\Big\} -\Big\{\lim_{n\to\infty} \frac{1}{n}P^n\Big(I-P+W\Big)^{-1}\Big\} +W$
$=0+0+W$
so
$W=\lim_{n\to\infty} \frac{1}{n}\Big(I+P+P^2+\cdots+ P^{n-1}\Big)$
That's the argument in its entirety. I've left three book-keeping details for the end.
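As a sanity check (not part of the proof), the telescoping identity and the resulting limit are easy to verify numerically; a sketch with numpy, on a 2-state periodic chain of my own choosing:

```python
import numpy as np

# Check (I + P + ... + P^{n-1})(I - P + W) = I - P^n + n W
# on a 2-state chain with period 2, where W = 1 pi^T.
P = np.array([[0., 1.],
              [1., 0.]])
pi = np.array([0.5, 0.5])          # stationary distribution
W = np.outer(np.ones(2), pi)       # W = 1 pi^T

n = 25
S = sum(np.linalg.matrix_power(P, k) for k in range(n))  # I + P + ... + P^{n-1}
lhs = S @ (np.eye(2) - P + W)
rhs = np.eye(2) - np.linalg.matrix_power(P, n) + n * W
print(np.allclose(lhs, rhs))       # True
print(S / n)                       # already close to W
```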
re: the third term simplification $W\Big(I-P+W\Big)^{-1}=W$
Suppose $\Big(I-P+W\Big)^{-1}$ exists (invertibility is the third detail below), and consider the product
$W\Big(I-P+W\Big) = W-WP +W^2 = W-W + W = W$
now multiply both sides on the right by $\Big(I-P+W\Big)^{-1}$
re: the second limit
observe that
$\Big\Vert \frac{1}{n}P^n\Big(I-P+W\Big)^{-1} - \mathbf 0\Big\Vert_F$
$ = \frac{1}{n}\Big\Vert P^n\Big(I-P+W\Big)^{-1}\Big\Vert_F$
$ \leq \frac{1}{n}\Big\Vert P^n\Big\Vert_F\cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ \leq \frac{1}{n} \mathbf 1^T P^n \mathbf 1 \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ = \frac{1}{n} \mathbf 1^T \mathbf 1 \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ = \frac{m}{n} \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F$
$ \lt \epsilon $
for large enough $n$.
(The second-to-last inequality holds because $\Vert A \Vert_F \le \sum_{i,j} \vert a_{ij}\vert$, i.e. the entrywise $2$-norm is dominated by the entrywise $1$-norm, and the entries of $P^n$ are nonnegative.)
re: the invertibility of $\Big(I-P+W\Big)$
we prove $\det\Big(I-P+W\Big)=\prod_{j=2}^m (1-\lambda_j)$, where $\lambda_1 = 1, \lambda_2,\dots,\lambda_m$ are the eigenvalues of $P$; since $\lambda_1=1$ is a simple eigenvalue (see below), no factor vanishes and hence the matrix is invertible.
the nicest proof involves (partial) symmetrization:
using Perron–Frobenius theory, we know that $\lambda_1 =1 $ is a simple eigenvalue since $P$ is irreducible.
$\mathbf v_1 := \mathbf \pi^\frac{1}{2}\cdot \frac{1}{\big \Vert \mathbf \pi^\frac{1}{2}\big \Vert_2}$
(where the square root is interpreted component-wise; in fact $\big \Vert \mathbf \pi^\frac{1}{2}\big \Vert_2 = 1$ since the entries of $\mathbf \pi$ sum to $1$)
and let $D:=\text{diag}\big(\mathbf v_1\big)$ be the corresponding diagonal matrix.
Consider the similar matrix
$D\Big(I-P+W\Big)D^{-1} = I- (DPD^{-1}) +DWD^{-1} = I - B + \mathbf v_1\mathbf v_1^T$
$B$ has $\mathbf v_1$ as both a left and a right eigenvector for the eigenvalue $1$ (check!).
Working over $\mathbb C$ and applying Schur Triangularization to $B$:
$V := \bigg[\begin{array}{c|c|c|c}\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{m}\end{array}\bigg]$ (unitary, with first column $\mathbf v_1$)
$B = VRV^{-1} = VRV^{*} =V\begin{bmatrix} 1 & \mathbf x_{m-1}^*\\ \mathbf 0 & \mathbf R_{m-1} \end{bmatrix}V^* =V\begin{bmatrix} 1 & \mathbf 0^T\\ \mathbf 0 & \mathbf R_{m-1} \end{bmatrix}V^*$
note $\mathbf x_{m-1} = \mathbf 0$: since $\mathbf v_1$ is real, $\mathbf v_1^* = \mathbf v_1^T$ is a left eigenvector, so $ \mathbf v_1^* =\mathbf v_1^* B = 1\cdot \mathbf v_1^* + \sum_{j\ge 2} \bar x_j\cdot \mathbf v_j^*$
and the columns of $\mathbf V$ (i.e. rows of $\mathbf V^*$) are linearly independent, so every $x_j =0$
By simplicity of the Perron root, $\mathbf R_{m-1}$ does not have $1$ as an eigenvalue, so
$I -B + \mathbf v_1 \mathbf v_1^T = V\big(I-R + \mathbf e_1\mathbf e_1^T\big)V^{*} =V\begin{bmatrix} 1 & \mathbf 0^T \\ \mathbf 0 & I_{m-1} -\mathbf R_{m-1} \end{bmatrix}V^*$
hence the determinant is $1\cdot \prod_{j=2}^m (1-\lambda_j) \neq 0$ (similar matrices have the same determinant).
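The determinant identity can also be sanity-checked numerically; here is a sketch with numpy on a random strictly positive stochastic matrix (my own example, not taken from the proof):

```python
import numpy as np

# Numerical sketch of: det(I - P + W) = prod_{j>=2} (1 - lambda_j),
# where lambda_1 = 1 is the Perron eigenvalue and W = 1 pi^T.
rng = np.random.default_rng(0)
m = 4
P = rng.random((m, m))
P /= P.sum(axis=1, keepdims=True)            # make rows sum to 1

# stationary distribution = left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()
W = np.outer(np.ones(m), pi)                 # W = 1 pi^T

lam = np.linalg.eigvals(P)
rest = np.delete(lam, np.argmin(np.abs(lam - 1)))  # lambda_2, ..., lambda_m
det_lhs = np.linalg.det(np.eye(m) - P + W)
det_rhs = np.real(np.prod(1 - rest))
print(det_lhs, det_rhs)                      # these agree
```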
I'd like to present a different elementary approach. I'm omitting some details.
From irreducibility, for any pair of states $i,i'$, there exists $n_{i,i'}$ such that $p^{n_{i,i'}}_{i,i'}>0$. Therefore the probability of visiting $i'$ by time $n_{i,i'}$ or earlier, starting from $i$, is at least $p^{n_{i,i'}}_{i,i'}$. Let $\bar p = \min p^{n_{i,i'}}_{i,i'}>0$, and let $\bar n= \max n_{i,i'}<\infty$.
That's the key to everything.
Thus, regardless of the current state and entire past, the probability that the process will visit $i'$ at least once in the next $\bar n$ steps is at least $\bar p$. In particular, the probability that state $i'$ was not visited by time $L{\bar n}$ is bounded above by $(1-{\bar p})^L$. Let $\tau_{i'}$ be the first time the chain hits state $i'$. Then we have shown $P_i(\tau_{i'}>L \bar n) \le (1-{\bar p})^L \to 0$. In particular, $\tau_{i'}$ has finite expectation under $P_i$. This holds for all $i,i'$.
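On a finite state space these quantities can be computed exactly, which makes the geometric tail easy to illustrate; a sketch with numpy, on a small period-2 chain of my own (states $0,1,2$, target $i'=2$): deleting the target's row and column from $P$ gives a substochastic $Q$ with $P_i(\tau_{i'}>t) = (Q^t\mathbf 1)_i$ and $E_i[\tau_{i'}] = ((I-Q)^{-1}\mathbf 1)_i$.

```python
import numpy as np

# Period-2 chain on states {0, 1, 2}; target state i' = 2.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
Q = P[:2, :2]                        # drop the target's row and column

# P_i(tau_{i'} > t) is the i-th entry of Q^t 1: geometric decay.
tail = lambda t: np.linalg.matrix_power(Q, t) @ np.ones(2)
print(tail(5), tail(10), tail(20))

# E_i[tau_{i'}] solves (I - Q) x = 1: finite expected hitting times.
Etau = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(Etau)
```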
Write $S_n (i,j)= \sum_{k=0}^n p^k_{ij}$.
Then
\begin{align*} S_n(i,j)&= E_i\Big[\sum_{k=0}^n {\bf 1}_{X_k}(j)\Big] \\& \le E_i\Big[ \sum_{k=0}^n {\bf 1}_{X_k} (j), \tau_{i'}\le n\Big] + (n+1) P_i(\tau_{i'}>n)\\ &\le E_i [\tau_{i'}]+ E_{i'} \Big[\sum_{k=\tau_{i'}}^n {\bf 1}_{X_k}(j),\tau_{i'}\le n\Big] +(n+1) P_i (\tau_{i'}>n)\\ & \le E_i [\tau_{i'}] + S_n (i',j) + (n+1) P_i(\tau_{i'}>n). \end{align*}
Therefore,
$$ \limsup_{n\to\infty} \frac{1}{n+1} (S_n (i,j) -S_n(i',j))\le 0.$$
Since this is true for all ordered pairs $i,i'$ (apply it with $i$ and $i'$ swapped to get the matching $\liminf$ bound), the limit exists and is equal to $0$.
Finally, suppose the states are $1,\dots,K$ and let $\pi$ be the stationary measure. Then
$$\pi(j) = \frac{1}{n+1} \sum_{i=1}^K \pi(i)S_n(i,j) =\frac{1}{n+1} S_n (1,j)+ \sum_{i'>1}\pi(i') \frac{1}{n+1} ( S_n (i',j) - S_n (1,j) ). $$
As the sum on the right-hand side tends to $0$, we get $\frac{1}{n+1}S_n(1,j) \to \pi(j)$; combined with $\frac{1}{n+1}(S_n(i,j)-S_n(1,j)) \to 0$, the result follows for every starting state $i$.
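A quick numerical check of the conclusion (a sketch with numpy; the bipartite 3-state chain is my own example, with period 2 and $\pi = (1/2, 1/4, 1/4)$):

```python
import numpy as np

# Bipartite 3-state chain: period 2, stationary pi = (1/2, 1/4, 1/4).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
pi = np.array([0.5, 0.25, 0.25])

n = 4000
S = np.eye(3)          # k = 0 term of S_n = sum_{k=0}^n P^k
Pk = np.eye(3)
for _ in range(n):
    Pk = Pk @ P
    S += Pk
print(S / (n + 1))     # every row is close to pi: same limit for every start state
```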