Convergence of Markov Chains


I have stumbled upon Markov chains, or rather finite stochastic matrices (gladly, it didn't hurt too much). There is this standard theorem on the existence of stationary distributions:

Let $P$ be a stochastic matrix of dimension $N\times N$. Assume that there is some power $P^n$ of $P$ such that every entry of $P^n$ is strictly greater than zero. Then the sequence of powers $(P^k)_{k\geq0}$ converges to a stochastic matrix $Q$ whose columns are identical.

Here, I follow the convention that a stochastic matrix has stochastic column vectors, i.e., the entries of each column sum up to 1. This seems to be standard in German textbooks.
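To make the convention concrete, here is a minimal sketch in Python/NumPy (the matrix entries are invented purely for illustration):

```python
import numpy as np

# Column-stochastic convention: entry P[i, j] is the probability of
# moving FROM state j TO state i, so each COLUMN sums to 1.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])

col_sums = P.sum(axis=0)  # this is 1^T P
print(col_sums)           # every entry is 1.0
```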

Now, in a German schoolbook (Lambacher Schweizer, Mathematik Jahrgangsstufe), I found the following weaker version of the above theorem:

Let $P$ be a stochastic matrix of dimension $N\times N$. Assume that there is some power $P^n$ of $P$ such that some row of $P^n$ has all its entries strictly greater than zero. Then the sequence of powers $(P^k)_{k\geq0}$ converges to a stochastic matrix $Q$ whose columns are identical.
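To probe the hypothesis numerically, here is a sketch in Python/NumPy with an invented matrix for which some row of $P$ is strictly positive although no power of $P$ is entirely positive (so the first theorem does not apply):

```python
import numpy as np

# Invented column-stochastic example: state 3 is transient, so the
# lower-left entries of P^k stay zero forever and the first theorem
# never applies, yet row 1 of P is already strictly positive (n = 1).
P = np.array([[0.7, 0.2, 0.3],
              [0.3, 0.8, 0.3],
              [0.0, 0.0, 0.4]])

assert np.all(P[0, :] > 0)                            # row condition
assert not np.all(np.linalg.matrix_power(P, 50) > 0)  # first theorem fails

Q = np.linalg.matrix_power(P, 200)
print(np.round(Q, 6))  # numerically, the columns have become identical
```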

Can somebody give a (possibly elementary) proof of this, or give a counterexample? I could neither prove nor disprove it myself (actually, I believe it's true), nor could I find it in the literature.

Thanks, Martin

Best Answer

I'm not sure why you'd call the second theorem 'weaker', as it nominally seems stronger / more general: clearly the first theorem can be seen as a special case of the second one. Conversely, after dealing with (modding out) transient states, the first theorem also implies the second one. Unfortunately, there are some book-keeping details associated with transient states that distract from the argument.

For the second theorem, we can assume WLOG that row $1$ is the positive row. Since we use column-stochastic matrices, the condition tells you that every state has a path to state 1 (in $n$ steps). Now WLOG let the first $m$ states be exactly those reachable from state 1; that is, $\{1,2,\dots,m\}$ forms a recurrent class. Any state $j\gt m$ is transient, since it visits state 1 with positive probability and has no probability of returning from there (you can upper bound the expected number of returns with a geometric series, hence one does not return infinitely often, and state $j$ is transient). The above implies the block structure

$P= \left[ \begin{matrix} P_{m\times m} & * \\ \mathbf 0 & T\\ \end{matrix}\right]$
Since we are using column-stochastic matrices, $\mathbf 1^T P =\mathbf 1^T$.

By block multiplication,
$P^k= \left[ \begin{matrix} P_{m\times m}^k & * \\ \mathbf 0 & T^k\\ \end{matrix}\right]$

and of course $\mathbf 1^T P^k = \mathbf 1^T$.
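As a quick numerical sanity check of the block structure (a sketch with an invented matrix; states 1, 2 recurrent, state 3 transient):

```python
import numpy as np

# The lower-left zero block survives powering, and column sums stay 1.
P = np.array([[0.7, 0.2, 0.3],
              [0.3, 0.8, 0.3],
              [0.0, 0.0, 0.4]])

Pk = np.linalg.matrix_power(P, 5)
print(Pk[2, :2])                         # still exactly [0. 0.]
print(np.allclose(Pk.sum(axis=0), 1.0))  # 1^T P^k = 1^T
```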

Now, as a basic property of transient states, we know $T^k\to \mathbf 0$. (Alternatively, for a spectral reason: apply Gerschgorin discs to $T^n$ to see its max-modulus eigenvalue is $\lt 1$, hence $T$ has max-modulus eigenvalue $\lt 1$ and $T^k$ tends to zero.)
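A numerical sketch of this spectral fact (the block $T$ is invented; here probability already leaks out of the transient class in one step, so every column of $T$ sums to less than one -- in general one may first have to pass to a power $T^n$):

```python
import numpy as np

# A column-substochastic transient block: both column sums are 0.7 < 1.
T = np.array([[0.5, 0.1],
              [0.2, 0.6]])

# Gerschgorin discs applied to T^T (i.e. to the column sums of T)
# bound every eigenvalue modulus by the largest column sum, which is < 1.
rho = max(abs(np.linalg.eigvals(T)))
print(rho)                                  # spectral radius < 1
print(np.linalg.matrix_power(T, 60).max())  # ~ 0
```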

Now, selecting $k:=n$, we see that
$(P_{m\times m})^n$
has all positive entries in its first row. This means that in $n$ steps there is a path from every state $j\in \{1,2,\dots,m\}$ to state 1. Since $\{1,\dots,m\}$ is a single communicating class, state 1 has a path to some other state $i$ in one step, i.e. $\mathbf e_i^T P_{m\times m}\mathbf e_1\gt 0$ for some $i \in\{2,\dots,m\}$ (assuming $m\geq 2$; for $m=1$ there is nothing to prove),

but this means there is a closed path $1\to i \to \dots \to 1$ of length $n+1$ (one step to $i$, then $n$ steps back to state 1), as well as a closed path of length $n$; hence the GCD of the closed-path lengths through state 1 in this recurrent class is $1$, i.e. the class is aperiodic.

Two standard exercises:
i.) The first $m$ states form a single communicating class, so prove that each state can reach any other in at most $m$ steps.
ii.) By calculation, prove that a GCD of $1$ implies $(P_{m\times m})^k$ has all positive components for all $k\geq K$, for some $K$.
(This can be avoided by exploiting special structure in the problem -- see the "addendum".)
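Exercise ii.) in miniature: the number-theoretic core is that if the GCD of the available cycle lengths is 1, then every sufficiently large step count is a non-negative combination of them. A sketch with two invented coprime cycle lengths (the classical Frobenius bound $ab-a-b$ marks the last gap):

```python
from math import gcd

a, b = 5, 6                  # e.g. closed paths of lengths n and n+1
assert gcd(a, b) == 1

def representable(k):
    # Is k a non-negative integer combination i*a + j*b?
    return any((k - i * a) % b == 0 for i in range(k // a + 1))

frobenius = a * b - a - b    # largest non-representable number: 19
print(representable(frobenius))      # False
print(representable(frobenius + 1))  # True
```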

conclusion
Now apply theorem (1) and you see $\lim_{k\to \infty} (P_{m\times m})^k = \mathbf \pi\mathbf 1_m^T$, where $\mathbf \pi$ is the stationary distribution of the recurrent class.
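The conclusion can be cross-checked numerically; a sketch with an invented positive (hence theorem-1) column-stochastic matrix, recovering $\mathbf\pi$ as the eigenvector for eigenvalue 1:

```python
import numpy as np

P = np.array([[0.6, 0.3],
              [0.4, 0.7]])   # invented positive column-stochastic matrix

vals, vecs = np.linalg.eig(P)
pi = vecs[:, np.argmax(vals.real)].real  # eigenvector for eigenvalue 1
pi = pi / pi.sum()                       # normalise to a distribution

Q = np.linalg.matrix_power(P, 100)
print(pi)        # the stationary distribution
print(Q[:, 0])   # each column of P^k approaches pi
```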

more about transient states
You'll notice I ignored the $*$'d upper-right block in this proof, as it is not really important and involves some book-keeping. Here are some of these less important details.

Since the characteristic polynomial of $P$ splits as
$\det\big(xI-P\big) = \det\big(xI-P_{m\times m}\big)\cdot\det\big(xI- T\big),$
the above tells you that $P$ has a single eigenvalue on the unit circle, $\lambda =1$, with algebraic multiplicity one. This is implicit, e.g., in the fact that the trace of $\lim_{k\to \infty} P_{m\times m}^k=\mathbf \pi\mathbf 1_m^T$ is one. It also tells you that $\mathbf \pi$ is, up to scaling, the only vector with $P_{m\times m}\mathbf \pi = \mathbf \pi$: if some $\mathbf x \not \propto \mathbf \pi$ satisfied $P_{m\times m}\mathbf x = \mathbf x$, then $\mathbf x =\lim_{k\to \infty} P_{m\times m}^k\mathbf x = \mathbf \pi(\mathbf 1_m^T\mathbf x)$, which is a contradiction.

Revisiting the characteristic polynomial of $P$: up to re-scaling, there is a single vector that satisfies $\mathbf y^T P = \mathbf y^T$, and we already know this is $\mathbf 1$. Up to re-scaling, there is likewise a single vector that satisfies $P\mathbf z = \mathbf z$, and
$P\mathbf z= \left[ \begin{matrix} P_{m\times m} & * \\ \mathbf 0 & T\\ \end{matrix}\right]\left[ \begin{matrix} \mathbf z' \\ \mathbf z'' \\ \end{matrix}\right]= \left[ \begin{matrix} * \\ T\mathbf z'' \\ \end{matrix}\right]=\mathbf z = \left[ \begin{matrix} \mathbf z' \\ \mathbf z'' \\ \end{matrix}\right]$

but $T\mathbf z'' = \mathbf z''\implies \mathbf z''=\mathbf 0$, since $T$ cannot have an eigenvalue of $1$. After normalising so that $\mathbf 1^T\mathbf z = 1$, this implies $\mathbf z' =\mathbf \pi$, or
$\mathbf z =\left[ \begin{matrix} \mathbf \pi \\ \mathbf 0 \\ \end{matrix}\right]$

Finally, set $B:=P-\mathbf z\mathbf 1^T$ and confirm $B^k=P^k-\mathbf z\mathbf 1^T$ (using $\mathbf 1^T P = \mathbf 1^T$, $P\mathbf z=\mathbf z$ and $\mathbf 1^T\mathbf z = 1$). All of $B$'s eigenvalues have modulus $\lt 1$, thus $\mathbf 0 = \lim_{k\to \infty} B^k=\lim_{k\to \infty} \big(P-\mathbf z\mathbf 1^T\big)^k=\lim_{k\to \infty} P^k-\mathbf z\mathbf 1^T$
or $\lim_{k\to \infty} P^k=\mathbf z\mathbf 1^T$
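The rank-one-correction identity and the final limit can be checked numerically; a sketch with an invented block matrix for which $\mathbf z=(0.4,0.6,0)^T$ happens to satisfy $P\mathbf z=\mathbf z$:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.3],
              [0.3, 0.8, 0.3],
              [0.0, 0.0, 0.4]])   # invented; state 3 is transient
z = np.array([0.4, 0.6, 0.0])
assert np.allclose(P @ z, z)      # P z = z, and 1^T z = 1

B = P - np.outer(z, np.ones(3))
# Check B^k = P^k - z 1^T, and that all eigenvalues of B have modulus < 1.
print(np.allclose(np.linalg.matrix_power(B, 7),
                  np.linalg.matrix_power(P, 7) - np.outer(z, np.ones(3))))
print(max(abs(np.linalg.eigvals(B))))  # < 1, so P^k -> z 1^T
```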

Addendum
An easier way to show that $(P_{m\times m})^k$ has all positive components for all $k\geq K$ is as follows.
First note that

$(P_{m\times m})^{2n}=(P_{m\times m})^{n}(P_{m\times m})^{n}$
has all positive components in row 1 (why?) and by induction
$(P_{m\times m})^{n\cdot k'}$
has all positive components in row 1, for any natural number $k'$.

Now we adapt this to the other states. Since this is a communicating class, for each state $j\in\{2,3,\dots,m\}$ there exists some $k_j \in \{1,\dots,m\}$ such that there is a path of length $k_j$ from state 1 to state $j$; in other words,

$\mathbf e_j^T (P_{m\times m})^{k_j}\mathbf e_1 \gt 0.$
Thus
$\mathbf e_j^T (P_{m\times m})^{k_j+n} \geq \big(\mathbf e_j^T (P_{m\times m})^{k_j}\mathbf e_1\big)\,\mathbf e_1^T (P_{m\times m})^{n}$
componentwise, so $(P_{m\times m})^{k_j+n}$ has all positive components in row $j$. (It's probably easier to work with the transpose to confirm this.)

Moreover, a strictly positive row stays strictly positive: if $\mathbf v^T \gt \mathbf 0$ and $Q$ is column stochastic, then every entry of $\mathbf v^T Q$ is positive, because every column of $Q$ is non-negative and sums to one. Hence row $j$ of $(P_{m\times m})^{k}$ is positive for all $k\geq k_j+n$, and row 1 is positive for all $k\geq n$.

Set $K := n+\max_{j\in\{2,\dots,m\}} k_j$ and it follows that
$(P_{m\times m})^{k}$
has strictly positive components (is a positive matrix) for all $k\geq K$,
which now allows for application of theorem 1.