How to prove that in a finite Markov chain, a left eigenvector of eigenvalue 1 is a steady-state distribution?


The slide linked below is part of the notes for MIT's discrete-time stochastic processes open course. I can't understand the intuitive idea behind the theorem being proved, nor how the proof arrives at its result.

The slide I have question about: https://i.stack.imgur.com/Q0FtE.png

My questions are:

  • What does it mean to normalize the vector $\pi$?
  • How should I interpret an entry $\pi_i$ being negative?
  • Why can't the absolute values $|\pi_i|$ all be zero?
  • In the theorem, the entries of $\pi$ add up to one, but in the proof it is the absolute values of $\pi$'s entries that add up to one. How are the two related?

I've been stuck on this for several days and my head is still a mess. Please help!

in case you want to see the lecture notes: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-262-discrete-stochastic-processes-spring-2011/video-lectures/lecture-8-markov-eigenvalues-and-eigenvectors/MIT6_262S11_lec08.pdf

Best answer:

A steady-state vector $\pi$ satisfies $\pi P = \pi$: if we start in the distribution $\pi$, after one step the new distribution $\pi P$ is the same as the old.

We can show that there exists some vector $v \ne 0$ such that $v P = v$. (The argument for this is probably on a previous page, but basically it's because the column vector $j = (1,1,\dots,1)^{\mathsf T}$ satisfies $Pj = j$, so $1$ is an eigenvalue of $P$; since $P$ and $P^{\mathsf T}$ have the same eigenvalues, $1$ is also an eigenvalue of $P^{\mathsf T}$, which is exactly what gives a left eigenvector $v$ with $vP = v$.) However, a steady-state vector has additional properties:

  • Each $\pi_i$ is a probability, and satisfies $0 \le \pi_i \le 1$.
  • The sum of these probabilities is $\pi_1 + \pi_2 + \dots + \pi_n = 1$.

The goal of this theorem is to show that we can modify $v$ to have these properties as well. (I think that might be the key thing you're missing: we start with a vector $v$ that satisfies $vP = v$, but possibly not the additional properties, and use it to get a vector $\pi$ that satisfies $\pi P = \pi$ and also the additional properties.)
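To see concretely why the extra step is needed, here is a small numerical sketch (the matrix $P$ below is just an illustrative example, not one from the notes). numpy's eigensolver hands back a left eigenvector for eigenvalue $1$ with its own arbitrary scaling, so it generally does not sum to $1$ and may even have all entries negative:

```python
import numpy as np

# An illustrative row-stochastic matrix (each row sums to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

# Left eigenvectors of P are right eigenvectors of P^T.
eigvals, eigvecs = np.linalg.eig(P.T)

# Pick the eigenvector whose eigenvalue is closest to 1.
k = np.argmin(np.abs(eigvals - 1))
v = np.real(eigvecs[:, k])

print(v)          # scaled to unit 2-norm with arbitrary sign: need not sum to 1
print(v @ P - v)  # approximately the zero vector, i.e. vP = v holds
```

So `v` already satisfies $vP = v$, but it is not yet a probability distribution; that is what the rest of the proof fixes.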


The argument in the proof shows that if $v = (v_1, v_2, \dots, v_n)$ satisfies $vP= v$, then $|v| = (|v_1|, |v_2|, \dots, |v_n|)$ also satisfies $|v|P = |v|$. We start with $v$ just being some nonzero vector that satisfies $vP = v$, so its entries might be negative - we fix that by replacing each $v_i$ with $|v_i|$.

Proving that this works is what requires $P$ to be a stochastic matrix, and is most of the slide. Let me try to explain it slightly differently:

From the equation $$v_i = \sum_{j=1}^n v_j P_{ji}$$ we get \begin{align} |v_i| &= \left|\sum_{j=1}^n v_j P_{ji}\right| \\ &\le \sum_{j=1}^n |v_j P_{ji}| & & \text{by the triangle inequality} \\ &= \sum_{j=1}^n |v_j| P_{ji}. \end{align} Equivalently, we can write $$|v_i| - \sum_{j=1}^n |v_j| P_{ji} \le 0. \tag{1}$$ But if we were to add up the left-hand side of this inequality over all $i$, we'd get $$\sum_{i=1}^n \left(|v_i| - \sum_{j=1}^n |v_j| P_{ji}\right)$$ in which the coefficient of each $|v_j|$ is $1 - P_{j1} - P_{j2} - \dots - P_{jn} = 0$, because each row of the stochastic matrix $P$ sums to $1$. So the entire sum is $0$.

In other words, we're summing a bunch of terms $\le 0$, and getting $0$ as a total. This can only happen if each term is equal to $0$, so equality holds in $(1)$, which is just saying $|v|P = |v|$.
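This step is easy to sanity-check numerically (again with an illustrative stochastic matrix of my own choosing, repeated here so the snippet stands alone): taking entrywise absolute values preserves the left-eigenvector property.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])

w = np.abs(v)             # entrywise absolute value |v|
print(np.allclose(w @ P, w))  # |v| P = |v|, as the equality argument predicts
```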

Next is the step called normalizing in the proof. The idea here is just to define $$\pi_i = \frac{|v_i|}{|v_1| + |v_2| + \dots + |v_n|}.$$ This definition ensures that $0 \le \pi_i \le 1$ for each $i$ and that $\pi_1 + \pi_2 + \dots + \pi_n = 1$: the additional properties we wanted from a steady-state vector.

We also don't lose the stationary property: if $|v|P = |v|$ and we define $\pi$ to be $|v|$ multiplied by the constant $\frac1{|v_1| + \dots + |v_n|}$, then $\pi P = \pi$.
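Putting the whole construction together (once more with an illustrative $P$): take absolute values, divide by the sum of the entries, and check that the result is a genuine steady-state distribution.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])

# Normalize: pi_i = |v_i| / (|v_1| + ... + |v_n|).
pi = np.abs(v) / np.abs(v).sum()

print(pi)                       # each entry lies in [0, 1]
print(pi.sum())                 # the entries sum to 1
print(np.allclose(pi @ P, pi))  # stationary: pi P = pi
```

Scaling by a positive constant cannot break $\pi P = \pi$, which is why the stationarity check still passes after normalization.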