Can anyone explain why every probability vector $U_0$ converges to the same vector $U_\infty$ under repeated application of a Markov matrix?

I don't know how to put this, but let me try. Suppose $x_1$ and $x_2$ are two eigenvectors with eigenvalues $1$ and $0.75$. In the equation
$$U_0 = C_1 x_1 + C_2 x_2,$$
why do I get the same value of $C_1$ for every probability vector $U_0$, so that they all converge to the same steady state? Hope this makes sense. Thanks!
To expand on Ian’s comments, you can view the action of a matrix on a vector in the following way: if $\mathbf v$ is an eigenvector of the matrix with eigenvalue $\lambda$, then multiplying a vector $\mathbf w$ by the matrix multiplies the component of $\mathbf w$ in the $\mathbf v$-direction by $\lambda$. For a stochastic matrix, every eigenvalue satisfies $|\lambda|\le1$, so repeated applications of the matrix cause the components of the initial probability vector in the directions with $|\lambda|\lt1$ to shrink away. What’s left will be the components in directions where the eigenvalue is $\pm1$. For the simple type of Markov chain under consideration, the eigenspace for $\lambda=1$ is one-dimensional (and we won’t have $\lambda=-1$), so we’ll be left with the steady-state vector.
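To see this fading numerically, here is a small sketch (the particular $3\times3$ matrix is my own choice, not from the question); it checks that the largest eigenvalue magnitude is $1$ and that repeated application drives an initial distribution to a fixed point:

```python
import numpy as np

# A sketch: an arbitrary 3-state row-stochastic matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Probability vectors are row vectors here, so the relevant eigenvectors
# are left eigenvectors: w P = lambda w, i.e. eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)
print(sorted(abs(eigvals), reverse=True))  # leading |lambda| is 1, rest < 1

# Components with |lambda| < 1 shrink like lambda^n under iteration.
v = np.array([1.0, 0.0, 0.0])  # start entirely in state 0
for _ in range(50):
    v = v @ P
print(v)  # essentially the steady-state vector: v @ P == v
```

After $50$ steps the subdominant components have decayed below machine precision, so `v` no longer changes under further multiplication by `P`.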
To see why the last statement might hold, consider the $2\times2$ case, with probability distributions written as row vectors. We have $$P=\pmatrix{p&1-p\\q&1-q}$$ with eigenvalues $\lambda_1=1$ and $\lambda_2=p-q$. Observe that $|\lambda_2|\lt1$ unless $p=0$ and $q=1$, or vice-versa. The corresponding left eigenvectors are $\mathbf v_1=\left({q\over1-p+q},{1-p\over1-p+q}\right)$ and $\mathbf v_2=(-1,1)$. (We’ve normalized $\mathbf v_1$ so that its components sum to $1$.) Now, any probability vector $\mathbf v$ lies on the line segment joining $(1,0)$ to $(0,1)$, so it can be written in the form $\mathbf v=\mathbf v_1+\alpha\mathbf v_2$ for some scalar $\alpha$, so $\mathbf vP^n=\mathbf v_1+\lambda_2^n\alpha\mathbf v_2$, which clearly converges to $\mathbf v_1$ as $n\to\infty$.
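As a sanity check on these formulas, here is a short sketch (the specific values $p=0.7$, $q=0.4$ are my own choices for illustration):

```python
import numpy as np

# Concrete instance of the 2x2 case: p and q chosen arbitrarily.
p, q = 0.7, 0.4
P = np.array([[p, 1 - p],
              [q, 1 - q]])

# Left eigenvectors from the text: v1 (normalized to sum to 1) and v2.
v1 = np.array([q, 1 - p]) / (1 - p + q)
v2 = np.array([-1.0, 1.0])

assert np.allclose(v1 @ P, v1)            # eigenvalue lambda_1 = 1
assert np.allclose(v2 @ P, (p - q) * v2)  # eigenvalue lambda_2 = p - q

# Any probability vector converges to v1 since |p - q| < 1.
v = np.array([0.9, 0.1])
for _ in range(100):
    v = v @ P
print(v1, v)  # the two agree to machine precision
```

Here $\lambda_2=p-q=0.3$, so the $\mathbf v_2$ component decays like $0.3^n$ and is long gone after $100$ steps.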
The situation is similar for larger state spaces: probability vectors lie on the hyperplane $\mathbf v\cdot\mathbf 1=1$, and we can choose eigenvectors with eigenvalue $1$ that lie on this hyperplane. Any initial probability vector can be written as a linear combination of eigenvectors, and under repeated application of $P$ the components corresponding to $|\lambda|\lt1$ will “fade away,” leaving only the components for which $|\lambda|=1$.