Operating a transition matrix on the "wrong" side.


In Markov chain theory, it is customary to write a probability vector as a row vector $$ x^{(k)}=\begin{bmatrix}p^{(k)}_1 & p^{(k)}_2 & \dots & p^{(k)}_n\end{bmatrix} $$ and the transition matrix as $A=[a_{ij}]_{n\times n}$, where $a_{ij}=\Bbb P(X_{k+1}=j\mid X_k=i)$ and $(X_k)$ denotes the chain itself.

With this convention it follows that $$ x^{(k+1)}=x^{(k)}A, $$ i.e. we must apply the matrix on the right, which differs from other disciplines, which prefer to apply a linear transformation on the left. Here's my question:
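As a quick numerical illustration of the row-vector convention (a minimal sketch with a made-up 3-state matrix, using numpy):

```python
import numpy as np

# A made-up 3-state transition matrix (rows sum to 1):
# a_ij = P(next state = j | current state = i).
A = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

x0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with probability 1

# Row-vector convention: distributions evolve by right-multiplication.
x1 = x0 @ A  # distribution after one step
x2 = x1 @ A  # distribution after two steps (equals x0 @ A @ A)
print(x1, x2)  # each x^{(k)} is again a probability vector
```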

What does operating a transition matrix on the left signify?

What I mean is this: let $A$ be a transition matrix in the above sense, but now, for any column vector $y$, apply $A$ on the left to get $Ay$. Is there anything interesting we can say about $Ay$?

For example, since $A$ is a transition matrix we know that $\sum_{j=1}^n a_{ij}=1$ for every $i$, thus $$ A\begin{bmatrix}1 \\ 1 \\ \vdots \\ 1\end{bmatrix}=\begin{bmatrix}1 \\ 1 \\ \vdots \\ 1\end{bmatrix}, $$ so $A$ has $1$ as an eigenvalue, with $\begin{bmatrix}1 & 1 & \dots & 1\end{bmatrix}^T$ as a corresponding eigenvector.
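This eigenvector claim is easy to check numerically (again with an arbitrary 3-state stochastic matrix; the specific entries are purely illustrative):

```python
import numpy as np

# Any matrix with nonnegative entries whose rows sum to 1 fixes the
# all-ones vector: A @ ones = vector of row sums = ones.
A = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
ones = np.ones(3)
print(A @ ones)  # each entry is a row sum, hence 1

# Consequently 1 shows up among the eigenvalues of A.
has_eigenvalue_one = np.isclose(np.linalg.eigvals(A), 1.0).any()
print(has_eigenvalue_one)
```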

In the same spirit, is there any special property when viewing $A$ as a bilinear form $$ \langle Ay,x\rangle = x^TAy $$ where $x,y$ are column vectors?

Best answer:

Suppose $(X_k)$ is a Markov chain on $n$ states with initial state $i$ and transition probabilities given by $A$. Then for any $y\in\mathbb R^n$, $$E(y_{X_k})=(A^ky)_i.$$ In particular, $E(y_{X_1})=(Ay)_i$. This is best understood by viewing $y$ as a function on the state space, so that $y_{X_1}$ can be thought of as a statistic (an observable) of the Markov chain. From this perspective it is hardly surprising that $[1\,1\,\ldots\,1]^T$ is a $1$-eigenvector: it corresponds to the constant function $1$, which obviously has mean $1$ regardless of the initial state.
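The identity $E(y_{X_1})=(Ay)_i$ can be sanity-checked by Monte Carlo. This is a sketch with an arbitrary matrix and test function; none of the specific numbers come from the post, only the formula:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative transition matrix and "test function" y on the states.
A = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
y = np.array([2.0, -1.0, 5.0])
i = 0  # initial state

# Exact one-step expectation: E(y_{X_1} | X_0 = i) = (A y)_i.
exact = (A @ y)[i]

# Monte Carlo check: X_1 is distributed according to row i of A.
samples = rng.choice(3, size=200_000, p=A[i])
estimate = y[samples].mean()
print(exact, estimate)  # estimate should agree up to sampling error
```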

As for the bilinear form, in a similar fashion we have $$x^TAy=E(y_{X_1}),$$ where $(X_k)$ is still a Markov chain with transition probabilities $A$, but now the initial state has distribution $x$ (i.e. $P(X_0=i)=x_i$; for this to make sense, $x$ must be a probability vector). This is again a very natural quantity to consider, although it is usually sufficient to consider Markov chains started from a specific initial state.
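Similarly, $x^TAy$ can be checked against the law of total expectation over the initial state (illustrative numbers only):

```python
import numpy as np

# Illustrative data: initial distribution x, transition matrix A,
# and a function y on the three states.
A = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
x = np.array([0.2, 0.5, 0.3])   # P(X_0 = i) = x_i
y = np.array([2.0, -1.0, 5.0])

# Bilinear form x^T A y ...
bilinear = x @ A @ y

# ... equals the conditional expectations (A y)_i averaged over the
# initial distribution (law of total expectation).
by_conditioning = sum(x[i] * (A @ y)[i] for i in range(3))
print(bilinear, by_conditioning)
```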

Note: I'm not sure how much advanced math you know, so feel free to disregard anything below this point!

It might seem like operating on probability vectors is the more natural thing to do, so one might wonder why we don't simply flip the definition of stochastic matrices and have things the other way around. The reason becomes clear once you stop assuming a finite state space. If you have a Markov chain $(X_k)$ on a (topological) state space $S$ with transition kernel $\Pi=(\pi(x,\cdot))_{x\in S}$, then it is natural to consider $$Pf(x):=\int f(y)\pi(x,dy)=E^x(f(X_1))$$ for bounded Borel functions $f$, where $E^x$ denotes expectation with respect to the chain started at $x$. This corresponds exactly to $Ay$ when $|S|<\infty$. If $S$ is locally compact and Hausdorff, then we are particularly interested in the case where $Pf\in C_0(S)$ for $f\in C_0(S)$ (the Feller property), where $C_0(S)$ is the space of continuous functions vanishing at infinity.

The dual operator $P'$ acts on probability distributions (or, more generally, finite signed Radon measures) via $$\langle P'\mu,f\rangle=\langle\mu,Pf\rangle=\int Pf(x)\mu(dx).$$ When $|S|<\infty$, $P'\mu$ and $\langle P'\mu,f\rangle$ correspond respectively to $xA$ and $xAy$.
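In the finite case the duality $\langle P'\mu,f\rangle=\langle\mu,Pf\rangle$ is nothing but associativity of matrix multiplication, which is easy to see numerically (illustrative values only):

```python
import numpy as np

# Finite-state shadow of <P'mu, f> = <mu, Pf>: pairing the evolved
# measure mu @ A with f equals pairing mu with the function A @ f.
# In finite dimensions this is just associativity of matmul.
A = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
mu = np.array([0.2, 0.5, 0.3])  # a probability measure on the 3 states
f = np.array([2.0, -1.0, 5.0])  # a bounded function on the states

lhs = (mu @ A) @ f  # <P'mu, f>: evolve the measure, then integrate f
rhs = mu @ (A @ f)  # <mu, Pf>: average the one-step expectations of f
print(lhs, rhs)
```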