Markov Chain Initial Distribution


Suppose $\{X_0,X_1,X_2,\dots\}$ is a discrete-time Markov chain taking values in a finite set $\{1,\dots,N\}$ with initial distribution $p_i(0) = P(X_0 = i)$ for $i\in\{1,\dots,N\}$ and transition probability matrix P where the $ij$-th entry $p_{ij} = P(X_k = j \mid X_{k-1} = i)$ for $k\geq 1$ and $i,j\in\{1,\dots,N\}$.

In order for the Markov chain to be well-defined, must we have that $P(X_0 = i)>0$ for all $i\in\{1,\dots,N\}$? If this were not true, then some elements of matrix $P$ would be undefined since the following definition is only valid for $P(X_0 = i)>0$. $$ p_{ij}=P(X_1 = j \mid X_0 = i) = \frac{ P(X_1 = j, X_0 = i) }{ P(X_0 = i) } $$

In some books dealing with discrete-time Markov chains, I have seen the author set the initial distribution to something like $p_i(0) = 1$ if $i=1$ and $p_i(0)=0$ if $i\neq 1$ to signify that the initial condition is known. Doesn't this cause the transition probability matrix $P$ to be invalid and the underlying probability space ill-defined?

**Best Answer**

This is a good question, and a nice point to bring up. The truth is that for a time-homogeneous Markov chain, the transition matrix $P$ is intrinsic to the chain, without reference to any particular initial distribution. That is, you can set up the transition matrix before any initial distribution is specified. To do so, you define $$ p_{ij} = P(X_1 = j~|~X_0=i) $$ under the assumption that $P(X_0=i)=1$. In other notation, $p_{ij} = P_i(X_1=j)$, where $P_i$ denotes the law of the chain started deterministically in state $i$. This gives the probability of moving from state $i$ to state $j$ in one step, given that the chain is currently in state $i$, which is exactly what $p_{ij}$ should represent.
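As a small numerical illustration of this point (the $3\times 3$ matrix below is made up purely for the example), a transition matrix stands on its own: its rows are the conditional one-step laws $P_i(X_1=\cdot)$, and simulating under $P_i$ means starting deterministically in state $i$, with no initial distribution in sight.

```python
import numpy as np

# A hypothetical 3-state transition matrix (numbers made up for illustration).
# It is defined purely by the one-step probabilities p_ij = P_i(X_1 = j),
# with no reference to any initial distribution.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

rng = np.random.default_rng(0)

# Under the measure P_i the chain starts deterministically in state i
# (i.e. P(X_0 = i) = 1), so row i of P is just the law of X_1:
i = 0
samples = rng.choice(3, size=20_000, p=P[i])
empirical = np.bincount(samples, minlength=3) / samples.size
print(empirical)  # close to P[0] = [0.5, 0.3, 0.2]
```

Each row of $P$ sums to $1$, since row $i$ is itself a probability distribution over the state space.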

Let $S = \{1, 2, \dots, N\}$ be the state space of the Markov chain with transition matrix $P$ defined as above by $p_{ij} = P_i(X_1=j)$. For $t = 0, 1, \dots$ let $p^{(t)}_{i j} = P_{i}(X_t = j)$. That is, $p^{(t)}_{ij}$ gives the probability of starting in state $i$ and being in state $j$ after $t$ steps. You can show that $p^{(t)}_{ij}$ equals $[P^t]_{ij}$, the $(i,j)$th element of the transition matrix raised to the $t$th power. Now, suppose you're handed an initial distribution, which I'll call $\nu = (\nu_1, \dots, \nu_N)$, where we interpret $\nu_i = P(X_0=i)$. From here, let's split the state space as $S = S_1 \cup S_0$, where $i \in S_1 \iff \nu_i \neq 0$ and $i \in S_0 \iff \nu_i=0$. Let's figure out how to calculate $P(X_t = k)$ using this information. $$ P(X_t = k) = \sum_{i\in S} P(X_t=k,X_0=i) = \sum_{i\in S_1}P(X_t=k,X_0=i)+\sum_{j\in S_0}\overbrace{P(X_t=k,X_0=j)}^{=0} \\ =\sum_{i \in S_1} P(X_t=k|X_0=i)P(X_0=i)= \sum_{i \in S_1} p^{(t)}_{ik}\nu_i = \sum_{i \in S_1} p^{(t)}_{ik}\nu_i + \sum_{j \in S_0} p^{(t)}_{jk}\cdot 0 \\ = \sum_{i \in S_1} p^{(t)}_{ik}\nu_i + \sum_{j \in S_0} p^{(t)}_{jk}\nu_j = \sum_{i \in S} p^{(t)}_{ik}\nu_i = \left[\nu P^t \right]_{k} $$

where the last expression is the $k$th element of the row vector $\nu P^t$, obtained by multiplying the row vector $\nu$ on the right by the matrix $P^t$.
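The formula $P(X_t = k) = [\nu P^t]_k$ is easy to check numerically. A minimal sketch (using a made-up $3\times 3$ transition matrix, and $\nu$ concentrated on state $1$ as in the "known initial condition" case from the question): the zero entries of $\nu$ simply contribute nothing to the sum, and no conditioning on a zero-probability event ever occurs.

```python
import numpy as np

# Hypothetical transition matrix (for illustration only).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

# Initial distribution with all mass on state 1 (zero mass elsewhere),
# i.e. the "known initial condition" from the question.
nu = np.array([1.0, 0.0, 0.0])

t = 4
# k-th entry of nu @ P^t is P(X_t = k); the zero entries of nu
# are just multiplied through -- nothing is ever divided by zero.
dist_t = nu @ np.linalg.matrix_power(P, t)
print(dist_t)
```

Since $\nu$ here is the standard basis vector $e_1$, the result is simply the first row of $P^t$, i.e. $p^{(t)}_{1k}$ for each $k$, and it sums to $1$ as a probability distribution must.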

The point is that we define $p_{ij} = P_i(X_1=j)$ and then prove that the desired matrix identities hold. At no point in the derivation did we divide by zero!