Markov Property Confusion

I feel like I'm being very dense/employing some sort of circular reasoning, but I'm having trouble understanding the Markov Property. According to Durrett (ISBN-10:1461436141), $X_n$ is a Markov chain with transition matrix $p(i,j)$ if for any $j, i, i_{n-1}, \ldots, i_0$: $$P(X_{n+1}=j \mid X_n=i, X_{n-1}=i_{n-1}, \ldots, X_0=i_0)=p(i,j)$$

It seems to me that this property can never be satisfied. For example, consider the transition matrix:

$$\begin{array}{c|ccc} & a & b & c \\ \hline a & 0 & 1 & 0 \\ b & 0.5 & 0 & 0.5 \\ c & 0 & 1 & 0 \end{array}$$

which is graphically represented as:

[Figure: transition diagram of the Markov chain]

Here $P(X_3=c \mid X_2=b, X_1=b)=0 \neq p(b,c)=0.5$. It seems I can do something similar for any chain and make every such probability equal to $0$. Is this because I'm conditioning on an event with probability $0$? Or is it because I should be saying $$\sum_{x \in \{a,b,c\}} P(X_3=c \mid X_2=b, X_1=x)=0.5=p(b,c)$$

Or am I just thinking about it completely wrong?

Best Answer

This is a very good question, and it deserves a lot more upvotes.

To answer your question, I need to get a little theoretical about probability. The invalid combinations you give are not members of the sample space $\Omega$, so you cannot condition on them: they are not in our set of possible outcomes (also called sample paths). Why not? Because we simply choose not to treat them as valid outcomes, so that the Markov property works. If you did include them in the sample space, the Markov property would fail, as you nicely pointed out.

So we define the Markov property on a sample space $\Omega$ that does not include invalid outcomes. You cannot condition on something that is not a subset of the sample space. Remember, conditioning means restricting to a subset.
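
In terms of the definition of conditional probability, the same point can be written as a short worked equation. This is only a sketch: it assumes nothing about the distribution of $X_1$ and uses the entry $p(b,b)=0$ from the question's transition matrix.

$$P(X_2=b, X_1=b) = P(X_1=b)\,p(b,b) = 0, \qquad P(X_3=c \mid X_2=b, X_1=b) = \frac{P(X_3=c, X_2=b, X_1=b)}{P(X_2=b, X_1=b)} = \frac{0}{0}$$

So for this history the left-hand side of the Markov property is undefined rather than equal to $0$, and the property is only required to hold for histories that have positive probability.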

What does our desired sample space look like? All valid sequences of a given length of the random variables $X_n$ are members of the sample space for that length. (We need a sample space for each sequence length, so there are in fact many sample spaces, and we use the one of the appropriate length when asking for a probability under our probability measure.) For each of these sample spaces, we can ask for the probability of any event in it under our probability measure $P$. For instance, in the sample space of sequences of length 3 we can ask for $P(X_3 = c \mid X_2 = b, X_1 = a)$.

So we need to define a sample space for every sequence length whose probabilities we want to compute, and each such sample space contains all possible paths up to that length.

You could say that $P(X_3 = c \mid X_2 = b, X_1 = b)$ is undefined in the sample space of length 3 for which the Markov property holds.
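
To make this concrete, here is a minimal Python sketch that enumerates the valid paths of length 3 for the chain in the question and computes the two conditional probabilities discussed above. The uniform distribution for $X_1$ and the helper names (`path_probability`, `valid_paths`, `conditional`) are assumptions made for illustration; neither the question nor Durrett's definition fixes them.

```python
from fractions import Fraction
from itertools import product

STATES = ("a", "b", "c")

# Transition matrix p(i, j) copied from the question.
P = {
    "a": {"a": Fraction(0), "b": Fraction(1), "c": Fraction(0)},
    "b": {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1, 2)},
    "c": {"a": Fraction(0), "b": Fraction(1), "c": Fraction(0)},
}

# Assumption: X_1 is uniform on the three states (not given in the question).
INITIAL = {s: Fraction(1, 3) for s in STATES}


def path_probability(path):
    """Probability of observing the whole path (X_1, X_2, ...)."""
    prob = INITIAL[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob


def valid_paths(length):
    """All paths of the given length that have positive probability."""
    return [p for p in product(STATES, repeat=length) if path_probability(p) > 0]


def conditional(x3, x2, x1):
    """P(X_3 = x3 | X_2 = x2, X_1 = x1), or None if the conditioning
    event has probability zero (the ratio would be 0/0)."""
    denominator = sum(path_probability((x1, x2, s)) for s in STATES)
    if denominator == 0:
        return None
    return path_probability((x1, x2, x3)) / denominator


print(valid_paths(3))
print(conditional("c", "b", "a"))  # 1/2, matching p(b, c)
print(conditional("c", "b", "b"))  # None: the history (b, b) cannot occur
```

The path $(b, b, c)$ never appears in the list of valid paths, which is the computational counterpart of saying it is not an outcome you can condition on.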