I'm reading about two equivalent definitions of a Markov chain from this page.
As usual, our starting point is a probability space $(\Omega, \mathscr{F}, \mathbb{P})$, so $\Omega$ is the sample space, $\mathscr{F}$ the $\sigma$-algebra of events, and $\mathbb{P}$ the probability measure on $(\Omega, \mathscr{F})$. Let $\boldsymbol{X}=\left(X_0, X_1, X_2, \ldots\right)$ be a stochastic process defined on the probability space, with time space $\mathbb{N}$ and with countable state space $S$. In the context of the general introduction, $S$ is given the power set $\mathscr{P}(S)$ as the $\sigma$-algebra.
The vector space $\mathscr{B}$ consisting of bounded functions $f: S \rightarrow \mathbb{R}$ will play an important role. The norm that we use is the supremum norm defined by $$ \|f\|=\sup \{|f(x)|: x \in S\}, \quad f \in \mathscr{B}. $$
For $n \in \mathbb{N}$, let $\mathscr{F}_n=\sigma\left\{X_0, X_1, \ldots, X_n\right\}$, the $\sigma$-algebra generated by the process up to time $n$. Thus $\mathfrak{F}=\left\{\mathscr{F}_0, \mathscr{F}_1, \mathscr{F}_2, \ldots\right\}$ is the natural filtration associated with $\boldsymbol{X}$.
Definition 1 $\boldsymbol{X}=\left(X_0, X_1, X_2, \ldots\right)$ is a Markov chain if either of the following equivalent conditions is satisfied:

a. $\mathbb{P}\left(X_{n+1}=x | \mathscr{F}_n\right)=\mathbb{P}\left(X_{n+1}=x | X_n\right)$ a.s. for every $n \in \mathbb{N}$ and $x \in S$.

b. $\mathbb{E}\left[f\left(X_{n+1}\right) | \mathscr{F}_n\right]=\mathbb{E}\left[f\left(X_{n+1}\right) | X_n\right]$ a.s. for every $n \in \mathbb{N}$ and $f \in \mathscr{B}$.
Definition 2 $\boldsymbol{X}=\left(X_0, X_1, X_2, \ldots\right)$ is a Markov chain if for every $n \in \mathbb{N}$ and every sequence of states $\left(x, x_n, \ldots, x_0\right)$ such that $\mathbb P (X_n=x_n , \ldots, X_0=x_0) > 0$, we have $$ \begin{align*} &\mathbb{P}\left(X_{n+1}=x | X_n=x_n , \ldots, X_0=x_0\right) \\ = &\mathbb{P}\left(X_{n+1}=x | X_n=x_n\right). \end{align*} $$
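As a concrete sanity check (not part of the question itself), Definition 2 can be verified by exact enumeration on a small chain. In the Python sketch below, the 3-state transition matrix `P`, the initial distribution `mu`, and the choices `n = 2`, `x = 1` are all arbitrary assumptions; the script computes both conditional probabilities from path probabilities and confirms they agree on every history with positive probability:

```python
import itertools

# Arbitrary 3-state chain: transition matrix P and initial distribution mu.
S = [0, 1, 2]
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
mu = [0.2, 0.5, 0.3]

def path_prob(path):
    """P(X_0 = path[0], ..., X_k = path[k]) by the chain rule."""
    p = mu[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

n, x = 2, 1
histories = list(itertools.product(S, repeat=n + 1))  # all (x_0, ..., x_n)

for hist in histories:
    if path_prob(hist) == 0:
        continue  # Definition 2 only quantifies over positive-probability histories
    # Left side: condition on the whole history (x_0, ..., x_n).
    lhs = path_prob(hist + (x,)) / path_prob(hist)
    # Right side: condition on X_n = x_n alone.
    x_n = hist[-1]
    num = sum(path_prob(h + (x,)) for h in histories if h[-1] == x_n)
    den = sum(path_prob(h) for h in histories if h[-1] == x_n)
    rhs = num / den
    assert abs(lhs - rhs) < 1e-12  # both equal P[x_n][x]
```

Both sides collapse to the transition probability `P[x_n][x]`, which is why the equality holds for a chain built from a transition matrix in the first place.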
In the failed attempt below, I'm stuck at showing $\frac{\mathbb E [Z_2 1_{C}]}{\mathbb P(C)} = \mathbb P (B_{n+1}|B_n)$. Could you elaborate on how to finish the proof?
My attempt
- Definition 1 implies Definition 2.
Let $B_{n+1} = \{X_{n+1} = x\}$. Let $B_i := \{X_i=x_i\}$ for all $i=0, \ldots,n$ and $C := \bigcap_{i=0}^n B_i$. We want to prove $$ \mathbb P (B_{n+1}|C) = \mathbb P (B_{n+1}|B_n). $$
Let $$ \begin{align} Z_1 &:= \mathbb{P}\left(X_{n+1}=x | \mathscr{F}_n\right) = \mathbb E[1_{B_{n+1}}| \mathscr{F}_n], \\ Z_2 &:=\mathbb{P}\left(X_{n+1}=x | X_n\right) = \mathbb E[1_{B_{n+1}}| \sigma(X_n)]. \end{align} $$
By Definition 1, $Z_1 = Z_2$ a.s. Notice that $C\in \mathscr{F}_n$. By definition of conditional expectation, $$ \begin{align} \mathbb E [Z_1 1_{C}] &= \mathbb E [1_{B_{n+1}} 1_C] = \mathbb P(B_{n+1} \cap C) \\ &= \mathbb P (B_{n+1}|C) \mathbb P(C). \end{align} $$
It suffices to prove that $\frac{\mathbb E [Z_1 1_{C}]}{\mathbb P(C)} = \mathbb P (B_{n+1}|B_n)$, or equivalently $\frac{\mathbb E [Z_2 1_{C}]}{\mathbb P(C)} = \mathbb P (B_{n+1}|B_n)$.
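(As an aside, the step $\mathbb E [Z_1 1_{C}] = \mathbb P(B_{n+1} \cap C)$ above is just the defining property of conditional expectation, and on a countable state space it can be checked by summing over the atoms of $\mathscr{F}_n$. A minimal Python sketch, where the chain parameters are arbitrary assumptions and $C$ is deliberately taken to be a union of atoms of $\mathscr{F}_n$ rather than a single atom:)

```python
import itertools

# Arbitrary 3-state chain (assumed for illustration only).
S = [0, 1, 2]
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
mu = [0.2, 0.5, 0.3]

def path_prob(path):
    """P(X_0 = path[0], ..., X_k = path[k]) by the chain rule."""
    p = mu[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

n, x = 2, 1
histories = list(itertools.product(S, repeat=n + 1))

def Z1(h):
    # On the atom {X_0 = h[0], ..., X_n = h[n]} of F_n, the conditional
    # probability Z_1 = P(X_{n+1} = x | F_n) takes this constant value.
    return path_prob(h + (x,)) / path_prob(h)

# C in F_n need not be a single atom; take e.g. C = {X_0 = 0}.
C = [h for h in histories if h[0] == 0]

lhs = sum(Z1(h) * path_prob(h) for h in C)   # E[Z_1 1_C]
rhs = sum(path_prob(h + (x,)) for h in C)    # P({X_{n+1} = x} ∩ C)
assert abs(lhs - rhs) < 1e-12
```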
I have found an answer from this thread by @saz. For the sake of my own understanding, I re-present his/her ideas below.
For $A,B \in \mathscr{F}$, we define $\mathbb P (A|B) := \frac{\mathbb P(A \cap B)}{\mathbb P (B)}$ if $\mathbb P (B)>0$ and $0$ otherwise.
Fix $x \in S$ and $n \in \mathbb N$. By our Lemma, $$ \begin{align} \mathbb{P} (X_{n+1}=x | \mathscr{F}_n) &= \psi(X_0, \ldots, X_n) ,\\ \mathbb{P} (X_{n+1}=x | X_n) &= \varphi(X_n), \end{align} $$ for some measurable functions $\psi:S^{n+1} \to \mathbb R$ and $\varphi:S \to \mathbb R$ such that $$ \begin{align} \psi(x_0, \ldots, x_n) &= \mathbb P(X_{n+1}=x | X_n=x_n, \ldots, X_0=x_0), \\ \varphi(x_n) &= \mathbb P(X_{n+1}=x | X_n=x_n), \end{align} $$ for all $x_0, \ldots, x_n \in S$.
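(The Lemma invoked here is the Doob–Dynkin factorization: a $\sigma(X_n)$-measurable random variable is a measurable function of $X_n$, and likewise for $\mathscr{F}_n = \sigma(X_0, \ldots, X_n)$. On a finite chain, $\varphi$ can be computed by elementary conditioning and checked against the defining property of $\mathbb E[\,\cdot\,|\,\sigma(X_n)]$ on the generating events $\{X_n = s\}$. A sketch in Python, with the chain parameters again arbitrary assumptions:)

```python
import itertools

# Arbitrary 3-state chain (assumed for illustration only).
S = [0, 1, 2]
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
mu = [0.2, 0.5, 0.3]

def path_prob(path):
    """P(X_0 = path[0], ..., X_k = path[k]) by the chain rule."""
    p = mu[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

n, x = 2, 1
histories = list(itertools.product(S, repeat=n + 1))

def phi(x_n):
    # Elementary conditional probability P(X_{n+1} = x | X_n = x_n).
    num = sum(path_prob(h + (x,)) for h in histories if h[-1] == x_n)
    den = sum(path_prob(h) for h in histories if h[-1] == x_n)
    return num / den if den > 0 else 0.0

# Defining property of E[1_{X_{n+1}=x} | sigma(X_n)], checked on each
# generator {X_n = s}: E[phi(X_n) 1_{X_n=s}] = P(X_{n+1} = x, X_n = s).
for s in S:
    lhs = sum(phi(h[-1]) * path_prob(h) for h in histories if h[-1] == s)
    rhs = sum(path_prob(h + (x,)) for h in histories if h[-1] == s)
    assert abs(lhs - rhs) < 1e-12
```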
- Definition 1 implies Definition 2.

Suppose $\mathbb P (X_n=x_n , \ldots, X_0=x_0) > 0$. By Definition 1, $\psi(X_0, \ldots, X_n) = \varphi(X_n)$ a.s., and thus $$ 1_{\{X_n=x_n , \ldots, X_0=x_0\}} \psi(X_0, \ldots, X_n) = 1_{\{X_n=x_n , \ldots, X_0=x_0\}}\varphi(X_n) \quad \text{a.s.} $$
This implies $$ \mathbb E[1_{\{X_n=x_n , \ldots, X_0=x_0\}} \psi(X_0, \ldots, X_n)] = \mathbb E[1_{\{X_n=x_n , \ldots, X_0=x_0\}}\varphi(X_n)]. $$
On the event $\{X_n=x_n , \ldots, X_0=x_0\}$ we have $\psi(X_0, \ldots, X_n) = \psi(x_0, \ldots, x_n)$ and $\varphi(X_n) = \varphi(x_n)$, so $$ \mathbb P (X_n=x_n , \ldots, X_0=x_0) \psi(x_0, \ldots, x_n) = \mathbb P (X_n=x_n , \ldots, X_0=x_0) \varphi(x_n). $$
Because $\mathbb P (X_n=x_n , \ldots, X_0=x_0)>0$, we get $\psi(x_0, \ldots, x_n) = \varphi(x_n)$, which is exactly the equality in Definition 2.
- Definition 2 implies Definition 1.

We want to prove $\psi(X_0, \ldots, X_n) = \varphi(X_n)$ a.s. Since $S$ is countable, the union of the events $\{X_n=x_n , \ldots, X_0=x_0\}$ over the countably many tuples with $\mathbb P (X_n=x_n , \ldots, X_0=x_0) = 0$ is a null set, so it suffices to prove $\psi(x_0, \ldots, x_n) = \varphi(x_n)$ for all $x_0, \ldots, x_n \in S$ such that $\mathbb P (X_n=x_n , \ldots, X_0=x_0) > 0$. But this is exactly Definition 2.