It has been bothering me for a while that I could not understand the proof of the Markov property in a rigorous manner. Here is the proof of the Markov property from the book *Markov Chains* by J. R. Norris.
Theorem 1.1.2 (Markov property). Let $\left(X_n\right)_{n \geq 0}$ be $\operatorname{Markov}(\lambda, P)$. Then, conditional on $X_m=i,\left(X_{m+n}\right)_{n \geq 0}$ is $\operatorname{Markov}\left(\delta_i, P\right)$ and is independent of the random variables $X_0, \ldots, X_m$.
Proof. We have to show that for any event $A$ determined by $X_0, \ldots, X_m$ we have $$ \begin{aligned} & \mathbb{P}\left(\left\{X_m=i_m, \ldots, X_{m+n}=i_{m+n}\right\} \cap A \mid X_m=i\right) \\ & =\delta_{i i_m} p_{i_m i_{m+1}} \ldots p_{i_{m+n-1} i_{m+n}} \mathbb{P}\left(A \mid X_m=i\right) \end{aligned} $$ Then the result follows by Theorem 1.1.1. First consider the case of elementary events $$ A=\left\{X_0=i_0, \ldots, X_m=i_m\right\} $$
I did not type the full proof above, and I have attached Theorem 1.1.1 below. In the first place, I don't really understand the stated theorem. What does it mean for a sequence of random variables to be a Markov chain conditional on something? I know the definition of a Markov chain, but I don't know how to make the phrase "conditional on $X_{m} = i$" rigorous. Does it mean the same thing as given $X_{m} = i$, or does it have to do with a new probability measure of some sort? Also, for the proof: if we take $A$ to be $\Omega$, the whole sample space, we get $\mathbb{P}(X_{m} = i_{m}, \dots, X_{m+n} = i_{m+n} \mid X_{m} = i) = \delta_{i i_m} p_{i_m i_{m+1}} \ldots p_{i_{m+n-1} i_{m+n}}$. However, Theorem 1.1.1 only states the case without the conditioning on $X_{m} = i$, so I am not sure that proving the equation above shows that $(X_{m+n})_{n \geq 0}$ is a Markov chain.
So what does conditional on $X_{m} = i$ mean rigorously in this case?
Theorem 1.1.1. A discrete-time random process $\left(X_n\right)_{0 \leq n \leq N}$ is $\operatorname{Markov}(\lambda, P)$ if and only if for all $i_0, i_1, \ldots, i_N \in I$ $$ \mathbb{P}\left(X_0=i_0, X_1=i_1, \ldots, X_N=i_N\right)=\lambda_{i_0} p_{i_0 i_1} p_{i_1 i_2} \ldots p_{i_{N-1} i_N} $$
I have posted a new question that might be related to this one.
This question is now resolved. Saying "conditional on" something literally means changing the probability measure, so Theorem 1.1.1 applies just fine with the new probability measure.
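To spell out the resolution in my own words: "conditional on $X_m = i$" means we work with the new probability measure $$ \mathbb{P}_i(B) := \mathbb{P}(B \mid X_m = i) = \frac{\mathbb{P}(B \cap \{X_m = i\})}{\mathbb{P}(X_m = i)}, $$ defined whenever $\mathbb{P}(X_m = i) > 0$. One checks that $\mathbb{P}_i$ is again a probability measure on the same sample space. The theorem then asserts that the shifted process $Y_n := X_{m+n}$ is $\operatorname{Markov}(\delta_i, P)$ under $\mathbb{P}_i$, i.e. Theorem 1.1.1 holds with $\mathbb{P}$ replaced by $\mathbb{P}_i$ and $\lambda$ replaced by $\delta_i$.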
I'll assume that your Markov chain has a finite state space, $S=\{0,1,2,\ldots,k\}$, and time index set $\mathbb Z_+=\{0,1,2,\ldots\}$. Consider the set $\Omega=S^{\mathbb Z_+}$, which is our sample path space. That is, $\omega\in\Omega$ is a full realization of the Markov chain. In other words, a given $\omega$ is really a function of $n\in\mathbb Z_+$; that is, $\omega:\mathbb Z_+\to S$. Each $\omega$ is some particular fixed sample path and is non-random.
When you see $X_m=i$, this is an "event." For us an event is a subset of the sample path space: $\{X_m=i\}\subset\Omega$. It really means this though: $$\{X_m=i\}=\{\omega\in\Omega \mid \omega(m)=i\}\subset\Omega.$$
So when we say $\mathbb P(X_m=i)$, we are really saying $\mathbb P(\{\omega\in\Omega \mid \omega(m)=i\})$. And by "$\mathbb P$" we mean a "probability measure." This also means that we have some valid set of events that we are allowed to calculate probabilities for, and hopefully $\{X_m=i\}$ is one of those events (it normally is). The set of events that we are allowed to calculate probabilities for is called a "sigma algebra". We say that $\Omega$, together with the sigma algebra of events and the measure $\mathbb P$, is a "probability space." You don't really need to know all that to do Markov chain calculations, though.
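To make the "events are subsets of path space" picture concrete, here is a minimal Python sketch on a *finite* horizon (the full $\Omega=S^{\mathbb Z_+}$ is infinite, so I truncate to times $0,\dots,N$). The two-state `lam` and `P` are toy numbers of my own choosing, not from the text; the sketch computes $\mathbb P(X_m=i)$ by literally summing the product formula of Theorem 1.1.1 over the paths in the event.

```python
from itertools import product

# Hypothetical toy chain on S = {0, 1}, run for times 0..3.
# lam is the initial distribution, P the transition matrix (my choices).
S = [0, 1]
N = 3
lam = [0.5, 0.5]
P = [[0.9, 0.1],
     [0.2, 0.8]]

# Each omega is a finite sample path (omega(0), ..., omega(N)).
Omega = list(product(S, repeat=N + 1))

def prob(omega):
    """P({omega}) via the product formula of Theorem 1.1.1."""
    p = lam[omega[0]]
    for a, b in zip(omega, omega[1:]):
        p *= P[a][b]
    return p

# The event {X_m = i} is literally the subset of paths with omega(m) = i.
def event(m, i):
    return [w for w in Omega if w[m] == i]

m, i = 2, 0
p_event = sum(prob(w) for w in event(m, i))
print(p_event)  # should match the (0th) entry of lam @ P @ P, i.e. 0.585
```

The same summation with `event` replaced by all of `Omega` gives total mass 1, which is a quick check that `prob` really is a probability measure on the truncated path space.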
A Markov chain is nice in the sense that once you know the state at any particular time, it behaves from that time onward as a Markov chain with that particular initial state. So, conditional on $X_m=i$, we can treat time step $m$ as our new "initial time," and then the probabilities for time step $m+1$ are determined by (the $i^{th}$ row of) the transition matrix. I like to use the notation: $$(X_{m+n}|_{\{X_m=i\}})_{n\in\mathbb Z_+}.$$ One could also just set $Y_n=X_{m+n}$ and specify that $X_m=i$, so that $Y=(Y_n)_{n\in\mathbb Z_+}$ is just a Markov chain with initial state $i$ and the same transition matrix as $X=(X_n)_{n\in\mathbb Z_+}$. There are various other ways one could notate this.
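One can even check this claim numerically by exact summation over a finite-horizon path space. The sketch below (again with a hypothetical two-state `lam` and `P` of my own choosing) conditions on $X_m=i$ and verifies that the shifted path $(X_m, X_{m+1}, X_{m+2})$ has exactly the $\operatorname{Markov}(\delta_i, P)$ probabilities from Theorem 1.1.1.

```python
from itertools import product

# Hypothetical toy chain (my own numbers, not from the text).
S = [0, 1]
N = 4
lam = [0.5, 0.5]
P = [[0.9, 0.1],
     [0.2, 0.8]]

Omega = list(product(S, repeat=N + 1))

def prob(w):
    """P({w}) via the product formula of Theorem 1.1.1."""
    p = lam[w[0]]
    for a, b in zip(w, w[1:]):
        p *= P[a][b]
    return p

m, i = 2, 0
cond = sum(prob(w) for w in Omega if w[m] == i)  # P(X_m = i)

# Conditional law of (X_m, X_{m+1}, X_{m+2}) given X_m = i, versus the
# Markov(delta_i, P) product formula delta_{i,i_m} p_{i_m i_{m+1}} p_{i_{m+1} i_{m+2}}.
for path in product(S, repeat=3):  # candidate (i_m, i_{m+1}, i_{m+2})
    lhs = sum(prob(w) for w in Omega
              if w[m] == i and w[m:m + 3] == path) / cond
    delta = 1.0 if path[0] == i else 0.0
    rhs = delta * P[path[0]][path[1]] * P[path[1]][path[2]]
    assert abs(lhs - rhs) < 1e-12
print("conditional chain matches Markov(delta_i, P)")
```

Note that all paths with $i_m \neq i$ get conditional probability $0$, which is exactly what the $\delta_{i i_m}$ factor in the theorem encodes.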
I hope this is helpful.