I'm reading a probability theory book, which (slightly reworded) says the following:
A Markov process is completely determined once we know $$P_{ij}^{n,n+1} = P\{X_{n + 1} = j \mid X_{n} = i\} $$ and $X_{0}$'s initial value. We prove this fact as follows:
Let $P\{X_{0} = i\} = p_{i}$. It will suffice to show how to compute $$P\{X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n} = i_{n}\},$$ since any probability involving $X_{j_{1}}, \ldots, X_{j_{k}}$, $j_{1} < j_{2} < \cdots < j_{k}$, may be obtained from these joint probabilities by summing over the unconstrained indices.
We have by the definition of conditional probability that
$$P\{X_{0} = i_{0}, X_{1} = i_{1}, X_{2} = i_{2}, \ldots, X_{n} = i_{n}\}$$ $$= P\{X_{n} = i_{n} \mid X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n - 1} = i_{n - 1}\} \cdot P\{X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n - 1} = i_{n - 1}\}.$$ Then, by the definition of a Markov process,
$$P\{X_{n} = i_{n} \mid X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n - 1} = i_{n - 1}\} = P\{X_{n} = i_{n} \mid X_{n - 1} = i_{n - 1}\} = P_{i_{n - 1}, i_{n}}.$$
Finally, we get $$P\{X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n} = i_{n}\} = P_{i_{n - 1}, i_{n}} \cdot P\{X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n - 1} = i_{n - 1}\},$$ and proceeding by induction, $$P\{X_{0} = i_{0}, X_{1} = i_{1}, \ldots, X_{n} = i_{n}\} = P_{i_{n - 1}, i_{n}} P_{i_{n - 2}, i_{n - 1}} \cdots P_{i_{0}, i_{1}} p_{i_{0}}.$$
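To make the product formula concrete, here is a minimal Python sketch for a hypothetical two-state chain (the initial distribution `p` and transition matrix `P` below are made-up numbers, not from the book):

```python
# Sketch of the product formula for path probabilities,
#   P{X_0=i_0, ..., X_n=i_n} = p_{i_0} * P_{i_0,i_1} * ... * P_{i_{n-1},i_n},
# for a hypothetical two-state chain (numbers are made up).

p = [0.6, 0.4]        # initial distribution: p[i] = P{X_0 = i}
P = [[0.9, 0.1],      # one-step transition matrix: P[i][j] = P_{ij}
     [0.2, 0.8]]

def path_prob(path):
    """Probability of observing the exact state sequence i_0, i_1, ..., i_n."""
    prob = p[path[0]]                  # start with p_{i_0}
    for i, j in zip(path, path[1:]):   # multiply by each P_{i_k, i_{k+1}}
        prob *= P[i][j]
    return prob

print(path_prob([0, 0, 1]))  # p_0 * P_00 * P_01 = 0.6 * 0.9 * 0.1
```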
My questions:
(1) What does it mean for a process to be "determined"? I think this means that we know the probability to get from one state to another state for every pair of states.
(2) Why does it suffice to show how to compute this probability? This isn't really too clear for me.
(3) How did the equality after "by the definition of conditional probability" follow? This also isn't clear to me.
(1) What does it mean for a process to be "determined"? I think this means that we know the probability to get from one state to another state for every pair of states.
That the process is completely determined means that we can compute the distribution $P(X_t = x)$ for every $x$ in the state space and every $t > 0$ (and, more generally, the joint distribution of any finite collection of the $X_t$, as the book's passage states). The intuition behind the proof is that we know the transition probabilities at all points of time and the initial distribution, so we can use these two pieces of information to propagate the distribution forward in time.
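"Propagating the distribution forward" can be sketched as repeated matrix-vector multiplication; the two-state chain below is a hypothetical example, not from the book:

```python
# Sketch: propagating the distribution forward in time for a hypothetical
# two-state chain. By the law of total probability,
#   P{X_{t+1} = j} = sum_i P{X_t = i} * P_{ij}.

p0 = [0.6, 0.4]       # initial distribution P{X_0 = i} (made-up numbers)
P = [[0.9, 0.1],      # transition matrix P[i][j] = P_{ij}
     [0.2, 0.8]]

def step(dist):
    """One time step: multiply the row vector `dist` by the matrix P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = p0
for _ in range(3):
    dist = step(dist)
# dist[x] is now P{X_3 = x}, and dist still sums to 1.
```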
(2) Why does it suffice to show how to compute this probability? This isn't really too clear for me.
As mentioned earlier, the process is "determined" if we know $P(X_t = x)$ for all $x$ and $t > 0$. However, by the law of total probability, we can decompose this event by considering all paths starting from $t = 0$ that end at $x$ at time $t$. Therefore, if we can compute the probability of any path with the given information, this would then imply the desired result.
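This decomposition can be checked numerically: summing the product-formula probability over every path that ends in state $x$ at time $t$ agrees with propagating the distribution by the transition matrix (again a hypothetical two-state chain with made-up numbers):

```python
# Numerical check of the law-of-total-probability decomposition for a
# hypothetical two-state chain.
from itertools import product

p = [0.6, 0.4]        # initial distribution (made-up numbers)
P = [[0.9, 0.1],      # transition matrix P[i][j] = P_{ij}
     [0.2, 0.8]]

def path_prob(path):
    """Product formula: p_{i_0} * P_{i_0,i_1} * ... * P_{i_{n-1},i_n}."""
    prob = p[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

t, x = 3, 1

# Sum over all paths (i_0, ..., i_{t-1}) with the final state fixed at x.
marginal = sum(path_prob(path + (x,)) for path in product(range(2), repeat=t))

# Same marginal via repeated matrix-vector multiplication.
dist = p
for _ in range(t):
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

assert abs(marginal - dist[x]) < 1e-12  # the two computations agree
```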
(3) How did the equality after "by the definition of conditional probability" follow? This also isn't clear to me.
Remember that for any two events $A, B$ with $P(B) > 0$, $P(A, B) = P(A \mid B)P(B)$. In this case, $A$ is the event $\{X_n = i_n\}$, and $B$ is the event $\{X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}\}$, i.e. the particular path taken to reach time $n - 1$.