First some background:
We say that $(X_n)_{n\geq 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ if (i) $X_0$ has distribution $\lambda$; (ii) for $n\geq 0$, conditional on $X_n=i$, $X_{n+1}$ has distribution $(p_{ij}: j\in I)$ and is independent of $X_0,\ldots,X_{n-1}$. More explicitly, these conditions state that, for $n\geq 0$, and $i_0,\ldots,i_{n+1}\in I$, (i) $\mathbb{P}(X_0=i_0)=\lambda_{i_0}$, (ii) $\mathbb{P}(X_{n+1}=i_{n+1}|X_0=i_0,\ldots,X_n=i_n)=p_{i_n i_{n+1}}$. We say that $(X_n)_{n\geq 0}$ is Markov($\lambda,P$) for short.
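To make the definition concrete, here is a minimal sampling sketch of Markov($\lambda,P$): draw $X_0$ from $\lambda$, then repeatedly draw $X_{n+1}$ from row $X_n$ of $P$. The two-state chain (`lam`, `P`) and the helper name `sample_chain` are made-up example values, not from the text above.

```python
import numpy as np

# Hypothetical two-state chain on I = {0, 1}; lam and P are example values.
lam = np.array([0.6, 0.4])             # initial distribution lambda
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])             # transition matrix; each row sums to 1

rng = np.random.default_rng(0)

def sample_chain(n_steps):
    """Draw X_0 ~ lambda, then X_{n+1} ~ row X_n of P, independent of the earlier past."""
    x = rng.choice(len(lam), p=lam)
    path = [int(x)]
    for _ in range(n_steps):
        x = rng.choice(len(lam), p=P[x])
        path.append(int(x))
    return path
```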
Now the following Theorem is shown:
A discrete-time random process $(X_n)_{0\leq n\leq N}$ is Markov($\lambda,P$) iff for all $i_0,\ldots,i_N\in I$ $$ \mathbb{P}(X_0=i_0,X_1=i_1,\ldots,X_N=i_N)=\lambda_{i_0}p_{i_0i_1}p_{i_1i_2}\ldots p_{i_{N-1}i_N}.~~~~~~~(*) $$
Now the proof is given. I do understand the first direction:
$\Rightarrow$ Suppose $(X_n)_{0\leq n\leq N}$ is Markov($\lambda,P$). Then one can write $$ \mathbb{P}(X_0=i_0,X_1=i_1,\ldots,X_N=i_N)\\=\mathbb{P}(X_N=i_N|X_0=i_0,\ldots,X_{N-1}=i_{N-1})\cdot\mathbb{P}(X_0=i_0,\ldots,X_{N-1}=i_{N-1})\\=p_{i_{N-1}i_N}\cdot\mathbb{P}(X_{N-1}=i_{N-1}|X_0=i_0,\ldots X_{N-2}=i_{N-2})\cdot\mathbb{P}(X_0=i_0,\ldots,X_{N-2}=i_{N-2})\\=\ldots\\=p_{i_{N-1}i_N}\ldots\mathbb{P}(X_0=i_0)\mathbb{P}(X_1=i_1|X_0=i_0)\\=\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{N-1}i_N} $$
But I do not understand the other direction. I cite it:
On the other hand, if $(*)$ holds for $N$, then by summing both sides over $i_N\in I$ and using $\sum_{j\in I}p_{ij}=1$ we see that $(*)$ holds for $N-1$ and, by induction $$ \mathbb{P}(X_0=i_0,X_1=i_1,\ldots,X_n=i_n)=\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{n-1}i_n} $$ for all $n=0,\ldots,N$. In particular, $\mathbb{P}(X_0=i_0)=\lambda_{i_0}$ and, for $n=0,1,\ldots,N-1$, $$ \mathbb{P}(X_{n+1}=i_{n+1}|X_0=i_0,\ldots,X_n=i_n)\\=\mathbb{P}(X_0=i_0,\ldots,X_n=i_n,X_{n+1}=i_{n+1})/\mathbb{P}(X_0=i_0,\ldots,X_n=i_n)\\=p_{i_ni_{n+1}}. $$ So $(X_n)_{0\leq n\leq N}$ is Markov($\lambda,P$).
Could you please explain the proof of the direction $\Leftarrow$ to me?
For example, I do not understand what is meant by
[...] if $(*)$ holds for $N$, then by summing both sides over $i_N\in I$ and using $\sum_{j\in I}p_{ij}=1$ we see that $(*)$ holds for $N-1$ and, by induction [...]
Edit 1: I think this means $$ \sum_{i_N\in I}\mathbb{P}(X_0=i_0,\ldots,X_N=i_N)=\sum_{i_N\in I}\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{N-1}i_N}\\=\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{N-2}i_{N-1}}\underbrace{\sum_{i_N\in I}p_{i_{N-1}i_N}}_{=1}\\=\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{N-2}i_{N-1}} $$
But why does this show that $(*)$ holds for $N-1$? I cannot see that.
Edit 2: I think this is because of the law of total probability, i.e. summing over $i_N\in I$ gives on the LHS $$ \sum_{i_N\in I}\mathbb{P}(X_0=i_0,\ldots,X_N=i_N)=\sum_{i_N\in I}\mathbb{P}(X_0=i_0,\ldots,X_{N-1}=i_{N-1}|X_N=i_N)\mathbb{P}(X_N=i_N)\\=\mathbb{P}(X_0=i_0,\ldots,X_{N-1}=i_{N-1}) $$ and on the RHS (as written above) $\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{N-2}i_{N-1}}$, so we have $$ \mathbb{P}(X_0=i_0,\ldots,X_{N-1}=i_{N-1})=\lambda_{i_0}p_{i_0i_1}\ldots p_{i_{N-2}i_{N-1}}. $$ Now with the same reasoning one can show that $(*)$ holds for $n=0,\ldots,N-2$.
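To convince myself, I checked the summation step numerically on a small concrete chain: fixing $i_0,\ldots,i_{N-1}$ and summing the product formula over $i_N$ leaves the $(N-1)$-term product, because the last factor sums to $1$. The chain (`lam`, `P`) and the helper `rhs` are made-up example values/names.

```python
import numpy as np

# Hypothetical two-state chain; lam and P are example values.
lam = np.array([0.6, 0.4])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def rhs(path):
    """The product lambda_{i0} * p_{i0 i1} * ... * p_{i_{n-1} i_n} for path (i0, ..., in)."""
    prob = lam[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob

# Fix (i0, i1, i2) and sum the length-4 product over i3:
i = (0, 1, 0)
total = sum(rhs(i + (i3,)) for i3 in range(2))
# total equals rhs(i), since sum_j p_{i2 j} = 1
```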
Suppose $$P(X_0=i_0,X_1=i_1,...,X_N=i_N)=\lambda_{i_0}p_{i_0i_1}p_{i_1i_2}...p_{i_{N-1}i_{N}}$$ holds for all $i_0,\ldots,i_N\in I$. Summing both sides over $i_N$ and using $\sum_{j\in I}p_{i_{N-1}j}=1$, it must then be true that $$P(X_0=i_0,X_1=i_1,...,X_{N-1}=i_{N-1})=\lambda_{i_0}p_{i_0i_1}p_{i_1i_2}...p_{i_{N-2}i_{N-1}}$$
But then $$P(X_N=i_N|X_0=i_0,...,X_{N-1}=i_{N-1})=\frac{P(X_0=i_0,X_1=i_1,...,X_N=i_N)}{P(X_0=i_0,X_1=i_1,...,X_{N-1}=i_{N-1})}=\frac{\lambda_{i_0}p_{i_0i_1}p_{i_1i_2}...p_{i_{N-1}i_{N}}}{\lambda_{i_0}p_{i_0i_1}p_{i_1i_2}...p_{i_{N-2}i_{N-1}}}=p_{i_{N-1}i_N}$$ and, repeating the summation argument, the same holds with any $n\leq N$ in place of $N$. This is exactly the Markov property, so $(X_n)_{0\leq n\leq N}$ is Markov($\lambda,P$).
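The conditioning step can also be sanity-checked numerically: on a small made-up chain, the ratio of the two product formulas collapses to a single transition probability. Here `lam`, `P` and the helper `joint` are example values/names, not from the question.

```python
import numpy as np

# Hypothetical two-state chain; lam and P are example values.
lam = np.array([0.6, 0.4])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def joint(path):
    """lambda_{i0} * p_{i0 i1} * ... for a path (i0, ..., in), i.e. the RHS of (*)."""
    prob = lam[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob

path = (1, 0, 0)                        # (i0, i1, i2)
i_next = 1                              # candidate i3
cond = joint(path + (i_next,)) / joint(path)
# cond equals P[path[-1], i_next]: all earlier factors cancel in the ratio
```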