I have two questions.
Since $o(h)$ represents the probability of 2 births and 1 death, 3 births and 2 deaths, etc., why does it still say $P(|X(t+h)-X(t)|>1)=o(h)$? Shouldn't it be $P(|X(t+h)-X(t)|=1)=o(h)$?
How do I get equation 13 from equation 12?

First of all, what does $o(h)$ actually mean? It is Landau's little-$o$ notation (not big-$O$) and denotes a class of functions. In general, $f\in o(g)$ as $x\to a$ means that $$ \lim_{x\to a}\left|\frac{f(x)}{g(x)}\right|=0. $$ As an example, take a differentiable function $f$; then for $h\to0$ we have $$ f(x+h)=f(x)+hf'(x)+o(h), $$
which means that the approximation error goes to $0$ faster than linearly in $h$.
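As a concrete check, here is a small worked case with $f(x)=x^2$ (chosen purely for illustration, it is not from the question): $$ (x+h)^2 = x^2 + 2xh + h^2, \qquad \frac{h^2}{h}=h\to0 \text{ as } h\to0, $$ so the remainder $h^2$ is indeed $o(h)$. By contrast, the linear term $2xh$ is not $o(h)$ for $x\neq0$, because $\frac{2xh}{h}=2x$ does not tend to $0$.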
Your $2$nd question was already answered by @Math1000; for completeness:
Since we investigate the behavior for $h$ close to $0$, we have $h^n\ll h$ for every $n\geq2$, that is, $h^n/h=h^{n-1}\to0$. So as $h\to0$, all higher powers $h^n$ are negligible compared to $h$ and can be absorbed into the $o(h)$ term. Multiplying out, we get $$ (\lambda_ih)(1-h(\lambda_i+\mu_i)+\ldots)+o(h)=\lambda_ih-\lambda_i(\lambda_i+\mu_i)h^2+\ldots+o(h)=\lambda_ih+o(h) $$

Your $1$st question:
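The expression $$ P(|X(t+h)-X(t)|>1 \mid X(t)=i)=o(h) $$ means that, as $h\to0$, the probability of a change of magnitude larger than $1$ (in either direction) converges to $0$ faster than linearly, so it is negligible for very small $h$. The probability of a change of magnitude exactly $1$, on the other hand, is $(\lambda_i+\mu_i)h+o(h)$, which is of order $h$ and therefore not $o(h)$ (unless $\lambda_i+\mu_i=0$); that is why the statement uses $>1$ and not $=1$.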
The bottom line is that a birth-death process only jumps by $\pm1$ to a neighbouring state, and seeing more than one such jump within a very short interval $h$ is negligibly unlikely.
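To see the $o(h)$ behaviour numerically, here is a minimal sketch under a simplifying assumption that is not part of the question: the rates are taken to be constant across states ($\lambda_i=\lambda$, $\mu_i=\mu$), so the number of jumps in an interval of length $h$ is Poisson with mean $(\lambda+\mu)h$. A change of magnitude larger than $1$ needs at least two jumps, so its probability is bounded by $P(N\geq2)$. The values of `lam` and `mu` and the function names are hypothetical.

```python
import math


def prob_two_or_more_events(total_rate: float, h: float) -> float:
    """P(N >= 2) for N ~ Poisson(total_rate * h)."""
    x = total_rate * h
    # 1 - P(N = 0) - P(N = 1); expm1 limits rounding error for small x
    return -math.expm1(-x) - x * math.exp(-x)


def prob_exactly_one_event(total_rate: float, h: float) -> float:
    """P(N = 1) for N ~ Poisson(total_rate * h)."""
    x = total_rate * h
    return x * math.exp(-x)


# Hypothetical constant birth and death rates (assumption for illustration)
lam, mu = 2.0, 1.0
steps = [1e-1, 1e-2, 1e-3, 1e-4]

# Divide each probability by h: an o(h) quantity gives ratios tending to 0,
# while a quantity of order h gives ratios tending to a nonzero constant.
ratios_many = [prob_two_or_more_events(lam + mu, h) / h for h in steps]
ratios_one = [prob_exactly_one_event(lam + mu, h) / h for h in steps]

for h, r_many, r_one in zip(steps, ratios_many, ratios_one):
    print(f"h={h:g}  P(N>=2)/h={r_many:.6f}  P(N=1)/h={r_one:.6f}")
```

The first ratio shrinks towards $0$ as $h$ decreases, while the second approaches $\lambda+\mu$, which matches the distinction between the $>1$ and $=1$ events discussed above.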