Quoting Bertsekas' Introduction to Probability:
An arrival process is called a Poisson process with rate $\lambda$ if it has the following properties:
a) Time homogeneity: the probability $P(k,\tau)$ of $k$ arrivals is the same for all intervals of the same length $\tau$.
b) The number of arrivals during a particular interval is independent of the history of arrivals outside this interval.
c) Small interval probabilities: the probabilities $P(k,\tau)$ satisfy:
$P(0,\tau)=1-\lambda\tau + o(\tau)$
$P(1,\tau)=\lambda\tau + o_1(\tau)$
$P(k,\tau)=o_k(\tau)$ for $k=2,3,\dots$
Here, $o(\tau)$ and $o_k(\tau)$ are functions of $\tau$ that satisfy
$\lim_{\tau\to0}\frac{o(\tau)}{\tau}=0$, $\lim_{\tau\to0}\frac{o_k(\tau)}{\tau}=0$
Then we are given the formula:
$$P(k,\tau)=e^{-\lambda\tau}\frac{(\lambda\tau)^k}{k!}$$
Note that a Taylor series expansion of $e^{-\lambda\tau}$ yields:
$P(0,\tau)=e^{-\lambda\tau}=1-\lambda\tau+o(\tau)$
$P(1,\tau)=\lambda\tau e^{-\lambda\tau}=\lambda\tau-\lambda^2\tau^2+O(\tau^3)=\lambda\tau+o_1(\tau)$.
First of all, what are $o(\tau)$, $o_1(\tau)$ and $O(\tau^3)$ in the Taylor expansion? Do they have anything to do with Taylor expansion per se? I thought $o$ was little-o notation, but the definition I know is different: $f(n) = o(g(n))$ if $g(n)$ grows much faster than $f(n)$. Here the situation looks quite different, so what is it?
Secondly, the author doesn't prove that the $o$ terms above satisfy
$\lim_{\tau\to0}\frac{o(\tau)}{\tau}=0$, $\lim_{\tau\to0}\frac{o_k(\tau)}{\tau}=0$
as stated in the definition of a Poisson process. How can we prove it?
Most importantly, why do we want them to satisfy the properties described in 'c) Small interval probabilities'? These three formulas are not arbitrary; there has to be a good reason for them.
Ideally, if we let $\lambda \to 0$, it's natural to expect the probability $P(k,\tau)$ to equal exactly $0$ in the limit, but apparently that's not possible (there will always be that tiny term $o_k(\tau)$). Or does it equal $0$ in the limit?

$o$ and $o_k$ are the author's functions with the properties that: $$\lim_{\tau\to0}\frac{o(\tau)}{\tau}=0$$
$$\lim_{\tau\to0}\frac{o_k(\tau)}{\tau}=0$$
The functions themselves are left unspecified; the author only states that they behave as above.
Since these functions tend to zero faster than $\tau$, they allow very small corrections to be treated as negligible. For the purpose of this section of the book, no other properties are needed.
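To see numerically that these remainders really are negligible relative to $\tau$, here is a short Python sketch. It computes each $o$ term as the gap between the exact Poisson probability and the approximation in property c), then divides by $\tau$. The rate $\lambda=2$ (`LAM`) and the helper names `P` and `o_ratios` are arbitrary choices for illustration.

```python
import math

LAM = 2.0  # illustrative rate; any positive value shows the same trend

def P(k: int, tau: float) -> float:
    """Poisson probability of k arrivals in an interval of length tau."""
    mu = LAM * tau
    return math.exp(-mu) * mu**k / math.factorial(k)

def o_ratios(tau: float) -> tuple[float, float, float]:
    """Return o(tau)/tau, o_1(tau)/tau and o_2(tau)/tau for the Poisson formula."""
    o0 = P(0, tau) - (1 - LAM * tau)  # P(0,tau) = 1 - lam*tau + o(tau)
    o1 = P(1, tau) - LAM * tau        # P(1,tau) = lam*tau + o_1(tau)
    o2 = P(2, tau)                    # P(2,tau) = o_2(tau)
    return o0 / tau, o1 / tau, o2 / tau

for tau in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"tau={tau:g}:", ", ".join(f"{r:.2e}" for r in o_ratios(tau)))
```

Each ratio shrinks roughly in proportion to $\tau$ (about $2\tau$, $-4\tau$ and $2\tau$ for $\lambda=2$), which is exactly what $\lim_{\tau\to0} o(\tau)/\tau = 0$ says.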
The big-O notation used here:
$$P(1,\tau)=\lambda\tau e^{-\lambda\tau} =\lambda\tau-\lambda^2\tau^2+O(\tau^3) =\lambda\tau+o_1(\tau)$$
means that $|\lambda\tau e^{-\lambda\tau}-\lambda\tau+\lambda^2\tau^2|$ is at most $M|\tau|^3$ for some constant $M$ and all sufficiently small $|\tau|$.
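A quick Python check of this big-O statement, with $\lambda=2$ as an assumed illustrative value and `remainder_ratio` a helper name made up for the sketch: divide the remainder by $|\tau|^3$ and confirm that the result stays bounded as $\tau$ shrinks.

```python
import math

LAM = 2.0  # illustrative rate

def remainder_ratio(tau: float) -> float:
    """|lam*tau*e^(-lam*tau) - (lam*tau - lam^2*tau^2)| / |tau|^3."""
    exact = LAM * tau * math.exp(-LAM * tau)
    truncated = LAM * tau - LAM**2 * tau**2
    return abs(exact - truncated) / abs(tau) ** 3

for tau in (1e-1, 1e-2, 1e-3):
    print(f"tau={tau:g}: ratio={remainder_ratio(tau):.4f}")
```

The ratio approaches $\lambda^3/2$ (the coefficient of the next Taylor term), so for these values of $\tau$ the constant $M=\lambda^3/2$ works.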
Then we can let:
$$o_1(\tau)=-\lambda^2\tau^2+O(\tau^3)$$
because this satisfies the limit property the author has defined: $\frac{o_1(\tau)}{\tau}=-\lambda^2\tau+O(\tau^2)\to0$ as $\tau\to0$.
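To prove the limit property for every term at once, we can also divide the closed-form probabilities by $\tau$ directly. The only fact needed beyond continuity of the exponential is that the derivative of $e^{-\lambda\tau}$ at $\tau=0$ is $-\lambda$, i.e. $\frac{e^{-\lambda\tau}-1}{\tau}\to-\lambda$. All limits below are taken as $\tau\to0$:

$$\begin{aligned}
\frac{o(\tau)}{\tau}&=\frac{e^{-\lambda\tau}-1+\lambda\tau}{\tau}=\frac{e^{-\lambda\tau}-1}{\tau}+\lambda\;\longrightarrow\;-\lambda+\lambda=0,\\
\frac{o_1(\tau)}{\tau}&=\frac{\lambda\tau e^{-\lambda\tau}-\lambda\tau}{\tau}=\lambda\left(e^{-\lambda\tau}-1\right)\;\longrightarrow\;0,\\
\frac{o_k(\tau)}{\tau}&=\frac{P(k,\tau)}{\tau}=e^{-\lambda\tau}\frac{\lambda^k\tau^{k-1}}{k!}\;\longrightarrow\;0\quad\text{for }k\ge2.
\end{aligned}$$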
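By the definition of probabilities we have:
$$\sum_{k=0}^{\infty} P(k,\tau)=1$$
and as $\lambda\tau\to0$, the $k=0$ term tends to $e^{0}\frac{0^0}{0!}=1$ (with the convention $0^0=1$), so every term with $k\ge1$ must tend to $0$.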
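Since the formula depends on $\lambda$ and $\tau$ only through their product, the following Python sketch uses the shorthand $\mu=\lambda\tau$ (introduced here for convenience) and checks that the probabilities sum to $1$, that $P(0,\tau)\to1$, and that every $P(k,\tau)$ with $k\ge1$ shrinks as $\mu\to0$. Truncating the sum at $k=49$ is an assumption that is harmless for these small values of $\mu$.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """P(k, tau) written in terms of mu = lam*tau.

    For mu = 0.0 and k = 0, Python evaluates 0.0**0 as 1.0, which matches
    the 0^0 = 1 convention used above.
    """
    return math.exp(-mu) * mu**k / math.factorial(k)

for mu in (1.0, 0.1, 1e-3, 1e-6, 0.0):
    probs = [poisson_pmf(k, mu) for k in range(50)]
    print(f"mu={mu:g}: total={sum(probs):.15f}, P(0)={probs[0]:.6f}, "
          f"largest P(k>=1)={max(probs[1:]):.2e}")
```

At $\mu=0$ the sketch returns exactly $P(0)=1$ and $P(k)=0$ for $k\ge1$, so in the limit the probabilities do become exactly $0$.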
Finally, $o(\tau)$ becomes negligible compared with the other terms as $\tau\to0$, so it can be ignored.
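Dropping the $o$ terms is also one standard way to motivate the formula itself. The minimal Python sketch below assumes that $[0,\tau]$ is split into $n$ equal slots, each holding exactly one arrival with probability $\lambda\tau/n$, independently of the other slots (properties a and b, with the $o$ terms of c discarded). The arrival count is then binomial and approaches $P(k,\tau)$ as $n$ grows. The helper names and the values $\lambda=2$, $\tau=1.5$ are illustrative choices, not from the book.

```python
import math

def poisson_pmf(k: int, lam: float, tau: float) -> float:
    """P(k, tau) = e^(-lam*tau) * (lam*tau)^k / k!"""
    mu = lam * tau
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_slots_pmf(k: int, lam: float, tau: float, n: int) -> float:
    """Chance of k arrivals when [0, tau] is cut into n slots of length tau/n.

    Each slot holds exactly one arrival with probability lam*tau/n and none
    otherwise (the o(.) terms are dropped), independently of other slots.
    """
    p = lam * tau / n
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def max_error(lam: float, tau: float, n: int, k_max: int = 10) -> float:
    """Largest |binomial - Poisson| gap over k = 0..k_max (needs k_max <= n)."""
    return max(
        abs(binomial_slots_pmf(k, lam, tau, n) - poisson_pmf(k, lam, tau))
        for k in range(k_max + 1)
    )

lam, tau = 2.0, 1.5  # illustrative values
for n in (10, 100, 10_000):
    print(n, max_error(lam, tau, n))
```

The gap shrinks as the slots get finer, which is why the small-interval conditions in c) are enough to pin down the Poisson formula.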