Since I did not find a good proof of this anywhere on the internet, I would like somebody to check my proof:
An experiment runs over a specific time span $[0,t]$. The expected number of arrivals in this time interval is $\lambda$, and the time between two consecutive arrivals has an exponential distribution (it does not matter when the last arrival took place; the probability distribution of the time until the next one is always the same). E.g.: the number of radioactive decays of a sample in a given time interval, the number of calls arriving, ...
Let $X$ be the number of arrivals in this time interval $[0,t]$. Then $X$ has the following probability distribution: $$ P_{[0,t]}(X= k) = \frac{\lambda^k}{k!} e^{-\lambda}. $$
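(As a numerical sanity check of this claim, not part of the proof: assuming, as in the setup, exponential interarrival times with rate $\lambda/t$ so that $\lambda$ arrivals are expected in $[0,t]$, one can simulate the arrival process and compare the empirical frequencies of $X = k$ with the claimed formula. The choice $\lambda = 3$ is arbitrary.)

```python
import math
import random

def simulate_counts(lam, t=1.0, trials=100_000, seed=0):
    """Simulate the number of arrivals in [0, t] when interarrival
    times are exponential with rate lam / t, so that the expected
    number of arrivals in [0, t] is lam."""
    rng = random.Random(seed)
    rate = lam / t
    counts = []
    for _ in range(trials):
        elapsed, k = 0.0, 0
        while True:
            elapsed += rng.expovariate(rate)  # time to the next arrival
            if elapsed > t:
                break
            k += 1
        counts.append(k)
    return counts

def poisson_pmf(lam, k):
    """The claimed distribution: lam^k / k! * e^(-lam)."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam = 3.0
counts = simulate_counts(lam)
for k in range(6):
    empirical = counts.count(k) / len(counts)
    print(k, round(empirical, 4), round(poisson_pmf(lam, k), 4))
```

The empirical frequencies agree with the formula to within simulation noise.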
Proof:
Slice the interval into $n\gg 1$ equal parts: $[0,t] = [0,t/n)\cup [t/n, 2t/n) \cup \dots \cup [(n-1)t/n, t]$. Let $Y_1, Y_2, \dots, Y_n$ be the random variables describing the number of arrivals in the subintervals $[0,t/n), [t/n, 2t/n), \dots, [(n-1)t/n, t]$.
Since the time from any fixed point until the first arrival has an exponential distribution, $P(Y_1 = 1) = P(Y_2 = 1) = \dots = P(Y_n = 1)$. Also, the expected number of arrivals in a subinterval is proportional to the length of the subinterval, therefore $P(Y_1 = 1)= \dots =P(Y_n = 1) = \frac \lambda n$. Because two arrivals cannot occur at exactly the same time, $n$ can be chosen so large that the probability of more than one arrival in a subinterval becomes negligible, so each $Y_i$ is effectively either zero or one.
Now one can see that the probability of $k$ arrivals in the interval $[0,t]$ can be written as a binomial distribution:
\begin{align} P_{[0,t]}(X = k) &= \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n} \right)^{n-k} \\ &= \frac{n(n-1)\cdots (n-k+1)}{k!} \frac{\lambda^k}{n^k} \left(1-\frac{\lambda}{n}\right)^n\left(1-\frac{\lambda}{n}\right)^{-k}. \end{align}
Since $\lim_{n\rightarrow \infty} \frac{n(n-1)\cdots(n-k+1)}{n^k} = 1$, $\lim_{n\rightarrow \infty} \left( 1-\frac{\lambda}{n}\right)^n = e^{-\lambda}$ and $\lim_{n \rightarrow \infty} \left(1-\frac{\lambda}{n}\right)^{-k} = 1$, letting $n \rightarrow \infty$ gives:
\begin{align} P_{[0,t]}(X=k) = \frac{\lambda^k}{k!} e^{-\lambda}. \end{align}
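(The limiting step can also be checked numerically: the maximum difference over small $k$ between the $\operatorname{Binomial}(n, \lambda/n)$ probabilities and the Poisson probabilities should shrink as $n$ grows. A minimal sketch; $\lambda = 3$ is an arbitrary choice.)

```python
import math

def binom_pmf(n, p, k):
    """Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """Poisson(lam) probability of exactly k arrivals."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam = 3.0
for n in (10, 100, 10_000, 1_000_000):
    # largest discrepancy over k = 0..9 between Binomial(n, lam/n) and Poisson(lam)
    err = max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k))
              for k in range(10))
    print(n, err)
```

The discrepancy visibly decreases as $n$ increases, consistent with the limit computed above.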
(I think it should be mostly correct, but please can anyone check this?)
You wrote:
That is infelicitous phrasing. If $X$ is a random variable, then what is meant by the number of "counts" of $X$ in a time interval $[0,t]$? Would you expect someone who had never heard of Poisson processes to understand that? What if $X$ is a random variable and $X\sim N(0,1)$? What then is the "number of counts of $X$" in a time interval? Saying "If $X$ is a random variable" means that what follows is applicable whenever $X$ is any random variable, whereas here you want $X$ to be a "number of counts"; you don't want to say that any random variable (e.g. a continuously distributed one) "has" a "number of counts".
What you call "events" I'd rather call "arrivals" since "event" is an overworked word in this context.
What you need here is to say that
That last assumption does not follow from the others: sometimes one includes a probability distribution for the number of simultaneous arrivals at each time when arrivals occur.
I also wouldn't say "$X$ has no memory". The thing that "has no memory" is the probability distribution of the time from one arrival to the next, and that is the probability distribution of a continuous random variable, whereas $X$ is a discrete random variable. "Memorylessness" in this context has a precise definition. There is discrete memorylessness of a discrete distribution, but that is a geometric distribution, not a Poisson distribution.
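(To illustrate the last point with a quick numerical check, my addition: for a geometric random variable $X$ counting the number of trials until the first success with success probability $p$, the tail is $P(X > k) = (1-p)^k$, and discrete memorylessness $P(X > m+n \mid X > m) = P(X > n)$ follows. The value $p = 0.3$ is arbitrary.)

```python
p = 0.3  # arbitrary success probability

def tail(k, p=p):
    """P(X > k) for X = number of trials until the first success."""
    return (1 - p)**k

for m in range(5):
    for n in range(5):
        cond = tail(m + n) / tail(m)      # P(X > m+n | X > m)
        print(m, n, abs(cond - tail(n)))  # zero up to floating-point rounding
```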
"$\frac\lambda n \ll0$" is not correct, since that implies $\lambda/n$ is negative.
Your way of finding the limit is correct. I would also mention that $\left( 1 - \frac \lambda n \right)^{-k} \to 1$ as $n\to\infty.$