I've seen the formula most commonly derived as a continuum generalization of a binomial random variable with large $n$, small $p$ and finite $\lambda = np$ yielding
$$ \lim_{n \to \infty} \binom{n}{x} p^x(1-p)^{n-x} = e^{-\lambda}\frac{\lambda ^ x}{x!}$$
It follows from this derivation that $$ \lim_{n \to \infty} (1-p)^{n-x} = e^{-\lambda}$$ is the probability of failing on every one of infinitely many trials when the success rate is $\lambda$.
However, from this approach, I could not grok the remaining term
$$\frac { \lambda ^ x } {x!} $$
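For what it's worth, the binomial limit above is easy to check numerically. The following is only a minimal sketch: the helper names `binomial_pmf` and `poisson_pmf` and the values $\lambda = 2$, $x = 3$ are my own choices for illustration, not part of any derivation.

```python
import math

def binomial_pmf(n, p, x):
    # P(X = x) for a Binomial(n, p) random variable
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(lam, x):
    # e^{-lambda} * lambda^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

lam, x = 2.0, 3  # illustrative values only
for n in (10, 100, 10_000):
    p = lam / n  # keep lambda = n * p fixed while n grows
    print(n, binomial_pmf(n, p, x), poisson_pmf(lam, x))
```

As $n$ grows with $np$ held fixed, the first printed column should approach the second, and $(1-p)^{n-x}$ on its own should approach $e^{-\lambda}$.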
Question
What insightful derivations (perhaps from generalizations) of the Poisson random variable exist that leave an intuition for each of the terms?
My Answer:
My answer, https://math.stackexchange.com/a/2727388/338817, comes from a geometric approach to the intuition behind the Gamma function (https://math.stackexchange.com/a/1651961/338817), which I quote:
Note that $\frac{t^n}{n!}$ is the volume of the set $S_t=\{(t_1,t_2,\dots,t_n)\in\mathbb R^{n}\mid t_i\geq 0\text{ and } t_1+t_2+\cdots+t_n\leq t\}$
Suppose $n$ successes occur in an interval $[0, t)$ and let their times be given by the $n$-tuple $(t_1, \dots, t_n)$, $t_i \leq t$.
The set of events where exactly $n$ successes occur can be measured, writing $x_i \geq 0$ for the waiting time before the $i$-th success (so that $x_1 + \cdots + x_n \leq t$), as $$ \int_0^{t} \int_0^{t - x_1} \cdots \int_0^{ t - \sum_{i = 1}^{n-1} x_i } dx_n \, dx_{n-1} \cdots dx_2 \, dx_1 = \frac{ t^n } { n! }$$
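The volume $\frac{t^n}{n!}$ can be sanity-checked by Monte Carlo. This is a rough sketch under my own assumptions: the function name `simplex_volume_mc`, the sample count and the seed are hypothetical choices. It draws points uniformly from the cube $[0, t]^n$ and counts the fraction whose coordinates sum to at most $t$.

```python
import math
import random

def simplex_volume_mc(t, n, samples=200_000, seed=0):
    # Estimate the volume of {x_i >= 0, x_1 + ... + x_n <= t}
    # as (fraction of cube samples inside the set) * (cube volume t^n).
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        if sum(rng.uniform(0, t) for _ in range(n)) <= t:
            hits += 1
    return (hits / samples) * t**n

t, n = 2.0, 3  # illustrative values only
print(simplex_volume_mc(t, n), t**n / math.factorial(n))
```

The two printed numbers should agree up to sampling noise, which shrinks like one over the square root of the number of samples.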
Importantly, the size of the sample space of all events is measured by summing the sizes of the sets of $k$-tuples over all $k \geq 0$:
$$ \sum_{k = 0}^{\infty} \frac{ t^k }{ k! } = e^t$$
Taking the ratio of the sizes of these sets yields the probability that $n$ events occur in the interval $[0, t)$.
$$\boxed{ P \{ X = n \} = e^{-t} \frac{ t^n }{ n! } }$$
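As a quick consistency check, using nothing beyond the exponential series, these ratios do sum to one, as probabilities should:
$$ \sum_{n = 0}^{\infty} e^{-t} \frac{ t^n }{ n! } = e^{-t} \sum_{n = 0}^{\infty} \frac{ t^n }{ n! } = e^{-t} e^{t} = 1 $$
For instance, with $t = 1$ the first few values are $P\{X = 0\} = e^{-1} \approx 0.368$, $P\{X = 1\} = e^{-1} \approx 0.368$ and $P\{X = 2\} = \frac{e^{-1}}{2} \approx 0.184$.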
Note:
More generally, the event rate can be made non-homogeneous with a scalar function $\lambda(t)$. When the rate is constant for all time, i.e., $\lambda(t) = \lambda$, we write
$$P(X = n) = e^{-\lambda t}\frac{ (\lambda t)^n } { n! }$$
Letting $t = 1$ gives the process on a unit time interval, scaled by $\lambda$. Although we are interested in $[0, 1)$, it is as if we were looking at the interval $[0, \lambda)$ at unit rate.
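The constant-rate formula can also be compared against a simulation. The sketch below rests on an assumption not derived in this post: that the gaps between successive events are independent exponential random variables with rate $\lambda$, which is the standard construction of a homogeneous Poisson process. The names `count_arrivals`, `empirical_pmf` and `poisson_count_pmf` are hypothetical helpers.

```python
import math
import random

def count_arrivals(lam, t, rng):
    # Count events in [0, t), assuming i.i.d. Exponential(lam) gaps.
    elapsed = rng.expovariate(lam)
    count = 0
    while elapsed < t:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

def empirical_pmf(lam, t, n, trials=100_000, seed=0):
    # Fraction of simulated runs with exactly n events in [0, t).
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if count_arrivals(lam, t, rng) == n)
    return hits / trials

def poisson_count_pmf(lam, t, n):
    # e^{-lambda t} * (lambda t)^n / n!
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

lam, t, n = 3.0, 1.0, 2  # illustrative values only
print(empirical_pmf(lam, t, n), poisson_count_pmf(lam, t, n))
```

Because the formula depends on $\lambda$ and $t$ only through the product $\lambda t$, calling `poisson_count_pmf(3.0, 1.0, n)` and `poisson_count_pmf(1.0, 3.0, n)` gives the same value, which is the rescaling of $[0, 1)$ to $[0, \lambda)$ described above.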