Algebraically, the derivation of the probability function of a Poisson random variable is fairly straightforward. Let $\mu = np$ with $n \to \infty$ and $p \to 0$ such that $\mu$ stays fixed and finite.
Starting with the intuitive:
$$P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}$$
And ending up with the seemingly less intuitive:
$$P(X = x) = \frac{ e^{-\mu} \mu ^x }{ x! }$$
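As a quick numerical sanity check of this limit, here is a minimal Python sketch that holds $\mu = np$ fixed and lets $n$ grow. The function names `binomial_pmf` and `poisson_pmf` and the values $\mu = 3$, $x = 2$ are illustrative choices, not part of the derivation.

```python
import math

def binomial_pmf(x, n, p):
    # P(X = x) for X ~ Binomial(n, p)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    # P(X = x) for X ~ Poisson(mu)
    return math.exp(-mu) * mu**x / math.factorial(x)

mu, x = 3.0, 2  # illustrative values only
for n in (10, 100, 1_000, 10_000):
    p = mu / n  # shrink p as n grows so that n * p stays equal to mu
    print(n, binomial_pmf(x, n, p), poisson_pmf(x, mu))
```

The binomial column should approach the Poisson value as $n$ increases, which is all the limit claims.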
My Question
My question is: how can the Poisson probability function be understood intuitively, so that it can be written down immediately, just like the binomial?
My Attempt
I start off by breaking down the derivation:
First, $\mu = np$.
In finite terms, I understand this as scaling $p$ up to the size of the sample: whenever we draw $n$ samples, we expect $\mu = np$ of them to be successes, for some definition of success. Analogously, having drawn infinitely many samples, $\mu = np$ seems to give us the expected number of successes. Since this no longer depends on the sample size, I take it to mean that regardless of the interval size, we always expect $\mu$ successes, which doesn't sit too well with me.
The term $\lim_{n \to \infty} (1-\mu / n)^n = e^{-\mu}$ seems to give the probability of failing infinitely many times.
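To see this limit numerically, here is a small Python sketch. It treats $(1 - \mu/n)^n$ as the probability of zero successes in $n$ independent trials with success probability $\mu/n$; the helper name `prob_no_success` and the value $\mu = 2$ are assumptions made for illustration.

```python
import math

def prob_no_success(n, mu):
    # Probability that all n independent trials fail when each succeeds with probability mu / n
    return (1 - mu / n) ** n

mu = 2.0  # illustrative value only
for n in (10, 100, 10_000, 1_000_000):
    print(n, prob_no_success(n, mu), math.exp(-mu))
```

The values increase towards $e^{-\mu}$ as $n$ grows.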
I get more lost when I try to understand what the $\mu ^ x$ term means: why are we multiplying the expected number of successes by itself $x$ times?
I suspect that somewhere hidden in this compact equation lies a factor which accounts for all the different possible combinations of possible "orders of successes".
Maybe you're not looking for this, but my understanding of the heuristic was a bit more based on the algebra. If you start with $\mu = np$ with $n$ large (and $p$ small), then (for fixed $x$, very small compared with $n$)
\begin{align*}
\binom{n}{x} p^x(1-p)^{n-x} &= \frac{n!}{x!(n-x)!}\left(\frac{\mu}{n}\right)^x\left(1-\frac{\mu}{n}\right)^{n-x} \\
&= \frac{\mu^x}{x!}\underbrace{\left(1-\frac{\mu}{n}\right)^n}_{\approx e^{-\mu}} \underbrace{\frac{n(n-1)\dots (n-x+1)}{n^x}\left(1-\frac{\mu}{n}\right)^{-x}}_{\approx 1}.
\end{align*}
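The three factors in the last line can be checked numerically. The sketch below is only an illustration under assumed values ($\mu = 2$, $x = 3$), and the function name `factors` is made up for this example: it computes $\mu^x/x!$, $(1-\mu/n)^n$, and the correction factor separately, so you can watch the correction drift to 1 while the middle factor approaches $e^{-\mu}$.

```python
import math

def factors(x, n, mu):
    # The three factors from the last line of the display above
    main = mu**x / math.factorial(x)                    # mu^x / x!
    decay = (1 - mu / n) ** n                           # tends to e^{-mu}
    falling = math.prod((n - k) / n for k in range(x))  # n(n-1)...(n-x+1) / n^x
    correction = falling * (1 - mu / n) ** (-x)         # tends to 1
    return main, decay, correction

mu, x = 2.0, 3  # illustrative values only
for n in (10, 100, 10_000, 1_000_000):
    main, decay, correction = factors(x, n, mu)
    print(n, decay, correction, main * decay * correction)
```

For each $n$, the product of the three factors is the exact binomial probability, so the only approximation is replacing the last two factors by $e^{-\mu}$ and $1$.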