Necessary Assumptions when Deriving Poisson Distribution


The Poisson distribution expresses the probability that a given number of discrete, independent events happens over a fixed time interval, provided the events are sufficiently rare.

To be precise, I accept the following premises for Poisson events:

  1. The probability of exactly $k$ events happening over a time interval $[t_1, t_2]$ depends only on its length $\Delta t = t_2-t_1$.

  2. The numbers of events in any two non-overlapping intervals are independent.

  3. The expected number of events happening over the period $T$ is finite.

  4. No two events can happen at exactly the same time.

However, when searching online for derivations of the Poisson distribution from first principles, I found that all of them make the following extra assumption:

For a sufficiently small interval $\mathrm{d}t$, the probability of exactly one event occurring equals $\mu \,\mathrm{d}t$, and the probability that two or more events occur is negligible.

See for example this, this, and this article.
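Numerically, the extra assumption is easy to illustrate. The sketch below assumes the Poisson pmf $P(K = k) = e^{-\mu \delta t} (\mu \delta t)^k / k!$ that those derivations arrive at, with a hypothetical rate $\mu = 3$, and shows $P(K = 1; \delta t)/\delta t \to \mu$ while $P(K \geq 2; \delta t)/\delta t \to 0$:

```python
import math

def poisson_pmf(k, rate, dt):
    """P(K = k) for a Poisson-distributed count with mean rate * dt."""
    lam = rate * dt
    return math.exp(-lam) * lam**k / math.factorial(k)

mu = 3.0  # hypothetical event rate
for dt in (1e-1, 1e-3, 1e-5):
    p_one = poisson_pmf(1, mu, dt)               # exactly one event
    p_many = 1 - poisson_pmf(0, mu, dt) - p_one  # two or more events
    # p_one/dt approaches mu, while p_many/dt vanishes as dt shrinks
    print(f"dt={dt:g}  P(K=1)/dt={p_one/dt:.5f}  P(K>=2)/dt={p_many/dt:.2e}")
```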

Do we really need this additional assumption? Could there be a distribution of rare, independent events with a constant probability of happening over time that somehow violates it?

Best Answer

The extra assumption is not necessary and can be derived from the four assumptions you gave.

We will use the following notation in the proof:

  • $T$: the length of the full time interval over which we compute the Poisson distribution.

  • $P(K=k; \Delta t)$: the probability of exactly $k$ events happening over the interval of length $\Delta t$.

  • $E(K; \Delta t)$: the expected number of events happening over the interval $\Delta t$.

Proof

Let's split the interval $T$ into $n$ smaller intervals of length $\Delta t_n = T/n$, and let $p_n = P(K > 0; \Delta t_n)$ denote the probability of at least one event happening within such a smaller interval.

The expected number of events happening over an interval of length $\Delta t_n$ is at least $p_n$, as can be seen from:

$$E(K; \Delta t_n) = \sum_{k=1}^{\infty} k \cdot P(K=k; \Delta t_n) \geq \sum_{k=1}^{\infty} P(K=k; \Delta t_n) = p_n $$
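This step is just the observation that $E[K] \geq P(K \geq 1)$ for any distribution on the non-negative integers; a quick check with an arbitrary made-up pmf:

```python
# E[K] >= P(K >= 1): each term k * P(K=k) with k >= 1 dominates P(K=k).
pmf = {0: 0.6, 1: 0.25, 2: 0.1, 3: 0.05}  # arbitrary example distribution

expected_k = sum(k * p for k, p in pmf.items())
p_at_least_one = sum(p for k, p in pmf.items() if k >= 1)

# expectation dominates: roughly 0.6 vs 0.4
print(expected_k, p_at_least_one)
assert expected_k >= p_at_least_one
```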

The expected value of $K$ over the full time interval $T$ is the sum of the expected values over the $n$ smaller $\Delta t_n$ intervals, i.e. $E(K; T) = n \cdot E(K; \Delta t_n)$. Combined with the inequality above, this gives an upper bound for $p_n$:

$$p_n \leq E(K; T) / n$$

Now, the probability of no event happening over the whole time period $T$ is the probability of no event happening in any of the $n$ small $\Delta t_n$ intervals. As these intervals are non-overlapping, assumption 2 gives independence:

$$P(K = 0; T) = (1-p_n)^n \in (0, 1]$$

Note that $P(K = 0; T) = 0$ would mean $p_n = 1$, which together with the bound above would give $E(K; T) \geq n \cdot 1$ for all $n \in \mathbb{N}$, contradicting our assumption of a finite expected value.

Given that $P(K = 0; T) > 0$, we can find $\mu \in \mathbb{R}^+$ such that $P(K = 0; T) = \exp(-\mu T)$. Taking the $n$-th root of $(1 - p_n)^n = e^{-\mu T}$ then gives:

$$p_n = 1 - e^{-\mu \Delta t_n}$$

And:

$$\lim_{n\to\infty} \frac{p_n}{\Delta t_n} = \mu$$
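One way to verify this limit is the Taylor expansion of the exponential, using $\Delta t_n = T/n \to 0$:

$$\frac{p_n}{\Delta t_n} = \frac{1 - e^{-\mu \Delta t_n}}{\Delta t_n} = \frac{\mu \Delta t_n - \tfrac{1}{2}\mu^2 \Delta t_n^2 + O(\Delta t_n^3)}{\Delta t_n} = \mu + O(\Delta t_n) \xrightarrow{\;n \to \infty\;} \mu$$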

Obviously, $P(K > 0; \delta t)$ is non-decreasing with respect to $\delta t$: an event simply has more opportunity to happen during a longer interval. This means that for any $\delta t \in [\Delta t_{n+1}, \Delta t_n]$ we can bound $P(K > 0; \delta t) / \delta t$ by:

$$\frac{p_n}{\Delta t_{n+1}} \geq \frac{p_n}{\delta t} \geq \frac{P(K > 0; \delta t)}{\delta t} \geq \frac{p_{n+1}}{\delta t} \geq \frac{p_{n+1}}{\Delta t_n}$$

But $\Delta t_n / \Delta t_{n+1} = (n+1)/n \to 1$ as $n \to \infty$, so both outer bounds tend to the same limit, and thus:

$$\lim_{\delta t \to 0} \frac{P(K > 0; \delta t)}{\delta t} = \mu$$

In other words:

$$P(K > 0; \delta t) = \mu \delta t + o(\delta t)$$
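This squeeze is easy to check numerically. The sketch below (with hypothetical values $\mu = 2$, $T = 1$) evaluates the outer bounds $p_n / \Delta t_{n+1}$ and $p_{n+1} / \Delta t_n$ and shows both closing in on $\mu$:

```python
import math

mu, T = 2.0, 1.0  # hypothetical rate and horizon

def p(n):
    """p_n = 1 - exp(-mu * T / n): probability of at least one event in T/n."""
    return 1 - math.exp(-mu * T / n)

for n in (10, 100, 1000, 10000):
    upper = p(n) / (T / (n + 1))  # p_n / dt_{n+1}
    lower = p(n + 1) / (T / n)    # p_{n+1} / dt_n
    print(f"n={n:<6} lower={lower:.6f} upper={upper:.6f}")  # both tend to mu
```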

A corollary of the above is that the probability of no event happening over any interval $\tau$, large or small, is exactly $P(K = 0; \tau) = \exp(-\mu \tau)$.
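To see why, note that assumptions 1 and 2 make $f(\tau) = P(K = 0; \tau)$ multiplicative over adjacent intervals:

$$f(s + t) = f(s) \, f(t) \quad \text{for all } s, t \geq 0$$

Since $f$ is monotone and $f(T) = e^{-\mu T}$, the only solution of this functional equation is $f(\tau) = e^{-\mu \tau}$ for every $\tau \geq 0$.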

Now, we would like to show that:

$$P(K > 1; \delta t) = o(\delta t)$$

Suppose this were not true; then there would exist a constant $\nu > 0$ such that:

$$\forall \epsilon > 0: \exists\, \delta t < \epsilon: P(K > 1; \delta t) > \nu\, \delta t$$

Let's split some interval $\tau$ into $n+1$ sub-intervals as $n \times \delta t + 1 \times r$, where $r < \delta t$. The probability of two or more events happening over $\tau$ can be split into two cases:

  1. The probability $P_\textrm{same}$ that two or more events happen within one of the $n$ intervals of length $\delta t$ and no event happens elsewhere:

$$P_\textrm{same} \geq n\, \nu\, \delta t\, e^{-\mu \tau} = \nu (\tau - r)\, e^{-\mu \tau} = \nu \tau + o(\tau)$$

where the last equality holds as $\tau \to 0$, choosing $\delta t$ (and hence $r$) small compared to $\tau$.

  2. The probability $P_\textrm{distinct}$ that one or more events happen in at least two distinct intervals:

$$P_\textrm{distinct} \leq \frac{n (n+1)}{2} \left(\mu\, \delta t + o(\delta t)\right)^2 = \tfrac{1}{2}(\mu \tau)^2 + o(\tau) = o(\tau)$$

Given that the event $\{K > 1\}$ over $\tau$ implies one of the two cases above, we have:

$$P(K > 1; \tau) \leq P_\textrm{same} + P_\textrm{distinct}$$

Since $P_\textrm{same}$ grows linearly in $\tau$ while $P_\textrm{distinct} = o(\tau)$, we can always choose $\tau$ small enough that the conditional probability of the same-interval case, given that $K > 1$, exceeds any $p \in (0, 1)$ we desire (assuming $\nu > 0$).

But $P_\textrm{same}$ requires that two events happened within a single interval of length $\delta t$, and we can pick $\delta t$ as small as we want while keeping $\tau$ the same! If we denote by $X$ the distance between the two events, this leaves a non-zero probability that $X = 0$, which contradicts the assumption that no two events can happen at exactly the same time.

Therefore, $\nu = 0$, and

$$P(K > 1, \delta t) = o(\delta t)$$

We already showed that:

$$P(K > 0, \delta t) = \mu \delta t + o(\delta t)$$

Combined with the result we just proved, we get that:

$$P(K = 1, \delta t) = \mu \delta t + o(\delta t)$$

Thus we have derived the "additional assumption" from the question.
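As a closing sanity check, the derived small-interval behaviour can be observed in simulation. This sketch assumes the standard construction of a Poisson process from independent exponential inter-arrival gaps (with hypothetical values $\mu = 2$, $\delta t = 0.01$), estimates $P(K = 1; \delta t)$, and compares it with $\mu\, \delta t$:

```python
import random

random.seed(42)
mu, dt, trials = 2.0, 0.01, 200_000  # hypothetical rate, interval, sample size

def count_events(rate, horizon, rng):
    """Count arrivals in [0, horizon) when gaps are Exp(rate) distributed."""
    t, k = rng.expovariate(rate), 0
    while t < horizon:
        k += 1
        t += rng.expovariate(rate)
    return k

ones = sum(count_events(mu, dt, random) == 1 for _ in range(trials))
estimate = ones / trials
print(estimate, mu * dt)  # the estimate should land close to mu * dt = 0.02
```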