Is there a relationship between the mean of a Poisson distribution and the first "impossible" value to come in such distribution?

63 Views Asked by At

In a Poisson distribution, the mean and the variance are the same and so, increasing the mean, the spread of the distribution increases. If I generate a Poisson distribution like this:

enter image description here

this is a Poisson distribution with mean 3 (I obtained it by using the Python code: numpy.random.poisson(3, 10000)). In this case, we can notice that it is really unlikely to obtain 15 as a number from this distribution. How can I know, for instance, this number 15 given the mean of a Poisson distribution ?

Thanks in advance.

1

There are 1 best solutions below

0
On BEST ANSWER

The answer of course depends on the meaning of "really unlikely." Generally, suppose $N > 0$ and we want to estimate the first $k \ge \lambda$ such that $\mathbb{P}(X = k) = e^{-\lambda} \frac{\lambda^k}{k!} \le \frac{1}{N}$; for example if we take $N = 10000$ then we want to estimate the first $k \ge \lambda$ such that the probability that a Poisson random variable with rate $\lambda$ takes the value $k$ is less than $0.01\%$ (so it will occur about once in $10000$ trials). Then we want to solve, approximately,

$$e^{-\lambda} \frac{\lambda^k}{k!} = \frac{1}{N}$$

for $k$. To do this in a simple way we'll use the crude Stirling approximation $k! \approx \left( \frac{k}{e} \right)^k$, which gives $\frac{\lambda^k}{k!} \approx \left( \frac{e \lambda}{k} \right)^k$ and hence

$$e^{-\lambda} \left( \frac{e \lambda}{k} \right)^k \approx \frac{1}{N}.$$

At this point it will be useful to build some intuition by plugging some values in for $k$ to see how this function on the left behaves. Plugging in $k = \lambda$ gives $e^{-\lambda} e^{\lambda} = 1$; this is an approximation to the true value but correctly suggests that a Poisson variable takes on its mean with high probability. Plugging in $k = 2 \lambda$, on the other hand, gives $e^{-\lambda} \left( \frac{e}{2} \right)^{2 \lambda} = \left( \frac{e}{4} \right)^{\lambda}$ which exponentially decays. This suggests that it'll be useful to think in terms of multiples of the mean, so let's make the substitution $k = n \lambda$, so that we want to solve

$$e^{-n \lambda} \left( \frac{e}{n} \right)^{n \lambda} = n^{-n \lambda} \approx \frac{1}{N}.$$

Taking logarithms on both sides then gives

$$n \lambda \log n \approx \log N.$$

This function can be inverted with the Lambert W function, which gives

$$\log n \approx W \left( \frac{\log N}{\lambda} \right).$$

This gives $n \approx \exp \left( W \left( \frac{\log N}{\lambda} \right) \right)$ which gives

$$k = n \lambda \approx \lambda \exp \left( W \left( \frac{\log N}{\lambda} \right) \right).$$

This can be simplified using the identity $\exp(W(x)) = \frac{x}{W(x)}$ to get

$$\boxed{ k \approx \frac{ \log N}{W \left( \frac{\log N}{\lambda} \right)} }.$$

This is a funny function! It's not even obvious that it's larger than $\lambda$, which it would have to be. For $N$ fixed and $\lambda \to \infty$ the argument $\frac{\log N}{\lambda}$ becomes small, so to get a more reasonable-looking approximation out of this we can take the first two terms of the Taylor series of $\frac{x}{W(x)} \approx 1 + x$, which gives

$$\boxed{ k \approx \lambda + \log N }.$$

For example, with $\lambda = 3, N = 10000$ this gives that the first value of $k$ such that $\text{Pois}(3)$ takes the value $k$ with probabiity less than $0.01\%$ is approximately $3 + \log 10000 = 12.2 \dots$; substituting $k = 12$ gives $\mathbb{P}(\text{Pois}(3) = 12) = 0.0055 \dots \%$ so this is pretty close. Here's a WolframAlpha plot comparing $\frac{\log 10000}{W \left( \frac{\log 10000}{\lambda} \right)}$ and $\lambda + \log 10000$ for $\lambda \in [0, 20]$:

enter image description here

The approximation is not very accurate for small $\lambda$ but gets more accurate as $\lambda$ gets bigger.

In general this sort of calculation can be done using large deviations inequalities such as Cramer's theorem.