What's wrong in my use of Poissonization in this simple example?


We place $k=2$ balls inside $n$ bins, randomly, with uniform probability.

The probability of a collision (two balls land in the same bin) is of course $p=1/n$.

If we instead use the Poisson approximation, we should take $n$ iid Poisson variables $Y_i$ with $\lambda = 2/n$.

And the probability of no collision would be

$$P\left(\bigcap_i \{Y_i \le 1\}\right) = e^{-n \lambda} ( 1+ \lambda)^n= e^{-2} (1+2/n)^n = 1 - \frac{2}{n} + o(1/n) $$

This gives an asymptotic collision probability $p\approx \frac{2}{n}$ ... which is twice the true one.

If, alternatively, I try to exploit symmetry, so that $p = n p_1$ where $p_1$ is the probability that a collision occurs in the first bin, the Poisson approximation leads me to the same (wrong) asymptotics:

$$p \approx n e^{-\lambda} \frac{\lambda^2}{2}= \frac{2}{n} e^{-2/n}=\frac{2}{n}+o(1/n) $$
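A quick numerical check (a minimal sketch of the computations above; the function names are made up) reproduces the factor-of-two discrepancy:

```python
import math

def true_collision_prob(n):
    # Exact probability that 2 balls thrown into n bins share a bin
    return 1.0 / n

def poisson_collision_prob(n):
    # Poissonized version: 1 - P(all Y_i <= 1) with Y_i iid Poisson(2/n),
    # using P(Y_i <= 1) = e^{-lam} * (1 + lam)
    lam = 2.0 / n
    return 1.0 - (math.exp(-lam) * (1.0 + lam)) ** n

for n in (10, 100, 1000, 10000):
    print(n, poisson_collision_prob(n) / true_collision_prob(n))
# the ratio approaches 2, not 1, as n grows
```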

Have I done something wrong?

BEST ANSWER

The only error is assuming that Poissonization gives asymptotically correct results as $n \to \infty$. Let's recap how Poissonization works.

Let us fix $n$, and let $X^{(k)}$ be the uniform $k$-multinomial random variable: if we throw $k$ balls into $n$ bins with uniform probability, $X_i^{(k)}$ counts the number of balls in bin $i$. Of course, $\sum_i X_i^{(k)} = k$.

Let $Y^{(k)}$ be the $n$-dimensional vector of iid Poisson variables with $\lambda=k/n$, hence $E[\sum_i Y_i^{(k)}] = n \lambda = k$.

Poissonization consists, basically, in assuming $$P(X^{(k)}) \approx P(Y^{(k)}) \tag 1$$

What we actually know is that the multinomial coincides with the Poisson (regardless of the Poisson mean!) if conditioned on the sum being equal to $k$:

$$ P(X^{(k)}) = P(Y^{(j)} \mid A^{(j)}_k) \tag 2$$

where $A^{(j)}_k$ is the event that $\sum_i Y_i^{(j)} = k$.

Now

$$P(Y^{(k)}) = \sum_j P(Y^{(k)} \mid A^{(k)}_j) \, P(A^{(k)}_j) = \sum_j P(X^{(j)})\, P(A^{(k)}_j) \tag 3$$

When (and only when) $P(A^{(k)}_j)$ is highly concentrated around $j=k$, and $P(X^{(j)})$ does not change much in that concentration zone, we can conclude $(1)$.

This is often true when $n$ is large, because of the law of large numbers, and because $k$ often grows with $n$.
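The concentration claim can be checked directly: the total $\sum_i Y_i^{(k)}$ is Poisson($k$) regardless of $n$, so we can measure how much of the mass $P(A^{(k)}_j)$ sits within, say, 10% of the mean (a minimal numerical sketch; the helper names are made up):

```python
import math

def poisson_pmf(mean, j):
    # pmf of Poisson(mean) at j, computed in log space for stability
    return math.exp(j * math.log(mean) - mean - math.lgamma(j + 1))

def mass_near_mean(k, rel=0.1):
    # P(A^{(k)}_j) summed over j within 10% of the mean k.  The sum of
    # the n iid Poisson(k/n) variables is Poisson(k) whatever n is,
    # so this quantity does not depend on n at all.
    lo = math.ceil(k * (1 - rel))
    hi = math.floor(k * (1 + rel))
    return sum(poisson_pmf(k, j) for j in range(lo, hi + 1))

for k in (2, 20, 200, 2000):
    print(k, mass_near_mean(k))
# for k = 2 only a small fraction of the mass sits near the mean;
# for large k essentially all of it does, which is when (1) is safe
```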

The above can also be written in terms of events that depend on $X^{(k)}$ or $Y^{(k)}$.

In our example, let $\mathcal X^{(k)}$ be the event that there is a collision in the $k$-multinomial setup; and let $\mathcal Y^{(2)}$ be the corresponding event for the $\lambda=2/n$ Poisson. We have

$$ P(\mathcal Y^{(2)}) = \sum_j P(\mathcal X^{(j)}) P(A^{(2)}_j) $$
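This mixture identity can be verified numerically (a sketch under the definitions above: the total number of balls is Poisson($2$), and for $j \le n$ balls in $n$ bins the exact no-collision probability is $n!/((n-j)!\,n^j)$):

```python
import math

n = 30
lam = 2.0 / n

# Left-hand side: P(collision) under n iid Poisson(2/n) bin counts
lhs = 1.0 - (math.exp(-lam) * (1.0 + lam)) ** n

# Right-hand side: mix the exact multinomial collision probabilities
# over the distribution of the total ball count, which is Poisson(2)
rhs = 0.0
for j in range(0, 200):  # the Poisson(2) tail beyond 200 is negligible
    p_sum_j = math.exp(j * math.log(2.0) - 2.0 - math.lgamma(j + 1))
    if j <= n:
        # P(no collision with j balls) = n!/((n-j)! n^j)
        p_coll_j = 1.0 - math.exp(
            math.lgamma(n + 1) - math.lgamma(n - j + 1) - j * math.log(n))
    else:
        p_coll_j = 1.0  # more balls than bins: a collision is certain
    rhs += p_coll_j * p_sum_j

print(lhs, rhs)  # the two sides agree
```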

Here $P(A^{(2)}_j)$ is the probability that $n$ Poisson variables with mean $\lambda=2/n$ sum to $j$. This concentrates around the mean $j=2$... but not too much: in fact $P(A^{(2)}_2)=P(A^{(2)}_1)=2e^{-2}$, and this does not change with $n$. And, sadly, $P(\mathcal X^{(j)})$ changes quite abruptly in that range: $P(\mathcal X^{(2)}) = 1/n$ (the exact value) but $P(\mathcal X^{(1)}) = 0$ (no collision with a single ball). Hence the approximation does not improve with $n$.


ANOTHER ANSWER

Poissonization makes two approximations here.

  1. The marginal distribution of the number of balls in a specific bin is binomial; it is approximated by a Poisson distribution.

  2. The joint distribution of the numbers of balls in all the bins is approximated by treating the bins as independent.

If we do not make approximation 1 (but keep approximation 2), the limit matches the true $1/n$.

If we use the binomial distribution, the marginal probability is

$$P(Y_i \leq 1) = 1-1/n^2$$

and, treating the $n$ bins as independent, the collision probability is

$$1-\prod_{i=1}^n P(Y_i \leq 1) = 1-(1-1/n^2)^n \approx \frac{1}{n}-\frac{1}{2n^2}+ O\left(\frac{1}{n^3}\right)$$

where the series expansion is taken as $n \to \infty$.


So the approximation fails because the error of approximation 1 is blown up as $n$ increases. Approximation 1 improves as $n$ grows, but the small remaining per-bin error gets multiplied back up because it is raised to the $n$-th power.
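A numerical sketch of this last point (hypothetical helper names): keep approximation 2 in both cases and compare the Poisson marginal against the exact binomial marginal.

```python
import math

def collision_poisson_marginal(n):
    # approximations 1 and 2: iid Poisson(2/n) bin counts
    lam = 2.0 / n
    return 1.0 - (math.exp(-lam) * (1.0 + lam)) ** n

def collision_binomial_marginal(n):
    # only approximation 2: exact Binomial(2, 1/n) marginal per bin,
    # P(Y_i <= 1) = 1 - 1/n^2, bins still treated as independent
    return 1.0 - (1.0 - 1.0 / n**2) ** n

for n in (10, 100, 1000):
    exact = 1.0 / n
    print(n,
          collision_poisson_marginal(n) / exact,
          collision_binomial_marginal(n) / exact)
# the binomial-marginal ratio tends to 1, the Poisson-marginal ratio to 2
```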