Statistics for the number of empty vessels for large systems

49 Views Asked by At

Consider randomly (iid) dropping $b$ balls into $n$ vessels. I am interested in the statistics of $e$, the number of empty vessels. I've seen elsewhere on stackexchange that this can be written down in terms of Stirling numbers of the second kind, but I don't have a good sense of how to extract useful information from those expressions. I'm most interested in the following information:

  1. Can we find the probability distribution for $e$ in the large-$n$ limit, at least in some regimes for $k$? Can we say that the empty fraction $e/n$ is a function of the average number of balls per vessel $b/n$ in this limit?

  2. Can we find the mean and variance of $e$, again in the large-system limit?

I've tried a few things already. I started off by simply calculating the odds that no balls fell within $e$ specific vessels and then the odds that none fell within $e+1$ specific vessels. But of course it's wrong to say that the difference between these two numbers is proportional to the probability of exactly $e$ empty vessels.

Next, I tried to write down a recursion relationship, in which $P(e,b+1)$, the probability of $e$ empties given $b+1$ balls is (Edited to fix error)

$$P(e,b+1) = P(e,b)(1-e/n) + P(e+1,b)((e+1)/n).$$

That one still seems to me to be correct, and I wanted to rewrite it in terms of $b/n$ and $e/n$ and then do a Taylor expansion. That gives me a PDE, but when I try to solve it I get nonsensical answers.

1

There are 1 best solutions below

4
On BEST ANSWER

I am unsure if a simple expression exists for the full probability distribution but the mean is approximately $$ n e^{-b/n}:=ne^{-\lambda} $$ and the variance is approximately $$ n(e^{-\lambda}-(1+\lambda)e^{-2\lambda} ) $$ for $b/n:=\lambda$ small enough. This is the Poisson heuristic.

Edit: The basic idea is the following, you can adapt to your choice of $b.$

Let $Z$ be the number of empty vessels. We can write $Z = \sum_{i=1}^n E_i$ where $E_i$ is $1$ if vessel number $i$ is empty, $0$ otherwise.

$\mathbb{E}(E_i)$ is the probability that vessel number $i$ is empty, namely $(1 - 1/n)^b$, so $E(Z) = n (1 - 1/n)^b$.

For two distinct vessels $i$ and $j$, $\mathbb{E}(E_i E_j)$ is the probability that both are empty, namely $(1 - 2/n)^b$. There are $b(b-1)$ such ordered pairs. On the other hand $\mathbb{E}(E_j^2) = \mathbb{E}(E_j) = (1 - 1/n)^b$, and there are $n$ of these. So $\mathbb{E}(Z^2) = n(n-1) (1 - 2/n)^b + n (1 - 1/n)^b$, and $$\mathbb{Var}(Z) = \mathbb{E}(Z^2) - \mathbb{E}(Z)^2 = n(n-1)(1 - 2/n)^b + n (1 - 1/n)^b - n^2 (1 - 1/n)^{2b}.$$

References (you can find the Raab paper online):

  1. Raab, Martin; Steger, Angelika (1998). Luby, Michael; Rolim, José D. P.; Serna, Maria (eds.). ""Balls into Bins" — A Simple and Tight Analysis". Randomization and Approximation Techniques in Computer Science. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 159–170.

  2. Kotz, Samuel; Johnson, Norman Lloyd (1977). Urn models and their applications. New York, NY: John Wiley & Sons.