Occupancy distribution bounds for $k$ balls in $m$ bins

439 Views Asked by At

Suppose we throw $k$ (distinct) balls into $m$ (distinct) bins, and let $B$ count the number of non-empty bins. I am interested in lower bounds on $B$. More precisely, I wish to bound from above the probability that $B$ is very low.

Assuming $k$ is smaller than $m$, or even bounded from above by $\beta m$ for some small $\beta$, then $B$ stochastically dominates $\text{Bin}(k,1-\beta)$, so I can use Chernoff-type bounds to bound from above the probability that $B$ is lower than some small constant times $k$. However, that "domination" seems rather rough, and for my calculations it seems like a lousy approximation.

I was wondering whether there are some better bounds.

Update

A different direction of thought is as follows: let $B_i$ be the indicator of bin number $i$ (that is, $B_i$ is 1 if the $i$'th bin is full, and 0 otherwise). So: $B=\sum B_i$. Since $$\mathbb{E}(B_i)=\mathbb{P}(B_i=1)=1-\left(1-\frac{1}{m}\right)^k\approx 1-e^{-k/m},$$ we have that $\mathbb{E}(B)\approx m(1-e^{-k/m})$. Now, if $k\ll m$, this is approximately $m\cdot k/m=k$, and otherwise this is at least some constant times $k$.

Now, showing concentration of $B$ should finish the proof, but I cannot find the proper concentration result to use in this case.

Update (2)

Let $b_j$ be the destination bin of the $j$'th ball. Let $X_j=\mathbb{E}(B\mid b_1,\ldots,b_j)$. To prove concentration, I thought that I might show that $X_j$ is a martingale with bounded differences. Indeed, it is a martingale by construction (a Doob martingale), and $\left|X_j-X_{j-1}\right|\le 1$ (I think), hence by Azuma's inequality, since $X_0=c=\Theta(k)$, $$\mathbb{P}(|X_j-c|>\lambda\sqrt{j})<2\exp(-\lambda^2/2)$$ so since $B=X_k$, $$\mathbb{P}(|B-c|>\lambda\sqrt{k})<2\exp(-\lambda^2/2)$$ and maybe this is as good as it gets?