Say there are 100 balls that are randomly distributed across 64 buckets, with the same probability of ending up in any bucket and trials are independent. What is the probability that at least one of the 64 buckets will have more than 20 balls?
A similar problem to this is calculating the probability of a specific bucket having more than 20 balls, which can be solved using the survival function or CDF function of the binomial distribution:
$1-\sum_{i=0}^{20} \binom{100}{i}(1/64)^{i}(1-1/64)^{100-i}=7.33\times10^{-18}$
However, I'm not sure how to bridge the gap to determine the probability of any bucket in our population having more than 20 balls.
Thank you!
For $i ∈ \{1,…,64\}$ Let $E_i$ be the event that the $i$-th bucket contains more than $20$ balls. You want to compute: $P[\bigcup_{i = 1}^{64} E_i]$. By the union bound we have:
$P[\bigcup_{i = 1}^{64} E_i] ≤ \sum_{i = 1}^{64} P[E_i] = 64 · 7.33 · 10^{-18}.$
However, any outcome where several buckets overflow at the same time is counted several times. Consider the event $E_i ∩ E_j$ that buckets $i$ and $j$ (for $i ≠ j$) both contain more than $20$ balls. It is possible to get an exact formula, but for now let us just use that the events are negatively correlated and write
$P[E_i ∩ E_j] ≤ P[E_{i}] · P[E_{j}] ≈ (7.33 · 10^{-18})^2$.
Now we can estimate
\begin{align} P[\bigcup_{i = 1}^{64} E_i] &≥ \sum_{i = 1}^{64} P[E_i] - \sum_{i ≠ j} P[E_i ∩ E_j]\\ &≥ 64· 7.33 · 10^{-18} - \binom{64}{2} · (7.33 · 10^{-18})^2 ≈ (7.33 · 10^{-18})·(64 - 1.47 · 10^{-14}) \end{align}
To see that the first inequality holds, consider an outcome where exactly $k ≥ 1$ buckets overflow. The outcome is counted positively $k$ times in the first sum and negatively $\binom{k}{2}$ times in the second sum. Since $k - \binom{k}{2} ≤ 1$ for $k ≥ 1$ the outcome is counted at most once.
We now have upper and lower bounds that are very close together. Of course, this doesn't quite answer the question. However, it points in the right direction, namely, you can use the Inclusion-Exclusion principle to get more precise results. The next step would be to write down precise formulas for the probability that a set of $2,3$ or $4$ buckets overflow at the same time. Since there are not enough balls to make 5 or more buckets overflow, this would yield a (somewhat messy) but exact formula.