Distribution of the maximum number of collisions

273 Views Asked by At

Given $n$ bins and $m$ balls, throw each ball into a bin that is chosen uniformly at random. Each throw is independent.

What is the distribution of the maximum number of collisions (i.e. maximum number of balls in one bin)?

Let $X_{ij}$ be an indicator random variable that denotes whether ball $i$ is into bin $j$; we have: $$ \mathbb{E}[X_{ij}] = \Pr(X_{ij} = 1) = \frac1n $$

Let $Y_j$ count the number of balls in bin $j$ after $m$ throws; we have: $$ Y_j \sim \mathsf{Binomial}\left( m, \ \frac1n \right) $$ $$ \mathbb{E}[Y_j] = \mathbb{E}\left[\sum_{i=1}^{m}X_{ij}\right] = \sum_{i=1}^{m}\mathbb{E}[X_{ij}] = \frac{m}{n} $$

Let $Z$ be the maximum number of balls in one bin after $m$ throws, that is: $$ Z = \max_{1\leq j \leq n} Y_j = \max_{1\leq j \leq n} \sum_{i=1}^{m}X_{ij} $$ $$ \frac{m}{n} \leq Z \leq m $$

I am interested in finding the distribution of $Z$, particularly for the case when $n = m$.


This is the maximum load for the random allocation problem.

Wikipedia gives a tight bound on $\mathbb{E}[Z]$ when $n = m$ as: $$ \mathbb{E}[Z] = \Gamma^{-1}(n) - \frac32 + o(1) $$


However, I want to find the actual distribution, if possible.

One possible approach I had in mind is that given the above definitions for the random variables, I have to find the distribution of $\left( Z \ \big| \ S = n \right)$ is, where: $$ S = \left ( \sum_{j=1}^{n} Y_j \right) \sim \mathsf{Binomial}\left(n^2, \frac1n\right) $$

And since for $n=m$ we have that $1 \leq Z \leq n$, then I assume I can compute: $$ \Pr(Z=k \ | \ S=n), \ k \in \overline{1,\dots,n} $$

Is this a good direction?

1

There are 1 best solutions below

0
On BEST ANSWER

You're asking for the distribution of the maximum order statistic of multinomial random variables with equal probabilities. Googling "multinomial order statistics" gives lots of relevant information.

There doesn't appear to be a closed-form probability mass function, see: Computing the exact distributions of some functions of the ordered multinomial counts: maximum, minimum, range and sums of order statistics, by Marco Bonetti, Pasquale Cirillo and Anton Ogay (October 2019, The Royal Society).

"In testing the equiprobability hypothesis, all the statistics above rely on approximations (like the Normal, the $\chi^{2}$, the Beta, the Dirichlet or the Gumbel), being their exact distributions not known."

*** Their paper assumes equiprobability and discusses algorithms for computing the distribution of the maximum (equation 4.1) as well as approximations. This appears to be the best anyone in the world knows how to do. Setting $n=m$ probably doesn't seem to be any particular special case where things simplify.***

(Their main contribution is: "we present novel general algorithms for computing the exact distributions of the multinomial minimum, of the range and of the sum of the $J$ largest order statistics.")