Distribution of N balls in N buckets in descending order?

96 Views Asked by At

Let's suppose I have $N$ buckets and $N$ balls, and I add a ball to a uniformly random bucket, N times.

(So, for example, it's possible each bucket contains one ball. It's also possible that one of the buckets has N balls, and the others have 0.)

After this, lets suppose I rearrange the buckets in a descending order of how many balls they hold, and label (one of) the buckets with the most balls 1, the next 2 and so on until N is (one of) the buckets with the least balls.

What is the expected value $E_N(i)$ of the number of balls that the ith bucket has?

For $N=1$ its trivial:

$$ E_1(1) = 1 $$

For $N=2$, either they will fall 1-1 or they will fall 2-0, both with equal probability, so:

$$ E_2(1) = (2 + 1) / 2 = 1.5\\ E_2(2) = (0 + 1) / 2 = 0.5 $$

Is there a general solution for $E_N(i)$ ? and what about as N approaches infinity?

Does the distribution/function have a name?

1

There are 1 best solutions below

2
On

Order statistics usually start with the smallest first so $X_{(1)}$ would be the minimum and $X_{(N)}$ the maximum of the observations.

For large $N$ you might expect about $N(1−\frac1N)^N\approx \frac N e \approx 0.37\, N$ of the boxes to be empty though this might vary slightly with a particular attempt. Similarly you might expect about $N(1−\frac1N)^{N-1}$ to have one item (a similar number) and about $\frac{N}{2}(1−\frac1N)^{N-1}$ to have two items (about half the previous numbers); only about $8\%$ would have more than two, though there will be slight variation on each particular attempt.

Here is a simulation with $N=100$ and $1000$ runs. It illustrates the average values found for each order position. You can see the many positions where the numbers was $0$ or $1$ or $2$ each time and the intermediate positions where it was sometimes one value and sometimes another. You would get something similar with larger $N$ or more simulations, though there might be more definition for those buckets with a large number of balls. This distribution does not have a name.

set.seed(2022)
N <- 100
cases <- 1000
numbers <- numeric(N)
for (i in 1:cases){
  results <- as.vector(sort(
               table(factor(sample(N,N,replace=TRUE), levels=1:N))))
  numbers <- numbers + results
  }
averages <- numbers/cases
plot(averages) 

enter image description here

Here is the same thing but with $N=10000$: you can now see the steps at $3$ balls and $4$ balls and the beginning of a step at $5$, while the earlier steps look more defined (though remember there are $100$ times as many points).

enter image description here