Let there be $m$ indistinguishable balls, $k$ bins, $C$ capacity. Let $X_j$ denote the total balls in bin $j$. I've seen ways to calculate the total number of combinations, but I'm not sure how to go about calculating the mean and variance of $X_j$. It is understood that the ball is always thrown into one of the empty bins with uniform probability. edit: nonfull bins not empty bins sorry.
2026-03-26 07:59:20.1774511960
On
Variance and mean of balls in bins limited capacity
595 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
2
There are 2 best solutions below
3
On
Here's some a Monte Carlo simulation in Python for the problem. Disclaimer: it's not efficient.
import numpy as np
import sys
def simulate(num_balls, num_bins, capacity, num_sims):
assert(capacity * num_bins >= num_balls)
results = np.empty((num_sims,))
for i in xrange(num_sims):
unfilled_bins = set(range(num_bins))
bin_counts = np.zeros((num_bins,))
for _ in xrange(num_balls):
chosen_bin = np.random.choice(list(unfilled_bins))
bin_counts[chosen_bin] += 1
if bin_counts[chosen_bin] >= capacity:
unfilled_bins.remove(chosen_bin)
if chosen_bin == 0:
break # Stop the simulation prematurely.
results[i] = bin_counts[0]
return results
if __name__ == '__main__':
try:
num_balls, num_bins, capacity, num_sims = map(int, sys.argv[1:])
except:
sys.stderr.write(('usage: {} NUM_BALLS NUM_BINS CAPACITY '
'NUM_SIMS\n').format(sys.argv[0]))
sys.exit(1)
total_capacity = capacity * num_bins
if total_capacity < num_balls:
sys.stderr.write(('error: parameters must satisfy'
'CAPACITY * NUM_BINS >= NUM_BALLS`.\n'))
sys.exit(2)
results = simulate(num_balls, num_bins, capacity, num_sims)
mean = results.mean()
var = results.var()
sys.stdout.write('mean = {}\tvar = {}\n'.format(mean, var))
```
This seems computationally very difficult to me. You might do better with simulation.
It $C\ge m$ then the constraint has no effect, and we are just looking for the mean and variance of the number of balls in bin $j.$ This has a binomial distribution, and the answer is well known.
When ${m\over k}\leq C< M,$ the situation is much more complicated, because the result depends on the sequence in which the balls were thrown into the bins. Before any bin fills up, a ball has probability $1/k$ of landing in bin $j.$ After some other ball fills up, it has probability $1/(k-1)$ of landing in bin $j$. The situation is even more complicated if it's possible for more than one ball to fill up.
Consider the case $k=3, j=1.$ Let $P(x_1,x_2,x_3)$ be the probablity that at some point there are $x_i$ balls in bin $i$ for $i=1,2,3.$ To compute the expectation, we have to sum up terms of the form $aP(a,b,c)$ where $a+b+c=m.$ If $\max\{a,b,c\}<C,$ we have simply $$P(a,b,c)={m\choose a,b,c}3^{-n}\tag{1}$$ But suppose $c=C.$ Then $$P(a,b,c)=P(a,b,C)=\frac12P(a-1,b,C)+\frac12P(a,b-1,C)+\frac13P(a,b,C-1)\tag{2}$$ taking account of all bins the last ball might have been thrown into.
The last term on the right-hand side of $(2)$ can be computed directly from $(1),$ but the first two have to be computed recursively from $(2)$. It won't take many bins, or many balls, before this computation becomes unwieldy, if the capacity constraint is effective.
There are so many possible sequences of throws that memoization is unlikely to be effective, so far as I can see. I've been tying to think of ways to simplify the calculations, but so far, I don't have a glimmer.