Randomly place N balls in K buckets. Expected min number of buckets to select G balls?

242 Views Asked by At

Looking for anything resembling a closed form solution of the following monte-carlo simulation. The basic idea is:

  • Randomly place N balls in K buckets
  • Sort the buckets from greatest to least number of balls
  • Take buckets until you have taken at least G balls
  • How many buckets did you have to take?
from random import randint

n = 8000000 # balls
k = 80000   # buckets
g = 2000    # goal

# initialize buckets to 0
buckets = [0] * k  

# populate buckets
for _ in xrange(n):
  buckets[randint(0,k-1)] += 1

# pick buckets from largest to smallest until g is reached
picked = 0 
count = 0 
for b in reversed(sorted(buckets)):
  if picked >= g:
    break
  picked += b
  count += 1

print "Picked %d balls in %d buckets" % (picked, count)

For the preset n,k,g, the average number of balls per bucket is 100, but if we pick only the largest buckets, most of those have ~140 balls, so it's possible to get the goal 2000 balls in around 14 or 15 buckets rather than the 20 one might expect based on the averages.