Looking for anything resembling a closed form solution of the following monte-carlo simulation. The basic idea is:
- Randomly place N balls in K buckets
- Sort the buckets from greatest to least number of balls
- Take buckets until you have taken at least G balls
- How many buckets did you have to take?
from random import randint
n = 8000000 # balls
k = 80000 # buckets
g = 2000 # goal
# initialize buckets to 0
buckets = [0] * k
# populate buckets
for _ in xrange(n):
buckets[randint(0,k-1)] += 1
# pick buckets from largest to smallest until g is reached
picked = 0
count = 0
for b in reversed(sorted(buckets)):
if picked >= g:
break
picked += b
count += 1
print "Picked %d balls in %d buckets" % (picked, count)
For the preset n,k,g, the average number of balls per bucket is 100, but if we pick only the largest buckets, most of those have ~140 balls, so it's possible to get the goal 2000 balls in around 14 or 15 buckets rather than the 20 one might expect based on the averages.