Help with boxes and balls problem

170 Views Asked by At

Please help me to find error in the following solution for the problem: "We randomly throw $n$ balls into 1000 boxes and note the number of empty boxes. We repeat this experiment large number of times and observe that the average number of boxes with zero balls in experiment is 20. What is $n$?"

I thought that we could use indicator random variables: $X_1=Bern((1-\frac{1}{1000})^n)$, where this $X_1$ indicates that box #1 remains empty after $n$ balls thrown. Then we could say that we have $X_2$, and so on for the other boxes, and they all are $Bern((1-\frac{1}{1000})^n)$ indicator random variables, but they are not independent.

Now, we say that random variable $Y=X_1+...+X_{1000}$. This $Y$ is actually number of empty boxes that we note for each experiment. Then we have that expected value $E[Y]=20$. And since expectation of sum is sum of expected values (even if random variables are not independent) we have $20=n(1-\frac{1}{1000})^n$. This gives me incorrect answer.

I need to understand what I'm missing here. Thank you very much.

2

There are 2 best solutions below

0
On

You have mixed up your formulation,

imagine $n$ balls thrown randomly and independently into 1000 boxes.

In your last step, the formulation for the expected number of empty boxes should be

E[Y] $= 1000{(1- \frac1{1000})}^n =20$

which yields $n= 3910$

0
On

This is actually a version of the coupon collector problem.

Let $X_i$ be the number of ball throw needed from having $i$ filled boxes to $i+1$ filled boxes. Clearly $X_0 = 1$. More generally $$X_i \sim Geometric(p_{sucess} = \frac{1000-i}{1000})$$ since it's getting harder and harder to fill more new boxes. And it's quite clear that this is a geometric process with probability of success being simply the proportion of empty box over the total number of box. Also the $X_i$ are independent since knowing the number of ball to transition from $x$ filled boxes to $x+1$ filled boxes doesn't change anything to the probability to transition from $y$ to $y+1$ filled boxes.

Let $Y$ be the number of ball throw to fill 980 boxes. $$ Y = \sum_{i=0}^{979} X_i,\hspace{1cm} n = \mathbb{E}[Y] = \sum_{i=0}^{979} \mathbb{E}[X_i] = \sum_{i=0}^{979} \frac{1}{\frac{1000-i}{1000}} = \sum_{i=0}^{979} \frac{1000}{1000-i} = 1000 (H_{1000} - H_{20}) $$ Where $H_n$ is the $n$-th harmonic number. The approximation using computer gives $n \approx 3887.7312$. However you can get a simpler estimate using the fact that $H_n \approx \ln(n)$ which tells you $n \approx 3912.02$.