Number of items missed while taking n random items m times out of n*m total items?

38 Views Asked by At

I have a list 'A' of 100000 total items, no duplicates.

I then pick 50 items at a time 2000 times, putting them back every time (cloning them).

These 50*2000=100000 items I just picked(cloned) I use to try to build a new list 'B'.

How many items form 'A' will be missing in 'B'?

How many of the items in 'B' will be duplicates?

1

There are 1 best solutions below

0
On BEST ANSWER

The probability of a fixed item to be found exactly $k$ times in $B$ is: $$p_k=\binom{2000}k\left(\frac{50}{100000}\right)^{k}\left(\frac{100000-50}{100000}\right)^{2000-k}$$

Then with linearity of expectation we find that $100000p_k$ items are expected to be found exactly $k$ times in list $B$.

So $100000p_0$ are expected to be missing and $100000(1-p_0-p_1)$ are expected to be found more than once.