Minimum N to get at least 10 different items in the second draw

124 Views Asked by At

I have to randomly choose 50 items out of N possible items. Is there any way to compute N so that, the second time I do this process there is a large probability ( let's say p=0.5) that at least 10 items out of 50 are different? I hope I was clear enough, I am not a mathematician. Thanks in advance.

EDIT: To explain better the purpose/context of my question. The items are different, they are questions in an online test. Each user gets 50 random questions to be answered. The user can take the test several times(I assumed two times). The purpose is to avoid the users getting the same questions the second time they take the tests, hence the 10 different items i was talking about. So I want to know how many different questions must be there to start with, so that, if some users take the test several times(2), at least some of the questions (10) are different?

3

There are 3 best solutions below

1
On BEST ANSWER

The average number of new items on the second draw is $50\left(1-{50\over N}\right)$. Setting this equal to $10$ will give you about a 50% chance to get 10 new items, that is, $N=62.5\approx 63$ should suffice.


Here is a chart for various $N$ values, calculated using the hypergeometric distribution.

$$\begin{array}{|c|c|}\hline N & \mathbb{P}(\text{number of new items }\geq 10) \\ \hline 59& 0.00000\\ \hline 60& 0.13625\\ \hline 61& 0.35961\\ \hline 62& 0.57756\\ \hline 63& 0.74362\\ \hline 64& 0.85324\\ \hline 65& 0.91936\\ \hline 66& 0.95692\\ \hline 67& 0.97742\\ \hline 68& 0.98832\\ \hline 69& 0.99400\\ \hline 70& 0.99693\\ \hline\end{array}$$

3
On

Assuming N unique items:

Modifying the answer at Question on overlapping sets gives us, that on average, from two draws we will have

$100-100\left(\frac{N-50}{N}\right)^2$

unique items.

Since we want at least 10 different items, we want at least 60 unique elements from the 100 total. Solving for N gives $\frac{50}{3}(5+\sqrt{10})\approx 136$. I'm not sure if that's exactly what you are looking for, though (since the distribution of number of unique elements is probably not symmetric about the mean).

2
On

We can model it as a Poisson process. In the first run, each item is expected to be picked $\frac {50}N$ times. This is the $\lambda$ in the Poisson distribution. The expected number not picked is $N\exp(-\lambda)=N \exp(-\frac {50}N)$. Then when we have picked $100$ times, the expected number of times an item is picked is $\frac {100}N$ and the number of unpicked items is $N \exp (-\frac {100}N)$ We want the difference of these to be $10$, giving $N(\exp(-\frac {50}N)-\exp (-\frac {100}N)=10$ If we define $x=\frac{50}N$, this becomes $5(\exp(-x)-\exp(-2x))=x$, which Alpha solves with $x\approx 1.10064$ This says we need $46$ items, of which we will on average pick $31$ different ones in the first go and another $10$ in the second.