Closed Form for Number of Samples Required when Picking Without Replacement

41 Views Asked by At

So maybe i missed something simple, but I could not find material related to the following problem:

Assume we have an urn with $N$ balls of which $k$ are colored black. Assume that $k$ is some fixed number $k \ll N$. I am interested in the number of balls required to draw (without replacement) to have some (fixed) probability $p$ (e.g. $p=\frac{1}{2}$) of drawing at least one black ball. If we let $f_k$ be a function describing the number of samples needed to achieve probability $p$, it seems that roughly $$f_k(N) \approx g(k)\cdot N.$$ For example, for $k=5$, calculating the probability yields the following for $N=100,1000,10000,100000:$ $$f_5(100) = 13, f_5(1000) = 130, f_5(10000) = 1295, f_5(100000) = 12945.$$ So in this case it seems that $g(5) \approx 0.12945$, i.e. we need to draw approximately $13\%$ of the balls to achieve a probability of at least $\frac{1}{2}$.

I am now mainly interested whether, for fixed $p$, there is a closed form (or strong approximation) for this function $g$, i.e. given fixed $k$ and $p$, i want to know what fraction of balls i have to draw to achieve probability at least $p$ of drawing a black ball.

Now from the probabilistic viewpoint, it seems that this problem is somewhat related to hypergeometric distributions. For fixed number of samples $n$, the probability of drawing at least one black ball is given by $$\mathbb{P}["\text{Atleast one black ball in $n$ draws}"] = 1 - \prod_{j=0}^{n-1} \frac{N-k-j}{N-j} = 1 - \frac{{N-k \choose n}}{{N \choose n}},$$ but it is unclear to me how to go from here to finding the number of draws required to achieve some set probability.

Thanks for your help!

1

There are 1 best solutions below

0
On BEST ANSWER

If we apply heavy approximation we might get

$1-p=\frac{{N-k \choose n}}{{N \choose n}} = \frac{(N-k)!(N-n)!}{(N-k-n)!N!} \approx {\frac{(N-k)^{N-k+1/2}(N-n)^{N-n+1/2}}{(N-k-n)^{N-k-n+1/2}N^{N+1/2}}} \approx {\frac{(N-n)^{k}}{N^{k}}} =\left(1-\frac nN\right)^k$

with the first approximation using Stirling's approximation and the second assuming that $k \ll N-n$ and cancelling similarly sized terms

That might suggest that $$n \approx N\left(1-(1-p)^{1/k}\right)$$

If $p=\frac12$ and $k=5$, as in your examples, then this suggests $n \approx 0.1294494\, N$ which is essentially what you found with rounding up. In other cases the approximation may not be quite as good.