Probability of repeated sampling from random draws with replacement

I am sampling 552025 patients from a population of 647117 patients. If this sampling is done with replacement, could someone please help me work out:

1) what is the probability that there is any occurrence of repeated sampling of patients?
2) the number of repeated patients from the 552025 sampled patients?

I do sincerely apologise if this question has been covered elsewhere. There are several questions that have a similar title to mine, but, entering the posts, they do not seem to cover the same topic in the body text per se.

I would be grateful for the mathematical solutions and/or formulae to approach the problem generally.

Thank you.

Kareem

There is 1 best solution below

On BEST ANSWER

Suppose, more generally, that you have $N$ people in total and wish to sample $m$ of them with replacement. What is the expected number of distinct people sampled more than once?

Let $X_i$ denote the indicator variable for the $i^{th}$ person. Thus $X_i=1$ if that person is sampled more than once, and $X_i=0$ otherwise. By Linearity of Expectation, the answer we want, $E=E[N,m]$, is given by the sum $E=\sum_{i=1}^NE[X_i]$. Of course $E[X_i]$ is just the probability that the $i^{th}$ person is sampled at least twice. To compute that, we compute the probability that this person is never sampled or sampled exactly once.

Never sampled: $\left( \frac {N-1}N \right)^m$.

Sampled exactly once: $m\times \frac 1N \times \left( \frac {N-1}N \right)^{m-1}$

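The probability that the person is sampled at least twice is, of course, just $1$ minus the sum of those two values. By symmetry this probability is the same for each of the $N$ people, so the sum of the $E[X_i]$ is $N$ times it.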
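As a sanity check on the per-person probability, here is a minimal Monte Carlo sketch. It uses small illustrative values ($N=50$, $m=40$, chosen here only to keep the run fast, not taken from the question) and compares the simulated fraction of people sampled at least twice with $1$ minus the "never" and "exactly once" probabilities. The function names are hypothetical.

```python
import random
from collections import Counter


def simulate_at_least_twice(N, m, trials, seed=0):
    """Estimate the chance that a given person is drawn at least twice
    when m draws are made with replacement from N people."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(trials):
        # Count how often each person appears in one sample of size m.
        counts = Counter(rng.randrange(N) for _ in range(m))
        # Number of distinct people drawn two or more times.
        total += sum(1 for c in counts.values() if c >= 2)
    # Average over trials, then divide by N to get a per-person probability.
    return total / (trials * N)


def exact_at_least_twice(N, m):
    """1 - P(never sampled) - P(sampled exactly once)."""
    never = ((N - 1) / N) ** m
    once = m * (1 / N) * ((N - 1) / N) ** (m - 1)
    return 1 - never - once


if __name__ == "__main__":
    N, m = 50, 40  # small illustrative values, an assumption for this sketch
    print("simulated:", simulate_at_least_twice(N, m, trials=2000))
    print("formula:  ", exact_at_least_twice(N, m))
```

The two printed numbers should agree to within ordinary simulation noise.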

Thus $$E[N,m]=N\times \left(1-\left( \frac {N-1}N \right)^m-m\times \frac 1N \times \left( \frac {N-1}N \right)^{m-1} \right)$$

Combining all this and using the values you specify ($N=647117$, $m=552025$), we get that $$E[647117,552025]=0.210391739\times 647117\approx 136148$$
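The calculation can be reproduced numerically. The sketch below is one way to do it, summing the per-person probability over all $N$ people as in the linearity-of-expectation argument. It assumes $N>1$ and uses `math.log1p` so that $\log\frac{N-1}{N}$ stays accurate when $N$ is large. The function names are hypothetical.

```python
import math


def prob_sampled_at_least_twice(N, m):
    """Probability that one fixed person is drawn at least twice
    in m draws with replacement from N people (assumes N > 1)."""
    log_q = math.log1p(-1.0 / N)          # log((N - 1) / N)
    never = math.exp(m * log_q)           # ((N - 1) / N) ** m
    once = (m / N) * math.exp((m - 1) * log_q)
    return 1.0 - never - once


def expected_repeated_people(N, m):
    """Expected number of distinct people drawn more than once:
    the sum of N identical indicator expectations."""
    return N * prob_sampled_at_least_twice(N, m)


if __name__ == "__main__":
    N, m = 647117, 552025  # population size and sample size from the question
    print(prob_sampled_at_least_twice(N, m))
    print(expected_repeated_people(N, m))
```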

Of course the probability that there are no duplicates is effectively $0$, so the probability that at least one patient is sampled more than once is effectively $1$. If you want to compute it exactly, note that the probability of no duplicates is given in general by $$\prod_{i=0}^{m-1} \frac {N-i}N$$ To see that this is, effectively, $0$ in your case, note that from $i=323559$ to the end each term is less than $\frac 12$, and every earlier term is at most $1$, so your expression is less than $\left( \frac 12 \right)^{228466}$, which is about $7.6\times 10^{-68776}$.
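The product itself underflows ordinary floating point, but its logarithm is easy to evaluate. As a rough sketch, the product equals $\frac{N!}{(N-m)!\,N^m}$, so its log can be computed with `math.lgamma`. The function name is hypothetical, and the result is accurate only up to floating-point rounding.

```python
import math


def log_prob_no_duplicates(N, m):
    """Natural log of prod_{i=0}^{m-1} (N - i) / N,
    computed as log(N!) - log((N - m)!) - m * log(N)."""
    if m > N:
        return float("-inf")  # more draws than people forces a duplicate
    return math.lgamma(N + 1) - math.lgamma(N - m + 1) - m * math.log(N)


if __name__ == "__main__":
    N, m = 647117, 552025  # values from the question
    log10_p = log_prob_no_duplicates(N, m) / math.log(10)
    print("log10 of P(no duplicates):", log10_p)
```

The base-10 logarithm printed here should come out far below $-68776$, consistent with the bound above, which is deliberately loose.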