Probability that an element will not appear in a Sample of size N drawn from a Set of size N with replacement

166 Views Asked by At

The probability that an element of a Set A of size N will not appear in a Sample B of size N drawn with replacement from the aforementioned Set A is 1/3 if I am correct.

But how can we prove this?

I tried to think this over and I arrived to a different conclusion. I.e. I view the creation of the Sample B as a sequential process through which N slots of Sample B are filled with Replacement from Set A. Then for a given element E of the original Set A the probability of exclusion from the first slot of Sample B is (N - 1)/N. The same applies for all the N - 1 remaining slots. Therefore the probability that E will not appear in any of the slots is the probability that will fail to be included in N independent trials where the probability of non-inclusion is (N - 1)/N. Consequently, the probability of E to be left out once the process of filling the slots of Sample B completes is ( (N - 1)/N ) ^ N.

Your advise will be appreciated.

2

There are 2 best solutions below

0
On

It's unclear why you think it should be $1/3$, but in short: it is not.

Sampling with replacement independently $m$ times among $N$ disticnt objects gives that the probability of missing a specific object $a$ in every single trial is exactly $$ \left(\frac{N-1}{N}\right)\cdot \left(\frac{N-1}{N}\right)\cdot \dots\cdot \left(\frac{N-1}{N}\right) = \left(\frac{N-1}{N}\right)^m $$ as you found; which, for $m=N$, gives $$ \left(\frac{N-1}{N}\right)^N= \left(1-\frac{1}{N}\right)^m \xrightarrow[N\to\infty]{} e^{-1}\simeq 0.36 $$ using the well-known definition of the exponential as $e^x = \lim_{n\to\infty} (1+\frac{x}{n})^n$.

I assume this is where the $1/3$ comes from: as a (rough) approximation for $e^{-1}$.

2
On

Doing it the other way: I need the inclusion -exclusion principle, which results in an alternating sum... When the dust clears you end up with $1-\sum_{k=1}^N\frac1N +{N\choose k}(-\frac1 {N})^k $. Which equals $(\frac {N-1}N)^N $. The term $(\frac1N)^k $ appears because it is the probability that $k $ choices are "positive". .. the $N\choose k $ because you keep subtracting probabilities (and then adding them, to compensate) of intersections of $k $ of the $N$ events...