Expected proportion of distinct observations when sampling $N$ observations with replacement


In re-sampling problems, for instance bootstrapping, one has to sample $N$ observations with replacement, from an original set containing $N$ observations. As a result, some of the observations in the new sample will be duplicated. I vaguely remember that the expected proportion of distinct observations in the new sample is $1-e^{-1}$ when $N$ is large. Is this the correct answer?

Example: The original set is (1, 2, 3, 4, 5). The new sample is (4, 1, 2, 1, 4). In this case, the proportion of distinct observations in the new sample is 3/5.

Best answer

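It is. The probability that a given observation is not included in the new sample is $(1 - \frac{1}{N})^N$, which tends to $\frac{1}{e}$ as $N \to \infty$. By linearity of expectation, the new sample therefore contains about $N \cdot (1 - \frac{1}{e})$ distinct observations on average, i.e. an expected proportion of $1 - e^{-1} \approx 0.632$.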
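If you want to check this numerically, here is a minimal simulation sketch in Python. The function names (`distinct_proportion`, `exact_expected_proportion`, `simulate`) and the fixed seed are my own choices for illustration, not part of the question. It compares the simulated average proportion of distinct observations with the exact finite-$N$ value $1 - (1 - \frac{1}{N})^N$.

```python
import random


def distinct_proportion(n, rng):
    """Draw n indices with replacement from range(n) and
    return the fraction of the n original items that appear."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


def exact_expected_proportion(n):
    """Exact expectation for finite n: 1 - (1 - 1/n)^n."""
    return 1 - (1 - 1 / n) ** n


def simulate(n, trials, seed=0):
    """Average the distinct proportion over many resamples.
    The seed is fixed only to make runs repeatable (an assumption)."""
    rng = random.Random(seed)
    total = sum(distinct_proportion(n, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (5, 50, 1000):
        print(n, simulate(n, trials=2000), exact_expected_proportion(n))
```

For small $N$ the exact value is noticeably above the limit (for $N = 5$ it is $1 - 0.8^5 = 0.67232$), and it decreases towards $1 - e^{-1}$ as $N$ grows.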