Can I uniformly sample from $N$ distinct elements, where $N$ is unknown but finite?

77 Views Asked by At

I have access to a list of $N$ elements, but the value of $N$ is unknown. The elements arrive one by one, and never repeat. I want to sample $n$ of these elements as uniformly as possible, as I have storage only enough to save $n$ elements, but not all of the (unknown) $N$. Is there any way to solve this problem? Thanks.

1

There are 1 best solutions below

1
On BEST ANSWER

Yes you can if you can use another word of memory. Start by keeping the first $n$ objects. When object $n+1$ comes along, throw a random number and keep it object $n+1$ with probability $\frac n{n+1}$ If you keep it, throw another random number between $1$ and $n$ and replace that item. You should be able to convince yourself that the sample is uniform-that each item is kept with probability $\frac n{n+1}$. Now proceed by induction. Say you have a random sample of $n$ elements out of $n+k$ (so each is present with probability $\frac n{n+k}$) when element $n+k+1$ comes along. Keep it with probability $\frac n{n+k+1}$ and if you keep it, replace a random element of your pool. You just need to store how many elements you have seen so far.