Can you select random entry from unknown number of entries?

637 Views Asked by At

Imagine you've got entries coming in, without knowing when they end (how many will follow from now on). You're supposed to pick random one and be fair. You can't save the entries that passed but you can remember how many passed.

At this point, if you imagine it, it seems impossible. At the point of choosing, you don't know what should be item's chance to be chosen.

But maybe there is some trick. And if not we'll have to prove that it's impossible - which I expect to be part of negative answer.

1

There are 1 best solutions below

4
On BEST ANSWER

There is a standard algorithm for this:

  1. Let $S$ (the selected entry) be the first entry.
  2. If there is a second entry, generate a random number $x$ between 0 and 1. If $x\lt \frac12 $ discard the previous selection and let $S$ be the second entry instead.
  3. If there is a third entry, generate a random number $x$ between 0 and 1. If $x\lt \frac13$ discard the previous selection and let $S$ be the third entry instead.

(Repeat similarly until the final entry is reached; the $i$th entry should be selected with probability $\frac1i$, replacing whatever was previously selected, and should otherwise be discarded. )

The selected entry is whatever remains in $S$ after this procedure completes.

Pseudocode:

i := 0
while (another entry remains) do:
  E := (read next entry)
  i := i + 1
  x := random (0,1)
  if x < 1/i do:
    S := E
  end;
end;
(the selected entry is now in S)

A simple induction proof establishes that if there are $n$ entries then each is selected with probability exactly $\frac1n $. For if that is true of the first $n-1$ entries, then they are each selected with probability $\frac1 {n-1}$ before the final step. Then the final entry is selected with probability $\frac1n $; if the final entry is not selected then the earlier entry already in $S$ has been selected with total probability $\frac1 {n-1}\frac {n-1}n$.

This is a well-known algorithm among computer programmers; a web search for "select random line without counting lines" will find many discussions of the problem and its solution. )