Fast method to pick unique random numbers?

394 Views Asked by At

In general, computer simulation involving random numbers (lets say to simulate a fair deck of playing cards so $1$ thru $52$) runs fast if you only pick a few cards. However, as you pick more and more, the algorithm slows because of "collisions", meaning, the random number generator will very likely repeat a card and then we will have to try again to resolve it. As we get closer and closer to exhausting the deck (like we picked $45$ cards and still want more), the algorithm "crawls" (by comparison to when we pick only a few cards) since there are mostly collisions as the available cards dwindle (using the flag method of already seen cards).

So I am wondering what a good way is to alleviate this problem for simulation of games where we need a variable # of random numbers from about $25$% to $90$% of all possible. Perhaps we can just concentrate on $50$% for the sake of comparing speeds. For example, to pick $26$ of $52$ cards randomly using the flag method, we would need on average $1.38 * 26 = 36$ random numbers which is very reasonable and quite fast but can we do better as far as speed?

Also, would it be fair to say that if we choose a random number (let's say from $1$ to $52$ representing the cards in a deck), and we get a collision, that just picking the next higher not yet chosen card still maintains good randomness because each previously chosen card has an equal chance of a collision? For example, if we choose $2, 37, 51, 15, 44, 37$, we would then make the 2nd $37$ a $38$ because $38$ has not been chosen yet.

1

There are 1 best solutions below

0
On

Python's random.sample method solves this problem, and chooses between two different ways depending on how large the sample is compared to the full deck.

If the sample is relatively small, it will generate a bunch of random indexes into the array, and throw out duplicates (using the set type), until it has the right number of distinct indexes.

If the sample is large, it uses the Fisher-Yates Shuffle method. It repeatedly picks a random index and pulls that card into your sample, removing it from the deck at the same time. Technically, it removes the card and swaps the last card into the place it removed from, instead of just shifting the stuff after the gap; this reduces the amount of moving you need to do to shorten the list.

The first method works quickly for small samples, and doesn't use up much memory: it needs only enough room to store as many indexes as the size of the sample, and each card selection requires slightly more than one random number to perform. The number of required random numbers grows as you select more cards: to 1.5 at 1/3 of the deck already selected, to 2 at 1/2 the deck already selected, and it gets worse from there: see the Coupon Collector's Problem. So it's terrible for large samples.

The second method always requires exactly one random number generation per selection, and so has guaranteed time performance... but it also requires either a list of indexes the size of the original deck, or freedom to mangle the original deck (which Python's random.shuffle uses).

Your "unpick" method has an important restriction: the results of unpicking aren't shuffled.