Unranking pseudo-random values to produce uniform distribution over all permutations

106 Views Asked by At

Following this question (and answers) on SO.

The problem is to find a method to produce an unranking of combinatorial objects in random order, but in such a way that all possible orderings are unformly and unbiasedly sampled.

In my answer it was mentioned that a scheme similar to the generation of pseudo-random numbers will work, but now the new question is under what values of the modulo function will be able to sample uniformaly and unbiased all the orderings?

Specificaly it was mentioned that a scheme like:

def succ(index, numItems):
   return (index*seed + step) mod numItems

Will return such an ordering, provided that at least seed and step are relatively prime and large enough with respect to numItems (so the space of items is uniformly sampled without periodicities)

However for what values of seed and step (possibly as functions of numItems) will one be able to sample all possible orderings of the items in an unbiased and uniform way?? In other words for what values of seed and step can one say that one can generate all possible orderings (uniquely and unbiasedly) and not just one of them?

Note this is mostly a mathematical question than algorithmic one per se. We dont mind if the ordering function can be reversed engineered, only that we can generate succesor functions with modulo (for fast performance) which for specific values of seed and step we can generate any random ordering of the indices unbiasedly. Is this possible or are there orderings which cannot be given this way?

1

There are 1 best solutions below

22
On BEST ANSWER

There is a really nice algorithm of Myrvold and Rusky to rank/unrank permutations in linear time. You can download the (very short) paper here: http://webhome.cs.uvic.ca/~ruskey/Publications/RankPerm/RankPerm.html

Essentially, the algorithm is derived from the method of generating a random permutation. Start with the identity, then randomly swap the first element with one of the later elements. Next, randomly swap the second element with one of the later; then swap the third with one of the later, etc. "One of the later" could include swapping with itself (i.e. make no swap). Their algorithm gives a way of taking an integer (the permutation number) and making a specified sequence of swaps (based on the number) to get that random permutation. Likewise, from a given permutation, you can unravel the swaps to generate it's index. It is a very fast algorithm.

EDIT: The way I envision selecting your objects in random order is this. I assume you have $n$ objects and for each $1\leq i\leq n$ you can either look up object $i$ on a list (memory intensive) or regenerate the object. Then, you use a pseudo-random number generator to pick a random number $r$ between $1$ and $n!$. This is your permutation number which you unrank to get permutation number $r$, call it $\sigma$ (this will be viewed as a random ordering of your objects). Then you take your object in the order $\sigma(1), \sigma(2), \sigma(3),...,\sigma(n)$. Where, again, I assume you can look-up or generate object $\sigma(i)$ as needed.

EDIT, EDIT: In the case that you want to use the linear function $f(i)=i*seed+step \mod n$ (where $n$ is the number of items), if seed is relatively prime to $n$ (their gcd is 1) then you will not get get duplicate values out of $f$ until you've entered more than $n$ values for $i$. The reason is the following: suppose that $f(i_1)=f(i_2)$, then

\begin{align*} i_{1} seed + step &\equiv i_{2} seed + step \mod n \\ i_{1} seed &\equiv i_{2} seed \mod n \\ \end{align*}

Since $\text{gcd}(n, seed)=1$ it follows that $seed$ has an inverse modulo $n$, i.e. a value $seed^{-1}$ so that $seed^{-1}\cdot seed\equiv seed\cdot seed^{-1}\equiv 1 \mod n$. Multiplying the above by $seed^{-1}$ we have:

\begin{align*} i_{1}seed\cdot seed^{-1} \equiv i_{2}seed\cdot seed^{-1} \mod {n} \\ i_{1} \equiv i_{2} \mod {n}. \end{align*}

Thus, you only get repeats out of $f$ is the two inputs are congruent modulo $n$. To avoid this, just take the integers $0, 1, 2, ..., n-1$ as your inputs to $f$. Note that if $\text{gcd}(seed, n)\neq 1$ then $seed$ does not have an inverse and the argument fails.