Randomly generate a sorted set with uniform distribution

1.3k Views Asked by At

I have an ordered set $S = \langle S_1, S_2, .., S_M \rangle$ from which I want to draw a sample of $N$ elements in such a way that the sample is non-strictly totally ordered (as with $\leq$ and the integers), and all the possible occur with equal probability. The sample must be taken with repetition.

For example, let's say $S = \langle 1, 2, 3, 4 \rangle$ and $N=3$, the samples: $[1, 1, 1]$, $[1,2,3]$, $[2,3,3]$ would be valid, but $[3,2,1]$ or $[2,1,1]$ would be invalid.

A simple way to generate this set would be to just randomly sample from $S$, and then sort the resulting sequence. However, please note that the following approach is biased ($[1,1,1]$ is less likely to occur than $[1,2,3]$, for example).

This question is related to one of the answers given in this StackOverflow question:https://stackoverflow.com/questions/26467434/generating-random-number-in-sorted-order. Note that the algorithm proposed there is to generate such a sample without repetition, whereas I want my sample to be generated with repetition.

2

There are 2 best solutions below

4
On BEST ANSWER

It's enough to pick $N$ random elements from $\{ 1, 2, \ldots, M+N-1 \}$ without replacement and then do a postprocessing step. Say you pick $T_1 < T_2 < \ldots < T_N$; then let $S_K = T_K - K + 1$. For example with $M = 4, N = 3$, this is like picking $3$ random elements from $\{1 ,2 , \ldots, 6\}$. So for example you might pick $T = \langle 1, 4, 5 \rangle$ and then $S = \langle 1 - 1 + 1, 4 - 2 + 1, 5 - 3 + 1 \rangle = \langle 1, 3, 3 \rangle$.

So you need an algorithm for picking random subsets of a given size.

To pick a random subset of size $k$ of the set $\{1, 2, \ldots, n\}$, there's a nice recursive algorithm. Such a set includes $n$ with probability $k/n$. If it includes $n$, then take the set to be a subset of $\{1, 2, \ldots, n-1\}$ of size $k-1$, with $n$ adjoined; if it does not include $n$, then the remainder is a subset of $\{1, 2, \ldots, n-1 \}$ of size $k$. (I learned this algorithm from the late Herb Wilf's notes "East Side, West Side", available online at https://www.math.upenn.edu/~wilf/eastwest.pdf - see page 16 for code. It's in Maple, but should be reasonably understandable.)

3
On

I would just draw $3$ random numbers and only accept ones in ascending order, that way you have equal probabilities (i.e. rejection method as discussed)

Otherwise,

Sample the triples by using monte carlo. I.e. count how many different combinations are permissible, and accept each with equal probability according to different values of a $U[0,1]$, i.e. give $111$ the range $[0,1/n]$, $211$ is $[1/n, 2/n]$. As long as you have some sensible ordering on the set of possible sequences you are in business. You can use for loops to achieve this.