Sample tuples without replacement where each tuple contains unique numbers

119 Views Asked by At

Problem. Given the set $\mathbf S = [1, \ldots, n] \subset \mathbb N$, generate $m$ random $k$-tuples, where

  1. the numbers constituting each tuple are unique (i.e., drawn without replacement),
  2. within each tuple, the number $i \in \mathbf S$ is drawn with a given probability $p_i$.
  3. the ordering in each tuple doesn't matter (e.g., $(2, 3, 1)$ is considered the same as $(1, 2, 3))$ and
  4. the tuples are unique (i.e., drawn without replacement).

Example 1. For $\mathbf S = [1, 2, 3, 4, 5]$, $k = 3$ and $m = 4$, this would be a valid output:

$(3, 4, 1),\; (1, 2, 5),\; (5, 1, 3),\; (1, 2, 3)$,

given that $1$ was drawn with probability $p_1$, 2 was drawn with probability $p_2$ and so on.

Example 2. Given the assumptions in Example 1, this would be an invalid output:

$(3, 4, 1),\; (3, 4, 1),\; (5, 1, 3),\; (1, 2, 3)$,

since the first two tuples are the same.

Note. Hints in implementing that in C++ are also welcome.

1

There are 1 best solutions below

1
On

In R, suppose you are generating 'pairs' $(k = 2)$ so that elements are chosen from among 1 through 6 $(m=6).$ Elements must be different, and numbers $1,2,3$ are $1/4$ as likely as numbers $4,5,6$ (among possibilities remaining). Then R code is sample(1:6, 2, p=c(1,1,1,4,4,4)). [At each choice, R normalizes the proportionality vector for remaining possible choices.]

set.seed(903)
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 5 2
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 2 5
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 4 6
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 5 4
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 6 4
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 3 5
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 4 6
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 4 6
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 5 4
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 6 3
sample(1:6, 2, p=c(1,1,1,4,4,4))
[1] 6 3

Depending on your answer to @Math1000's comment, an alternative method would be to sample with replacement (using argument repl=T), and to reject results where numbers chosen are not distinct, provided $m \ge k.$