How to uniformly sample multiple numbers whose product is within some bound


Suppose I have 3 positive integers: $n_1$, $n_2$, and $n_3$.

How do I uniformly sample $(n_1, n_2, n_3)$ so that $50 < n_1 n_2 n_3 < 100$?

I could sample each number independently with bounds between 1 and 100, then keep re-sampling if their product is out of bounds.

I need to do this in a computer program with more numbers and larger bounds, so I prefer a method that is efficient in both space and time.

Are there other ways to sample than what I described above?

EDIT (regarding the accepted answer):

At the time of this edit, there were 3 approaches answered so far:

  • complete enumeration (addresses the example in the question; simple, but space- and time-bound for very large problems)
  • rejection sampling (simple, but time-bound for very large problems)
  • weighted draw ("efficient" in space and time compared to the other approaches, but complex)

All answers work best in different situations.

I accepted the weighted-draw answer since it seemed the most complete, in the sense that it best addressed the "efficient both in space and time" requirement for larger problems.

There are 4 answers below.

BEST ANSWER

Depending on how large your bounds are, what you consider 'efficient', and how much preprocessing you're willing to do, one option is to do a weighted draw of $n_1$. For each $n_1$ you can compute how many $(n_2,n_3)$ pairs yield $b_{min}\leq n_1n_2n_3\leq b_{max}$; naively this takes $O(b_{max}^2)$ time and $O(b_{max})$ space. Then build a CDF table for $n_1$ from those counts; this ensures the appropriate probability for that variable. Once you have $n_1$, you can either repeat the same procedure one dimension down (note that the tables there will depend on $n_1$, so you'll have to build them after choosing $n_1$, or have them available for many possible values of $n_1$), or fall back on rejection sampling, presumably over a much smaller range.

For an example, suppose we have $5\leq n_1n_2n_3\leq 10$. Then we start by making a table $T(i)$ of the number of pairs $a,b$ with $1\leq ab\leq i$, for $1\leq i\leq 10$; this looks like $[1, 3, 5, 8, 10, 14, 16, 20, 23, 27]$. (This table is buildable in $O(i^2)$ time through several means, and I suspect it's buildable much faster than that in practice if you wanted to dig into the appropriate algorithms.) Now, for each $n_1$ we can compute $T(\lfloor\frac{10}{n_1}\rfloor)-T(\lceil\frac5{n_1}\rceil-1)$ (taking $T(0)=0$); this represents how many pairs $n_2,n_3$ fit the product within the bounds for that $n_1$. This table will look like $[19, 7, 4, 2, 3, 1, 1, 1, 1, 1]$. Sampling from this table is then a matter of computing the CDF (which here would be $[19, 26, 30, 32, 35, 36, 37, 38, 39, 40]$), drawing $r$ uniformly from $\{1,\ldots,40\}$, and doing the reverse lookup to find the corresponding value of $n_1$.
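The worked example above can be sketched in Python. This is a naive $O(b_{max}^2)$ implementation of the pair table and the weighted draw of $n_1$; the function names (`pair_table`, `n1_weights`, `draw_n1`) are my own, not from the answer.

```python
import bisect
import random

def pair_table(b_max):
    """T[i] = number of pairs (a, b) with 1 <= a*b <= i, for i = 0..b_max."""
    T = [0] * (b_max + 1)
    for i in range(1, b_max + 1):
        # pairs with product exactly i: one per divisor of i
        T[i] = T[i - 1] + sum(1 for d in range(1, i + 1) if i % d == 0)
    return T

def n1_weights(b_min, b_max):
    """For each n1 in 1..b_max, count pairs (n2, n3) with b_min <= n1*n2*n3 <= b_max."""
    T = pair_table(b_max)
    weights = []
    for n1 in range(1, b_max + 1):
        hi = T[b_max // n1]
        lo_idx = -(-b_min // n1) - 1        # ceil(b_min / n1) - 1
        weights.append(hi - (T[lo_idx] if lo_idx >= 0 else 0))
    return weights

def draw_n1(b_min, b_max, rng=random):
    """Draw n1 with probability proportional to its number of valid (n2, n3) pairs."""
    cdf, total = [], 0
    for w in n1_weights(b_min, b_max):
        total += w
        cdf.append(total)
    r = rng.randint(1, total)                # uniform in [1 .. total]
    return bisect.bisect_left(cdf, r) + 1    # reverse lookup into the CDF
```

For the example bounds $5\leq n_1n_2n_3\leq 10$, `pair_table(10)` and `n1_weights(5, 10)` reproduce the two tables given above.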

ANSWER

There are only 871 tuples of positive integers $x,y,z$ such that $50<xyz<100$.

I believe you can precompute them and then select uniformly at random from the list.
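A minimal sketch of this precompute-and-pick approach, assuming the bounds from the question (the loop limits exploit the fact that each factor can be at most the upper bound divided by the product so far):

```python
import random

def enumerate_triples(lo=50, hi=100):
    """List every (x, y, z) of positive integers with lo < x*y*z < hi."""
    triples = []
    for x in range(1, hi):
        for y in range(1, hi // x + 1):
            for z in range(1, hi // (x * y) + 1):
                if lo < x * y * z < hi:
                    triples.append((x, y, z))
    return triples

triples = enumerate_triples()
sample = random.choice(triples)   # uniform over all valid triples
```

The list for these bounds has exactly 871 entries, matching the count above; for much larger bounds or more factors, the list itself becomes the space bottleneck.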

ANSWER

There's a really simple method that achieves uniformity automatically, but is likely to be somewhat slow, namely "rejection sampling":

The three numbers must all be between $1$ and $100$, so do the following:

  1. Select three numbers, $p,q,r$, uniformly between $1$ and $100$. Compute their product $K = pqr$.

  2. If $50 \le K \le 100$, return the triple $(p, q, r)$.

  3. Otherwise, return to step 1.
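The steps above can be sketched in a few lines of Python (mirroring the inclusive check in step 2; the function name is my own):

```python
import random

def rejection_sample(lo=50, hi=100, max_value=100, rng=random):
    """Draw triples uniformly from [1, max_value]^3 until the product is in [lo, hi]."""
    while True:
        p, q, r = (rng.randint(1, max_value) for _ in range(3))
        if lo <= p * q * r <= hi:          # step 2: accept
            return (p, q, r)               # otherwise loop back to step 1

p, q, r = rejection_sample()
```

Because every accepted triple is drawn from the same uniform distribution over the cube and accepted by the same deterministic test, the result is exactly uniform over the valid triples, with no precomputation at all.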

Now, as @VasilyMitch has observed, there are about $1000 = 10^3$ "good" triples, but there are $10^6$ possible triples, so on average you'll need to perform step 1 about $1000$ times before succeeding at step 2.

Fortunately, computer time is cheap, and debugging programs is hard, so writing simple code like this can be a win (depending on your circumstances).

You can, of course, do much better with a nearly equivalent strategy: rather than sampling uniformly from a cube that contains all the good triples, you could sample uniformly from some smaller shape --- an axis-aligned tetrahedron, for instance --- and probably reduce the number of wasted attempts by a factor of about 6. Then again, you'd be increasing the complexity and maintenance cost of your program as well.

ANSWER

This isn't an answer per se, but it's too long for a comment. Rejection methods can work well for choosing $n$-tuples whose product lies within the bounds if those numbers are small. However, if $b_{max}$ or $n$ gets large, the efficiency of the method drops dramatically. We can estimate the number of $n$-tuples whose product is between $b_{min}$ and $b_{max}$ by the volume of the set of $n$ real numbers greater than $1$ whose product is between those values. This can be computed with an iterated integral$^1$: \begin{multline} \int_1^{b_{max}}\int_1^{b_{max}/x_1}\int_1^{b_{max}/x_1x_2}\cdots\int_1^{b_{max}/x_1x_2\cdots x_{n-2}}\int_{b_{min}/x_1x_2\cdots x_{n-1}}^{b_{max}/x_1x_2\cdots x_{n-1}}dx_n\,dx_{n-1}\cdots dx_1\\ = \frac{b_{max}-b_{min}}{(n-1)!}\ln(b_{max})^{n-1}. \end{multline} Meanwhile, the total number of $n$-tuples with maximum value $b_{max}$ is, of course, $b_{max}^n$. So if you pick a random $n$-tuple, the probability of it being valid is approximately $$ P(b_{min}\le x_1x_2\cdots x_n\le b_{max}) \approx\frac{1-\frac{b_{min}}{b_{max}}}{(n-1)!}\left[\frac{\ln(b_{max})}{b_{max}}\right]^{n-1}. $$ As you can see, if $n$ or $b_{max}$ is big, this gets small very quickly. Using the parameters you've mentioned in your question and comments: for $b_{max} = 100$, $n = 3$ it's about $10^{-3}$; for $b_{max} = 10^5$, $n = 3$ it's about $10^{-8}$; and for $b_{max} = 10^5$, $n = 6$ it's about $10^{-22}$.

$^1$Strictly speaking, the lower limit of the innermost integral should be $\max(b_{min}/x_1x_2\cdots x_{n-1},\,1)$. But this is only an approximation anyway, and at any rate it only makes the probabilities worse.
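As a sanity check on this estimate (my own, not part of the answer), we can compare the closed form against the exact acceptance rate for the question's bounds, obtained by brute-force counting over all $100^3$ triples:

```python
import math

def estimated_acceptance(b_min, b_max, n):
    """Closed-form estimate of P(b_min <= x_1*...*x_n <= b_max)."""
    return ((1 - b_min / b_max) / math.factorial(n - 1)
            * (math.log(b_max) / b_max) ** (n - 1))

def exact_acceptance(b_min, b_max):
    """Exact acceptance rate for n = 3 by brute-force enumeration."""
    count = sum(1 for x in range(1, b_max + 1)
                  for y in range(1, b_max + 1)
                  for z in range(1, b_max + 1)
                  if b_min <= x * y * z <= b_max)
    return count / b_max ** 3
```

For $b_{min}=50$, $b_{max}=100$, $n=3$, both values come out on the order of $10^{-3}$, consistent with the discussion above; the continuous estimate is rough, but it captures the order of magnitude.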