Picking numbers from a list with Gaussian distribution (programming implementation)

131 Views Asked by At

If we have values:

$x \in [0, 100]$

I would like to implement a method, where the bigger the value, the less likely it is that it will be picked. (something like a Gaussian curve, but with maximum at the start of the list)

My first idea would be, to make an ordered list:

$\{0,0,...,0,1,1,...1,....,98,98,98,99,99,100\}$

Where the lower the element is, the more duplicates it has. Then I would take the elements out of the list by randomly generating an integer between $0$ and $length(list)$, to pick the element out of the list by it's index. So the lower the number, higher the probability that it will be picked from the list.

However this is a very DIY approach to a problem like that. Is there a more sophisticated mathematical implementation approach to get (something like a Gaussian curve having it's maximum at the start of the list).

(I am programming in Java)

If my initial idea was good, then how many duplicates of each element should I make, to match the Gaussian curve.

EDIT:

The description of the application that I want is that the lower the number, the more probability there is that it will be picked from the list. However I am open to ideas on how to do that more sophistically.

2

There are 2 best solutions below

0
On BEST ANSWER

Another method would be to

  • generate a (pseudo-)random number $U$ uniformly on $(0,1)$
  • take some suitable function $f:(0,1) \to (0,1)$
  • multiply $f(U)$ by $101$ and round down: i.e. $\lfloor 101 f(U)\rfloor$

So it is a matter of a suitable function, a quantile or inverse CDF.

One possibility is $f(u)=1-\sqrt{1-u}$ which would give you something close to a triangular distribution with its peak at $0$. Other functions will produce different distributions

0
On

Your initial idea was on the right track, but let me refine it a bit (and perhaps expand what @Henry was referring to):

  • Make a $n\times 2$ table with the first column being the value you want to extract $v_i$, and the second column its weight, $w_i$ , i.e. the relative amount of times you want to draw this.

  • Now draw a uniform (float) random number $u$, between $0$ and
    $W=\sum_1^n w_i$.

  • To select the value $v$, find the first $i^*$ so that $\sum _1 ^{i^*} w_j > u$.

  • Return $v_{i^*}$.

You can see that the number of times the random number $u$ will produce $i$ is proportional to $w_i$, as expected.

The issue now becomes how to choose a "good" weight. To do this, consider the plot of $v$ versus $w$, Since $v$ are given, you can choose $w$ to be whatever the problem demands, for example, a Gaussian would be, $w \sim e^{-v^2 \over 2\sigma}$ for some standard deviation $\sigma$.