Creating random numbers matching mean and standard deviation


I know how to compute the mean and standard deviation of a given sample. But how do I do the opposite? Given the desired mean and standard deviation, I want to create the samples.

In other words: what algorithm can I use to create, let's say, 10,000 random floats whose mean is approximately 77 and whose standard deviation is approximately 5?


1
On BEST ANSWER

This is done by a family of methods called random sampling. For example, take a normal distribution with mean $\mu$ and standard deviation $\sigma$; its density is:

$$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

In this case we have an infinite domain. We can do two simple things. The first is to cut it off. For example, let's say we produce a random float $x$ uniformly distributed in the interval $(\mu-3\sigma,\mu+3\sigma)$, which covers most of the distribution (about 99.7%). Then we create another random float $y$ uniformly distributed along the $y$ axis of a plot of $f(x)$, between $0$ and the maximum of the density, $\frac{1}{\sigma\sqrt{2\pi}}$. We accept $x$ if the point $(x,y)$ lies below the curve, that is, if $y<f(x)$, and otherwise discard it and try again. This works for an arbitrary bounded $f(x)$ and is exact if the distribution is defined on a finite interval.
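
The accept/reject procedure just described could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name `RejectionSampler` and its methods are hypothetical names, and the cut-off at $\mu\pm3\sigma$ is the one used in the paragraph above.

```java
import java.util.Random;

// Hypothetical helper illustrating rejection sampling for a truncated normal density.
class RejectionSampler {
    /** Normal density with mean mu and standard deviation sigma. */
    static double density(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI));
    }

    /** Draws n samples from the normal density cut off at mu +/- 3 sigma. */
    static double[] sample(int n, double mu, double sigma, Random rng) {
        double lo = mu - 3 * sigma;
        double width = 6 * sigma;
        double peak = density(mu, mu, sigma); // maximum of f: 1 / (sigma * sqrt(2 pi))
        double[] out = new double[n];
        int accepted = 0;
        while (accepted < n) {
            double x = lo + width * rng.nextDouble(); // candidate position
            double y = peak * rng.nextDouble();       // uniform height between 0 and the peak
            if (y < density(x, mu, sigma)) {          // keep only points below the curve
                out[accepted++] = x;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] xs = sample(10_000, 77, 5, new Random());
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double var = 0;
        for (double x : xs) var += (x - mean) * (x - mean);
        System.out.println("mean = " + mean);
        System.out.println("std  = " + Math.sqrt(var / xs.length));
    }
}
```

Because the tails beyond $3\sigma$ are discarded, the standard deviation of the accepted samples comes out slightly below $\sigma$ (roughly $4.93$ instead of $5$ for this cut-off); a wider interval reduces that bias at the cost of more rejected candidates.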

For the specific case of a normal distribution, we can use the fact that a change of coordinates gives a simpler distribution on a finite domain.

Multiply the integral of the distribution by itself (the usual trick for integrating the pdf exactly), and suppose $\mu=0$ (you can just shift all samples by $\mu$ at the end):

$$I^2=\frac{1}{2\pi\sigma^2}\int_{-\infty}^\infty dx\int_{-\infty}^\infty dy\,e^{-\frac{x^2+y^2}{2\sigma^2}}$$

Now change to polar coordinates $(r,\phi)$, and then substitute $\varphi = e^{-\frac{r^2}{2\sigma^2}}$, so that $d\varphi=-\frac{r}{\sigma^2}e^{-\frac{r^2}{2\sigma^2}}\,dr$, and you get:

$$I^2=\int_0^{2\pi}\frac{d\phi}{2\pi}\int_0^1 d\varphi$$

You get two uniform distributions for $\phi$ and $\varphi$. So you can just generate a random number $X$ between $0$ and $2\pi$ and another $Y$ between $0$ and $1$, and undo the change of variables:

$$r=\sigma\sqrt{-2\ln Y}$$ $$x=r\cos(X)\qquad y=r\sin(X)$$

Then $x$ and $y$ will be two independent numbers following a normal distribution with mean $0$ and standard deviation $\sigma$. If you want mean $\mu$, just add $\mu$ to all generated numbers.

0
On

As Java code:

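import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// The class name is arbitrary; it only makes the snippet compile on its own.
public class NormalSamples {
    public static void main(String[] args) {
        Random random = new Random();
        List<Float> lst = new ArrayList<>();
        // Each pass yields two samples, so 5000 passes give 10,000 numbers.
        for (int i = 0; i < 5000; i++) {
            // Angle X, uniform in [0, 2*pi).
            float X = random.nextFloat() * (float) Math.PI * 2;
            // Y uniform in (0, 1]; using 1 - nextFloat() avoids log(0).
            float Y = 1 - random.nextFloat();
            // Radius for sigma = 5.
            float r = (float) (5 * Math.sqrt(-2 * Math.log(Y)));
            float x = r * (float) Math.cos(X);
            float y = r * (float) Math.sin(X);
            // Shift both samples to mean 77.
            lst.add(x + 77);
            lst.add(y + 77);
        }

        // Mean and (population) standard deviation of the generated samples.
        double mean = lst.stream().mapToDouble(Float::doubleValue).sum() / lst.size();
        double dev = Math.sqrt(lst.stream().mapToDouble(f -> (f - mean) * (f - mean)).sum() / lst.size());

        System.out.println(mean);
        System.out.println(dev);
    }
}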

Example output:

77.03630762634278
4.969276602295466
1
On

There are many ways in which “random” floating-point numbers could be distributed with mean $77$ and standard deviation $5$. @MyUserIsThis gave an example where the numbers are chosen from a normal distribution. Here’s how you could choose them from a uniform distribution.

The continuous uniform distribution of real numbers on the interval $[77-a,77+a]$ has mean $77$ and standard deviation $\frac{a}{\sqrt3}$. If $a=5\sqrt3\approx8.66$, the standard deviation will be $5$, as you want.

If you choose $10,000$ values that are equally likely to be anywhere between $68.34$ and $85.66$, you’ll get approximately the mean and standard deviation required.
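
As a rough Java sketch of this uniform approach (the class name `UniformSampler` and its method are hypothetical names chosen for illustration), one could scale `Random.nextDouble()` onto the interval $[\mu-a,\mu+a)$ with $a=\sigma\sqrt3$:

```java
import java.util.Random;

// Hypothetical helper: uniform samples with a prescribed mean and standard deviation.
class UniformSampler {
    /** Draws n values uniformly from [mu - a, mu + a) with a = sigma * sqrt(3). */
    static double[] sample(int n, double mu, double sigma, Random rng) {
        double a = sigma * Math.sqrt(3); // half-width that gives standard deviation sigma
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = mu - a + 2 * a * rng.nextDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] xs = sample(10_000, 77, 5, new Random());
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double var = 0;
        for (double x : xs) var += (x - mean) * (x - mean);
        System.out.println("mean = " + mean);
        System.out.println("std  = " + Math.sqrt(var / xs.length));
    }
}
```

The printed values vary from run to run, but with $10,000$ samples they should land close to $77$ and $5$.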

You could also choose the values non-randomly and equally spaced: $$x_1=68.34, x_2=68.34+\dfrac{85.66-68.34}{9999}, x_3=68.34+2\cdot\dfrac{85.66-68.34}{9999}, \dots.$$

The standard deviation of these equally spaced $x_i$ is $\approx5.0006$ (sample standard deviation, dividing by $n-1$).
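
To check the equally spaced construction numerically, a small Java sketch like the following could be used. The class name `EvenGrid` and its methods are hypothetical; the endpoints $68.34$ and $85.66$ are the rounded values from the text, and the program prints both the population form (dividing by $n$) and the sample form (dividing by $n-1$) of the standard deviation.

```java
// Hypothetical helper: equally spaced values and their standard deviation.
class EvenGrid {
    /** Returns n equally spaced values from lo to hi inclusive (n >= 2). */
    static double[] grid(double lo, double hi, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = lo + i * (hi - lo) / (n - 1);
        }
        return out;
    }

    /** Standard deviation; divides by n - 1 if sample is true, else by n. */
    static double std(double[] xs, boolean sample) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double sumSq = 0;
        for (double x : xs) sumSq += (x - mean) * (x - mean);
        return Math.sqrt(sumSq / (sample ? xs.length - 1 : xs.length));
    }

    public static void main(String[] args) {
        double[] xs = grid(68.34, 85.66, 10_000);
        System.out.println("population std = " + std(xs, false));
        System.out.println("sample std     = " + std(xs, true));
    }
}
```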

You could also choose $10,000$ numbers by flipping a fair coin and letting your “random” number be $72$ if you get heads and $82$ if you get tails. Each of these numbers differs from $77$ by exactly $5$, so this two-point distribution has mean $77$ and standard deviation exactly $5$ (the square root of the average squared difference from the mean), and the mean and standard deviation of the $10,000$ numbers you draw will be very close to those values.
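
A minimal Java sketch of the coin-flip variant, assuming `Random.nextBoolean()` as the fair coin (the class name `CoinFlipSampler` and its methods are hypothetical):

```java
import java.util.Random;

// Hypothetical helper: two-point "coin flip" samples at mu - sigma and mu + sigma.
class CoinFlipSampler {
    /** Each value is mu + sigma or mu - sigma with equal probability. */
    static double[] sample(int n, double mu, double sigma, Random rng) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextBoolean() ? mu + sigma : mu - sigma;
        }
        return out;
    }

    /** Root mean squared deviation of xs from a fixed center. */
    static double rmsAbout(double[] xs, double center) {
        double sumSq = 0;
        for (double x : xs) sumSq += (x - center) * (x - center);
        return Math.sqrt(sumSq / xs.length);
    }

    public static void main(String[] args) {
        double[] xs = sample(10_000, 77, 5, new Random());
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        System.out.println("mean           = " + mean);
        System.out.println("rms about 77   = " + rmsAbout(xs, 77));
        System.out.println("std about mean = " + rmsAbout(xs, mean));
    }
}
```

The deviation measured about $77$ is exactly $5$ for every sample; the deviation measured about the sample mean is marginally smaller whenever heads and tails do not occur equally often.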

Whether one or another distribution is best for your problem depends on more information than you’ve given, but there are no doubt many ways to answer the question you asked.