Generating a probability distribution with a desired standard deviation

I'd like to generate a list of $n$ numbers such that their sum is $1$ and their sample standard deviation is some fixed number $\sigma$. How can I do it?

Best answer

Let $n > 1$ be fixed.

Choose an arithmetic progression of $n$ terms, centered at ${\large{\frac{1}{n}}}$ (i.e., with mean ${\large{\frac{1}{n}}}$), so that the $n$ terms sum to $1$.

Leaving the common difference $d$ unknown, compute the sample standard deviation as a function of $d$, set it equal to the target sample standard deviation, and solve for $d$.

Explicitly, define $x_1,...,x_n$ by $$x_k = \frac{1}{n}+d\left(k-\frac{n+1}{2}\right)$$ with $d$ assumed nonnegative, but otherwise left unknown.

Let $s$ be the target sample standard deviation (computed here with divisor $n$). Then
\begin{align*}
s^2 &= \sum_{k=1}^n \frac {\left(x_k - \frac{1}{n}\right)^2} {n}\\[4pt]
&=\frac{d^2}{n}\sum_{k=1}^n\left(k-\frac{n+1}{2}\right)^2\\[4pt]
&=\frac{d^2}{n}\sum_{k=1}^n\left(k^2-k(n+1) + \frac{(n+1)^2}{4}\right)\\[4pt]
&=\frac{d^2}{n} \left( \left( \sum_{k=1}^n k^2 \right) - \left( (n+1)\sum_{k=1}^n k \right) + \left( \frac{(n+1)^2}{4} \sum_{k=1}^n 1 \right) \right) \\[4pt]
&=\frac{d^2}{n} \left( \frac{n(n+1)(2n+1)}{6} - (n+1)\left(\frac{n(n+1)}{2}\right) + \left(\frac{(n+1)^2}{4}\right) n \right) \\[4pt]
&= d^2\left(\frac{n^2-1}{12}\right)\\[10pt]
&\;\;\;\;\;\;\;\;\text{hence}\\[10pt]
d &=2s\sqrt{\frac{3}{n^2-1}}
\end{align*}

So that's one way.
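As a quick numerical check of the arithmetic-progression construction, here is a minimal Python sketch. The function name `arithmetic_progression_sample` is made up for illustration, and the check assumes the divisor-$n$ standard deviation used in the derivation, which is what `statistics.pstdev` computes.

```python
import math
import statistics


def arithmetic_progression_sample(n, s):
    """Return n values with mean 1/n and divisor-n standard deviation s.

    Hypothetical helper: the name and argument checks are illustrative only.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if s < 0:
        raise ValueError("s must be nonnegative")
    # Common difference from the derivation: d = 2 s sqrt(3 / (n^2 - 1)).
    d = 2 * s * math.sqrt(3 / (n * n - 1))
    # x_k = 1/n + d (k - (n + 1)/2) for k = 1, ..., n.
    return [1 / n + d * (k - (n + 1) / 2) for k in range(1, n + 1)]


xs = arithmetic_progression_sample(5, 0.1)
print(sum(xs))                # close to 1, up to floating-point rounding
print(statistics.pstdev(xs))  # close to 0.1 (pstdev divides by n)
```

Nothing here forces the terms to be nonnegative; for large $s$ the smallest term drops below zero.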

Here's another way, similar in spirit, but algebraically a lot simpler . . .

Let $n > 1$ be fixed.

Define $x_1,...,x_n$ by $$ \begin{cases} x_1 = \frac{1}{n}- d\\[4pt] x_k = \frac{1}{n}&\text{if}\;\,1 < k < n\\[4pt] x_n = \frac{1}{n}+ d\\[4pt] \end{cases} $$ with $d$ assumed nonnegative, but otherwise left unknown.

In other words, set all terms equal to the mean except the first and last terms, which are to be placed symmetrically opposite from the mean, at a distance $d$, yet to be determined.
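A minimal Python sketch of this two-point construction follows. The helper name `two_point_sample` is hypothetical. It takes $d = s\sqrt{n/2}$: only two terms deviate from the mean, each by $d$, so the divisor-$n$ variance is $2d^2/n$, and this choice of $d$ makes it equal to $s^2$.

```python
import math
import statistics


def two_point_sample(n, s):
    """Return n values summing to 1 with divisor-n standard deviation s.

    Hypothetical helper: all terms equal 1/n except the first and last,
    which sit at distance d below and above the mean.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if s < 0:
        raise ValueError("s must be nonnegative")
    d = s * math.sqrt(n / 2)  # solves s^2 = 2 d^2 / n for d >= 0
    xs = [1 / n] * n
    xs[0] -= d
    xs[-1] += d
    return xs


xs = two_point_sample(6, 0.05)
print(sum(xs))                # close to 1, up to floating-point rounding
print(statistics.pstdev(xs))  # close to 0.05 (pstdev divides by n)
```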

Let $s$ be the target sample standard deviation. Then
\begin{align*}
s^2 &=\frac{2d^2}{n}\\[10pt]
\text{hence}\;\;d &=s\sqrt{{\small{\frac{n}{2}}}}
\end{align*}

Here's still another, more general way . . .

Assuming the target standard deviation $s$ is positive, fix $n > 1$, and choose $n$ real numbers $w_1,...,w_n$, not all equal, but otherwise arbitrary.

For the sample $w_1,...,w_n$, compute the sample mean $\mu_0$, and the sample standard deviation $\sigma_0$.

Then, letting $x_1,...,x_n$ be given by $$x_k = s\left(\frac{w_k - \mu_0}{\sigma_0}\right)+\frac{1}{n}$$ for $k = 1,...,n$, it follows that the values $x_1,...,x_n$ have sample mean ${\large{\frac{1}{n}}}$ and sample standard deviation $s$.
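The rescaling step can be sketched in Python as below. The name `rescale_to_target` and the example values are assumptions made for illustration. The sketch uses the divisor-$n$ standard deviation (`statistics.pstdev`) for $\sigma_0$.

```python
import statistics


def rescale_to_target(ws, s):
    """Map values ws to values with mean 1/n and standard deviation s.

    Hypothetical helper: standardizes ws with its own sample mean and
    divisor-n standard deviation, then rescales by s and shifts to 1/n.
    """
    n = len(ws)
    mu0 = statistics.fmean(ws)
    sigma0 = statistics.pstdev(ws)
    if sigma0 == 0:
        raise ValueError("the values in ws must not all be equal")
    return [s * (w - mu0) / sigma0 + 1 / n for w in ws]


ws = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary values, not all equal
xs = rescale_to_target(ws, 0.02)
print(sum(xs))                # close to 1, up to floating-point rounding
print(statistics.pstdev(xs))  # close to 0.02
```

The same construction works with the divisor-$(n-1)$ convention (`statistics.stdev`), provided $\sigma_0$ and the target $s$ are both measured that way.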

If instead you want $n$ random values, you can use a pseudo-random number generator to generate $n$ random numbers $w_1,...,w_n$ from a known distribution, with mean $\mu_0$, say, and standard deviation $\sigma_0$, say.

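Then, letting $x_1,...,x_n$ be given by $$x_k = \sigma\left(\frac{w_k - \mu_0}{\sigma_0}\right)+\frac{1}{n}$$ for $k = 1,...,n$, you get $n$ pseudo-random numbers $x_1,...,x_n$ from a distribution with mean ${\large{\frac{1}{n}}}$ and standard deviation $\sigma$. Note that $\mu_0$ and $\sigma_0$ here are parameters of the underlying distribution, so the generated values sum to $1$ and have sample standard deviation $\sigma$ only approximately. To hit both targets exactly, standardize with the sample mean and sample standard deviation of $w_1,...,w_n$ instead, as in the previous construction.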
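Here is a hedged Python sketch of the random variant. It assumes standard normal draws from the standard library's `random` module, so $\mu_0 = 0$ and $\sigma_0 = 1$ are known. Both function names are hypothetical. The first function uses the distribution parameters, and the second standardizes with the sample statistics of the draws, so the two outputs can be compared.

```python
import random
import statistics


def random_sample_distribution_params(n, sigma, seed=None):
    """Scale and shift standard normal draws using the known mu0 = 0, sigma0 = 1."""
    rng = random.Random(seed)
    mu0, sigma0 = 0.0, 1.0
    ws = [rng.gauss(mu0, sigma0) for _ in range(n)]
    return [sigma * (w - mu0) / sigma0 + 1 / n for w in ws]


def random_sample_exact(n, sigma, seed=None):
    """Same draws, standardized with their own sample mean and divisor-n std.

    Requires n >= 2 so that the sample standard deviation is nonzero.
    """
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu0 = statistics.fmean(ws)
    sigma0 = statistics.pstdev(ws)
    return [sigma * (w - mu0) / sigma0 + 1 / n for w in ws]


approx = random_sample_distribution_params(1000, 0.01, seed=0)
exact = random_sample_exact(1000, 0.01, seed=0)
print(sum(approx), statistics.pstdev(approx))  # near the targets, not exact
print(sum(exact), statistics.pstdev(exact))    # 1 and 0.01 up to rounding
```

In the first version the sum equals $1 + \sigma\sum_k (w_k - \mu_0)/\sigma_0$, so its deviation from $1$ grows like $\sigma\sqrt{n}$. The second version is the one to use when the sum must be exactly $1$.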