Rolling standard deviations


I am trying to calculate standard deviations on an array of numbers.

My pseudocode looks like this:

deviation = getStandardDeviation(array(32, 47, 42, 45, 80, 90));

In the above example, deviation is equal to 23.26. However, what if I wanted to add an additional number, e.g. 52, to the above calculation, but I no longer had the original array of numbers?

In other words, is there a way for me to calculate the standard deviation using these two variables:

  1. The new number (e.g. 52)
  2. The old standard deviation (e.g. 23.26)

Or would I always need the complete array of numbers?


These two values are not enough. Instead, you only need to keep track of three numbers: $n$ (the sample size), $S_1=\sum_i a_i$ and $S_2=\sum_i a_i^2$, where $a_i$ are your data (that is, the sample size, the sum of the data, and the sum of the squares of the data). The mean and variance can be written using only these:

$$\hat \mu = \frac{S_1}{n}$$ $$\hat \sigma^2=\frac{S_2}{n}-\hat \mu^2$$

Here $\hat \mu$ is your sample mean, and $\hat \sigma^2$ is your sample variance. The hat simply indicates that these are estimates.

The estimate of the standard deviation is $\hat \sigma = \sqrt{\hat \sigma^2}$.
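As a minimal sketch (in Python, since the question's pseudocode is not tied to a language; the class name `RunningStats` and its methods are hypothetical), the three running quantities and the formulas above could be kept like this:

```python
import math


class RunningStats:
    """Hypothetical helper: stores only n, S1 (sum) and S2 (sum of squares)."""

    def __init__(self, n=0, s1=0.0, s2=0.0):
        self.n = n
        self.s1 = s1
        self.s2 = s2

    def add(self, x):
        # Updating needs only the new value, never the old data.
        self.n += 1
        self.s1 += x
        self.s2 += x * x

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        # Divide-by-n estimate: S2/n - mean^2.
        # max() guards against tiny negative values caused by rounding.
        mu = self.mean()
        return max(0.0, self.s2 / self.n - mu * mu)

    def std(self):
        return math.sqrt(self.variance())


stats = RunningStats()
for value in [32, 47, 42, 45, 80, 90]:
    stats.add(value)
print(stats.n, stats.mean(), stats.variance())  # prints: 6 56.0 451.0
```

Because `add` only touches `n`, `s1` and `s2`, the original array can be discarded after each value is added.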


With your data, before the update:

$$S_1=32+ 47+ 42+ 45+ 80+ 90=336$$ $$S_2=32^2+ 47^2+ 42^2+ 45^2+ 80^2+ 90^2=21522$$

You then have $n=6$, $S_1=336$, and $S_2=21522$, so

$$\hat \mu = \frac{336}{6}=56$$ $$\hat \sigma^2=\frac{21522}{6}-56^2=3587-3136=451$$ $$\hat\sigma = \sqrt{451}\simeq 21.2$$
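The arithmetic above can be checked with a short Python sketch (variable names are illustrative only):

```python
import math

data = [32, 47, 42, 45, 80, 90]

n = len(data)                   # sample size: 6
s1 = sum(data)                  # S1 = 336
s2 = sum(a * a for a in data)   # S2 = 21522

mean = s1 / n                   # 56.0
var = s2 / n - mean ** 2        # 3587.0 - 3136.0 = 451.0
std = math.sqrt(var)            # about 21.24
print(n, s1, s2, mean, var, round(std, 2))  # prints: 6 336 21522 56.0 451.0 21.24
```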

When you add the new value 52,

  • $n$ becomes $7$,
  • $S_1$ becomes $336+52=388$, and
  • $S_2$ becomes $21522+52^2=24226$.

Thus your new mean, variance and standard deviation are

$$\hat \mu_{new} = \frac{388}{7} \simeq 55.4$$ $$\hat \sigma_{new}^2=\frac{24226}{7}-\left(\frac{388}{7}\right)^2 = \frac{19038}{49} \simeq 388.5$$

$$\hat\sigma_{new}=\sqrt{388.5} \simeq 19.7$$
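A minimal Python sketch of this update, starting only from the three stored numbers (the original array is never used):

```python
import math

# Only these three numbers survive from the original six values.
n, s1, s2 = 6, 336, 21522

new_value = 52
n += 1                   # 7
s1 += new_value          # 388
s2 += new_value ** 2     # 24226

mean_new = s1 / n
var_new = s2 / n - mean_new ** 2   # exactly 19038/49 in real arithmetic
std_new = math.sqrt(var_new)
print(n, s1, s2, round(mean_new, 1), round(var_new, 1), round(std_new, 1))
# prints: 7 388 24226 55.4 388.5 19.7
```

Note that the unrounded mean is used when squaring; plugging in the rounded $55.4$ would give noticeably different variance (about $391.7$).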


Note that this variance estimate is exactly $\frac{1}{n}\sum_i (a_i - \hat \mu)^2$. A better estimate is often $\frac{1}{n-1}\sum_i (a_i - \hat \mu)^2$ (note the $n-1$ in the denominator), which can be proved to be unbiased. To use it, you just have to multiply the variance estimate above by $\frac{n}{n-1}$.
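Both claims follow from expanding the square and using $\sum_i a_i = n\hat\mu$:

$$\frac{1}{n}\sum_i (a_i-\hat\mu)^2 = \frac{1}{n}\sum_i a_i^2 - 2\hat\mu\cdot\frac{1}{n}\sum_i a_i + \hat\mu^2 = \frac{S_2}{n} - 2\hat\mu^2 + \hat\mu^2 = \frac{S_2}{n}-\hat\mu^2 = \hat\sigma^2$$

$$\frac{1}{n-1}\sum_i (a_i-\hat\mu)^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_i (a_i-\hat\mu)^2 = \frac{n}{n-1}\,\hat\sigma^2$$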

Before the update, this gives the following (which matches the 23.26 in your question, up to rounding):

$$\hat \sigma_{unbiased}^2=\frac{6}{5}\hat \sigma^2 \simeq 541.2$$ $$\hat \sigma_{unbiased}=\sqrt{541.2} \simeq 23.3$$

And after adding the new value 52:

$$\hat \sigma_{unbiased}^2=\frac{7}{6}\hat \sigma_{new}^2 \simeq 453.3$$ $$\hat \sigma_{unbiased}=\sqrt{453.3} \simeq 21.3$$
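A small Python sketch of the $\frac{n}{n-1}$ correction computed directly from $(n, S_1, S_2)$; the function name `unbiased_variance` is hypothetical:

```python
import math


def unbiased_variance(n, s1, s2):
    """Divide-by-(n-1) variance from n, S1 = sum, S2 = sum of squares."""
    if n < 2:
        raise ValueError("need at least two values")
    mean = s1 / n
    # Divide-by-n estimate, rescaled by n/(n-1).
    return (s2 / n - mean ** 2) * n / (n - 1)


before = unbiased_variance(6, 336, 21522)   # 541.2
after = unbiased_variance(7, 388, 24226)    # about 453.3
print(round(before, 1), round(math.sqrt(before), 2))  # prints: 541.2 23.26
print(round(after, 1), round(math.sqrt(after), 1))    # prints: 453.3 21.3
```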

Also note that while $\hat\sigma_{unbiased}^2$ is an unbiased estimator of $\sigma^2$ (without a hat, that is, your "true" population value), its square root, which I denote $\hat\sigma_{unbiased}$, is not an unbiased estimator of $\sigma$. The notation is a bit misleading here.
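One way to see this, assuming i.i.d. samples so that $\mathbb{E}[\hat\sigma_{unbiased}^2]=\sigma^2$: the square root is concave, so Jensen's inequality gives

$$\mathbb{E}\left[\hat\sigma_{unbiased}\right] = \mathbb{E}\left[\sqrt{\hat\sigma_{unbiased}^2}\right] \le \sqrt{\mathbb{E}\left[\hat\sigma_{unbiased}^2\right]} = \sqrt{\sigma^2} = \sigma,$$

with equality only if $\hat\sigma_{unbiased}^2$ does not vary from sample to sample. So on average $\hat\sigma_{unbiased}$ slightly underestimates $\sigma$.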