I am trying to calculate standard deviations on an array of numbers.
My pseudocode looks like this:
deviation = getStandardDeviation(array(32, 47, 42, 45, 80, 90));
In the above example, deviation is equal to 23.26. However, what if I wanted to add an additional number, e.g. 52, to the above calculation, but I no longer had the original array of numbers?
In other words, is there a way for me to calculate the standard deviation using these two variables:
- The new number (e.g. 52)
- The old standard deviation (e.g. 23.26)
Or would I always need the complete array of numbers?
These two values alone are not enough, but you do not need the complete array either. You only need to keep track of $n$ (sample size), $S_1=\sum_i a_i$ and $S_2=\sum_i a_i^2$, where the $a_i$ are your data (that is, you keep the sample size, the sum of the data and the sum of squares of the data). The mean and variance can be written using only these:
$$\hat \mu = \frac{S_1}{n}$$ $$\hat \sigma^2=\frac{S_2}{n}-\hat \mu^2$$
Here $\hat \mu$ is your sample mean, and $\hat \sigma^2$ is your sample variance. The hat only means it is an estimate.
The estimate of the standard deviation is $\hat \sigma = \sqrt{\hat \sigma^2}$.
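The bookkeeping described above can be sketched in Python. This is only a minimal illustration: the class name `RunningStats` and its method names are hypothetical, and the variance uses the $\frac{S_2}{n}-\hat\mu^2$ form given above.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical helper that stores only n, S1 (sum) and S2 (sum of squares)."""
    n: int = 0
    s1: float = 0.0
    s2: float = 0.0

    def add(self, x: float) -> None:
        # Update the three running quantities; the value itself is not stored.
        self.n += 1
        self.s1 += x
        self.s2 += x * x

    def mean(self) -> float:
        return self.s1 / self.n

    def variance(self) -> float:
        # S2 / n - mean^2, i.e. the 1/n form of the variance.
        m = self.mean()
        return self.s2 / self.n - m * m

    def std(self) -> float:
        return math.sqrt(self.variance())


stats = RunningStats()
for value in (32, 47, 42, 45, 80, 90):
    stats.add(value)

print(stats.n, stats.s1, stats.s2)
print(stats.mean(), stats.variance(), stats.std())
```

Once the loop has run, the original values can be discarded; calling `stats.add(52)` later needs nothing but the three stored numbers.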
With your data, before update,
$$S_1=32+ 47+ 42+ 45+ 80+ 90=336$$ $$S_2=32^2+ 47^2+ 42^2+ 45^2+ 80^2+ 90^2=21522$$
You have then $n=6$, $S_1=336$, and $S_2=21522$, thus
$$\hat \mu = \frac{336}{6}=56$$ $$\hat \sigma^2=\frac{21522}{6}-56^2=3587-3136=451$$ $$\hat\sigma = \sqrt{451}\simeq 21.2$$
When you update with the new value 52, you get $n=6+1=7$ and
$$S_1=336+52=388$$ $$S_2=21522+52^2=21522+2704=24226$$
Thus your new mean, variance and standard deviation are
$$\hat \mu_{new} = \frac{388}{7} \simeq 55.4$$ $$\hat \sigma_{new}^2=\frac{24226}{7}-\left(\frac{388}{7}\right)^2 \simeq 388.5$$
$$\hat\sigma_{new}=\sqrt{388.5} \simeq 19.7$$
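As a sanity check, the update can be reproduced in a few lines of Python and compared against a recomputation over all seven values. This is a sketch only: the function names `update` and `std_from_sums` are made up for illustration, and `statistics.pstdev` from the standard library is used as the reference because it also divides by $n$.

```python
import math
import statistics


def update(n, s1, s2, x):
    """Return the new (n, S1, S2) after seeing one more value x."""
    return n + 1, s1 + x, s2 + x * x


def std_from_sums(n, s1, s2):
    """Standard deviation (1/n form) from the three stored quantities."""
    mean = s1 / n
    return math.sqrt(s2 / n - mean * mean)


# Stored state for the original six values, then the new value 52.
n, s1, s2 = update(6, 336, 21522, 52)
new_std = std_from_sums(n, s1, s2)

# Reference: recompute from the complete list of values.
full_std = statistics.pstdev([32, 47, 42, 45, 80, 90, 52])

print(n, s1, s2)
print(round(new_std, 1), round(full_std, 1))
```

Both routes should agree up to floating-point rounding, which is the whole point: the original array is not needed for the update.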
Note that the variance estimate used here is exactly $\frac{1}{n}\sum_i (a_i - \hat \mu)^2$. An estimate that is often preferred is $\frac{1}{n-1}\sum_i (a_i - \hat \mu)^2$ (notice the $n-1$ in the denominator), which can be shown to be unbiased. To get it, you just have to multiply the variance estimate above by $\frac{n}{n-1}$.
Before the update, this yields the following, which matches the 23.26 in your example:
$$\hat \sigma_{unbiased}^2=\frac{6}{5}\hat \sigma^2 \simeq 541.2$$ $$\hat \sigma_{unbiased}=\sqrt{541.2} \simeq 23.3$$
And after update with your new data (value 52):
$$\hat \sigma_{unbiased}^2=\frac{7}{6}\hat \sigma_{new}^2 \simeq 453.3$$ $$\hat \sigma_{unbiased}=\sqrt{453.3} \simeq 21.3$$
Also note that while this $\hat\sigma_{unbiased}^2$ is an unbiased estimator of $\sigma^2$ (without the hat, that is, your "true" population value), its square root, which I denote $\hat\sigma_{unbiased}$, is not an unbiased estimator of $\sigma$. The notation is a bit misleading here.
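The $\frac{n}{n-1}$ correction can also be applied directly to the stored sums. The sketch below assumes a hypothetical function name, `sample_std_from_sums`, and compares against `statistics.stdev` from the Python standard library, which uses the $n-1$ denominator.

```python
import math
import statistics


def sample_std_from_sums(n, s1, s2):
    """Square root of the n-1 variance, computed from n, S1 and S2 only."""
    if n < 2:
        raise ValueError("at least two values are needed for the n-1 form")
    biased_var = s2 / n - (s1 / n) ** 2   # the 1/n variance
    return math.sqrt(biased_var * n / (n - 1))


before = sample_std_from_sums(6, 336, 21522)   # original six values
after = sample_std_from_sums(7, 388, 24226)    # after adding 52

print(round(before, 2), round(after, 2))
print(statistics.stdev([32, 47, 42, 45, 80, 90]))
print(statistics.stdev([32, 47, 42, 45, 80, 90, 52]))
```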