Say for instance that I have this set:
16, 76, 48, 44, 4, 2, 94, 87, 10, 22
And I calculate the standard deviation for it:
1. Get the mean (we'll call it "m").
2. For each number x, compute (x - m)^2 (we'll call these values "new").
3. Get the mean of all the "new" values (we'll call this "nm").
4. Take the square root of "nm".
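The four steps above can be sketched in Python as follows. This is a minimal sketch that computes the population standard deviation, dividing by the count in step 3. The function name `std_dev_steps` is only illustrative.

```python
import math

def std_dev_steps(data):
    """Population standard deviation, following the four steps one by one."""
    m = sum(data) / len(data)           # step 1: the mean "m"
    new = [(x - m) ** 2 for x in data]  # step 2: squared deviations, the "new" values
    nm = sum(new) / len(new)            # step 3: mean of the "new" values, "nm"
    return m, nm, math.sqrt(nm)         # step 4: square root of "nm"

data = [16, 76, 48, 44, 4, 2, 94, 87, 10, 22]
m, nm, sd = std_dev_steps(data)
print(m, nm, sd)
```

On the ten numbers above this should reproduce m = 40.3, nm ≈ 1104.01, and a standard deviation of about 33.2266.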
So far so good.
But now, let's say that the set disappears, the paper is lost, or I forget the numbers after 1 week or something.
The only things I have left are "m" (40.3), "nm" (1104.01), how many numbers there were (10), and the standard deviation (33.226645933648).
Now I want to add 2 new numbers (33 and 18, for instance) and get the "updated standard deviation" using only these 2 new numbers, "m", "nm", the count, and the standard deviation, since I have nothing else to go by.
Question: Is it possible to do this? If so, how? Or will I always be stuck at step #2, which needs every original number?
So I want to turn the standard deviation 33.226645933648 into the standard deviation of all 12 numbers, which (dividing by the count, as above) is about 30.98.
I saw the answer to this one, but I don't understand how it would work in my case; it still seems impossible at step #2:
Updating mean value and standard deviation
You have $n$, $\overline{x}_n=\frac{1}{n} \sum_{i=1}^n x_i$, and $S_n=\left ( \frac{1}{n-1} \sum_{i=1}^n (x_i-\overline{x}_n)^2 \right )^{1/2}$. You want $S_N=\left ( \frac{1}{N-1} \sum_{i=1}^N (x_i-\overline{x}_N)^2 \right )^{1/2}$. Write
$$S_N=\left (\frac{1}{N-1} \sum_{i=1}^N (x_i-\overline{x}_n+\overline{x}_n-\overline{x}_N)^2 \right )^{1/2}.$$
Then
$$(x_i-\overline{x}_n+\overline{x}_n-\overline{x}_N)^2=(x_i-\overline{x}_n)^2+2(x_i-\overline{x}_n)(\overline{x}_n-\overline{x}_N)+(\overline{x}_n-\overline{x}_N)^2.$$
Summing over $i$ from $1$ to $N$, and splitting the first two sums at $i=n$, we get
$$S_N = \left ( \frac{1}{N-1} \left ( \sum_{i=1}^n (x_i-\overline{x}_n)^2 + \sum_{i=n+1}^N (x_i-\overline{x}_n)^2 + 2(\overline{x}_n-\overline{x}_N) \sum_{i=1}^n (x_i-\overline{x}_n) + 2 (\overline{x}_n-\overline{x}_N) \sum_{i=n+1}^N (x_i-\overline{x}_n) + N(\overline{x}_n-\overline{x}_N)^2 \right ) \right )^{1/2}.$$
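The second and fourth terms can be computed from the new data and the two means alone, and the new mean itself needs only the old mean, the old count, and the new data: $\overline{x}_N=\frac{1}{N}\left(n\overline{x}_n+\sum_{i=n+1}^N x_i\right)$. The first term is just $(n-1)S_n^2$. The third term is zero, because $\sum_{i=1}^n (x_i-\overline{x}_n)=n\overline{x}_n-n\overline{x}_n=0$. Making those simplifications gives: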
$$S_N = \left ( \frac{1}{N-1} \left ( (n-1) S_n^2 + \sum_{i=n+1}^N (x_i-\overline{x}_n)^2 + 2 (\overline{x}_n-\overline{x}_N) \sum_{i=n+1}^N (x_i-\overline{x}_n) + N(\overline{x}_n-\overline{x}_N)^2 \right ) \right )^{1/2}.$$
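Here is a minimal Python sketch of that final formula under two stated assumptions. First, the old data is summarized by $n$, $\overline{x}_n$, and the sum of squared deviations $\sum_{i=1}^n (x_i-\overline{x}_n)^2$. That sum equals $(n-1)S_n^2$ in this notation and $n \cdot nm$ with the question's divide-by-$n$ "nm" (so $10 \times 1104.01 = 11040.1$). Second, a hypothetical `ddof` argument picks the final divisor: $N$ for the question's definition, $N-1$ for $S_N$. The helper `update_std` is illustrative, not part of the original answer.

```python
import math

def update_std(n, mean_n, ss_n, new_values, ddof=0):
    """Hypothetical helper: fold new values into (count, mean, sum of squared deviations).

    ss_n is sum_{i=1}^n (x_i - mean_n)^2 for the old data only.
    ddof=0 divides by N (the question's "nm" style); ddof=1 gives S_N.
    """
    N = n + len(new_values)
    # New mean from the old mean and count plus the new data.
    mean_N = (n * mean_n + sum(new_values)) / N
    d = mean_n - mean_N
    # The bracketed sum in the final formula, term by term.
    ss_N = (ss_n
            + sum((x - mean_n) ** 2 for x in new_values)
            + 2 * d * sum(x - mean_n for x in new_values)
            + N * d ** 2)
    return N, mean_N, ss_N, math.sqrt(ss_N / (N - ddof))

# Everything left from the question: the count, "m", and "nm".
n, m, nm = 10, 40.3, 1104.01
N, mean_N, ss_N, sd_N = update_std(n, m, n * nm, [33, 18])  # divide by N
_, _, _, s_N = update_std(n, m, n * nm, [33, 18], ddof=1)   # S_N, divide by N-1
print(N, mean_N, sd_N, s_N)
```

With the question's numbers this should give a new mean of about 37.83 and a standard deviation of about 30.98 when dividing by $N$ (about 32.36 for $S_N$), matching a direct recomputation over all 12 values. Because $\sum_{i=n+1}^N (x_i-\overline{x}_n)=N(\overline{x}_N-\overline{x}_n)$, the last two terms could also be merged into $-N(\overline{x}_n-\overline{x}_N)^2$; the sketch keeps them separate to mirror the formula.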