Updating standard deviation without set

Say for instance that I have this set:
16, 76, 48, 44, 4, 2, 94, 87, 10, 22

And I calculate the standard deviation for it:

  1. Get the mean (we'll call it "m")
  2. For each number: (nr - m)^2 (we'll call them "new")
  3. Get the mean of all "new" (we'll call this "nm")
  4. Then take the square root of "nm"
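
The four steps above can be sketched in Python as follows. This is only an illustration: the function name `population_std` is made up here, and the variable names `m`, `new`, and `nm` simply mirror the labels used in the steps. It divides by the count of numbers (the "mean of all new"), as the steps describe.

```python
import math

def population_std(values):
    """Follow steps 1-4: mean, squared deviations, their mean, square root."""
    m = sum(values) / len(values)            # step 1: the mean "m"
    new = [(nr - m) ** 2 for nr in values]   # step 2: squared deviations "new"
    nm = sum(new) / len(new)                 # step 3: mean of "new", called "nm"
    return m, nm, math.sqrt(nm)              # step 4: square root of "nm"

m, nm, sd = population_std([16, 76, 48, 44, 4, 2, 94, 87, 10, 22])
print(m, nm, sd)
```

Up to floating-point rounding, this reproduces the values of "m", "nm", and the standard deviation quoted in the question.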

So far so good.
But now, let's say that the set disappears, the paper is lost, or I forget the numbers after 1 week or something.
The only things I have left are "m" (40.3), "nm" (1104.01), the count of numbers (10), and the standard deviation (33.226645933648).

And now, I want to add 2 new numbers (33 and 18, for instance) to the calculation and get the "updated standard deviation" using only these 2 new numbers, "m", "nm", the count of numbers, and the old standard deviation, since I don't have anything else to go by.

Question: Is it possible to do this? If so, how? Or will it always be stuck on step #2?
So I want to turn the stdev 33.226645933648 into the stdev of all 12 numbers, which should come out to about 30.98 with the same steps.

I saw the answer to this question, but I don't really understand how it would apply in my case; it still seems like it would be impossible at step #2:
Updating mean value and standard deviation

There is 1 solution below

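You have $n$, $\overline{x}_n=\frac{1}{n} \sum_{i=1}^n x_i$, and $S_n=\left ( \frac{1}{n-1} \sum_{i=1}^n (x_i-\overline{x}_n)^2 \right )^{1/2}$. After adding the new values $x_{n+1},\dots,x_N$, you want $S_N=\left ( \frac{1}{N-1} \sum_{i=1}^N (x_i-\overline{x}_N)^2 \right )^{1/2}$. (This is the version of the standard deviation that divides by $n-1$; for the divide-by-$n$ version used in the question, replace $n-1$ and $N-1$ by $n$ and $N$ throughout.) Write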

$$S_N=\left (\frac{1}{N-1} \sum_{i=1}^N (x_i-\overline{x}_n+\overline{x}_n-\overline{x}_N)^2 \right )^{1/2}.$$

Then

$$(x_i-\overline{x}_n+\overline{x}_n-\overline{x}_N)^2=(x_i-\overline{x}_n)^2+2(x_i-\overline{x}_n)(\overline{x}_n-\overline{x}_N)+(\overline{x}_n-\overline{x}_N)^2.$$

Summing over $i$ and splitting the first two sums at $i=n$, we get

$$S_N = \left ( \frac{1}{N-1} \left ( \sum_{i=1}^n (x_i-\overline{x}_n)^2 + \sum_{i=n+1}^N (x_i-\overline{x}_n)^2 + 2(\overline{x}_n-\overline{x}_N) \sum_{i=1}^n (x_i-\overline{x}_n) + 2 (\overline{x}_n-\overline{x}_N) \sum_{i=n+1}^N (x_i-\overline{x}_n) + N(\overline{x}_n-\overline{x}_N)^2 \right ) \right )^{1/2}.$$

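The second and fourth terms can be computed from the new data and the two means alone, and so can the last term; the new mean itself follows from $\overline{x}_N=\frac{1}{N}\left(n\overline{x}_n+\sum_{i=n+1}^N x_i\right)$. The first term is just $(n-1)S_n^2$. The third term is zero, because deviations from the mean sum to zero: $\sum_{i=1}^n (x_i-\overline{x}_n)=0$. Making those simplifications gives: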

$$S_N = \left ( \frac{1}{N-1} \left ( (n-1) S_n^2 + \sum_{i=n+1}^N (x_i-\overline{x}_n)^2 + 2 (\overline{x}_n-\overline{x}_N) \sum_{i=n+1}^N (x_i-\overline{x}_n) + N(\overline{x}_n-\overline{x}_N)^2 \right ) \right )^{1/2}.$$
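
As a rough check, the final formula can be written as a short Python function. This is a sketch under stated assumptions, not part of the original answer: the name `update_std` and the `ddof` parameter are introduced here for illustration. With `ddof=1` it uses the $n-1$ divisor as in $S_n$ above; with `ddof=0` it uses the divide-by-$n$ convention from the question, where $nS_n^2$ replaces $(n-1)S_n^2$ and $N$ replaces $N-1$.

```python
import math
import statistics

def update_std(n, mean_n, std_n, new_values, ddof=1):
    """Return (N, mean_N, std_N) from the old summary plus the new values only.

    ddof=1: divide by n-1 (as in S_n above); ddof=0: divide by n (as in the question).
    """
    N = n + len(new_values)
    mean_N = (n * mean_n + sum(new_values)) / N        # updated mean
    shift = mean_n - mean_N                             # old mean minus new mean
    old_ss = (n - ddof) * std_n ** 2                    # sum of old squared deviations
    new_ss = sum((x - mean_n) ** 2 for x in new_values)
    cross = 2 * shift * sum(x - mean_n for x in new_values)
    total = old_ss + new_ss + cross + N * shift ** 2    # the bracket in the formula
    return N, mean_N, math.sqrt(total / (N - ddof))

# Numbers from the question (divide-by-n convention, so ddof=0).
N, mean_N, std_N = update_std(10, 40.3, 33.226645933648, [33, 18], ddof=0)
print(N, mean_N, std_N)

# Cross-check against a recomputation from the full list of 12 numbers.
full = [16, 76, 48, 44, 4, 2, 94, 87, 10, 22, 33, 18]
print(statistics.pstdev(full))
```

With the question's numbers, both printed standard deviations come out at roughly 30.98, so the original ten values are not needed for the update.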