Discrepancy between different methods for finding standard deviation?


I can't see where I am going wrong.

There are two different ways of writing the standard deviation:

  1. $ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}$

  2. $ \sigma = \sqrt{\frac{\sum_{i=1}^N x_i^2}{N} - \overline{x}^2 } $

If you have $N$ numbers with a mean $\overline{x}$, but then you add an additional number to the set which is equal to the mean, what happens to the standard deviation?

Looking at 1:

$N$ increases by one; $\sum_{i=1}^N (x_i - \overline{x})^2$ stays the same, since the new term $(\overline{x} - \overline{x})^2$ is zero;

Therefore $\sigma$ decreases. This makes sense to me.

However, looking at 2:

$\sum_{i=1}^N x_i^2$ increases by $\overline{x}^2$; $N$ increases by one; $\overline{x}^2$ stays the same.

Therefore, if $\overline{x}$ is greater than one, the numerator grows by more than the denominator, so $\sigma$ increases?
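Both formulas are easy to check numerically. Here is a quick sketch in plain Python (the sample data is arbitrary, chosen to give a mean of 5 and $\sigma = 2$):

```python
import math

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(xs)
mean = sum(xs) / N  # 5.0

# Formula 1: root of the mean squared deviation
sigma1 = math.sqrt(sum((x - mean) ** 2 for x in xs) / N)

# Formula 2: root of (mean of squares minus square of mean)
sigma2 = math.sqrt(sum(x * x for x in xs) / N - mean ** 2)

print(sigma1, sigma2)  # both 2.0 -- the formulas agree

# Append a new value equal to the mean; sigma decreases under BOTH formulas
ys = xs + [mean]
M = len(ys)
mean_y = sum(ys) / M  # unchanged: still 5.0
sigma1_new = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / M)
sigma2_new = math.sqrt(sum(y * y for y in ys) / M - mean_y ** 2)
print(sigma1_new, sigma2_new)  # both approx. 1.886, smaller than 2.0
```

So the two formulas cannot actually disagree; the error must be in the reasoning about formula 2, which the answer below pins down.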

Best answer:

The confusion seems to be over operations on fractions. In general, consider the two fractions

$$\frac AB \quad \mbox{and} \quad \frac{A+D}{B+1}.$$

Your question essentially asks whether the fraction on the right must be greater than the fraction on the left whenever $D > 1$. The answer is no, it need not be.

For example:

$$\frac{100}{10} > \frac{100+5}{10+1}.$$

As you can see, $5 > 1$, but $\frac{105}{11} < 10.$ In general, it's not just how much you increase each part of the fraction that matters, it's how much you increase each part of the fraction relative to (or as a percentage of) the old value. In the example above, if we add $10\%$ to the denominator but only $5\%$ to the numerator, the value decreases. If you start with a numerator that is much greater than the denominator, you have to add a proportionally larger amount to the numerator than to the denominator just to "keep up."
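The arithmetic in this example is trivial to confirm; a throwaway check:

```python
A, B, D = 100, 10, 5

old = A / B              # 10.0
new = (A + D) / (B + 1)  # 105/11, about 9.545

# D > 1, yet the fraction shrank
assert D > 1 and new < old

# What matters is the *relative* growth of each part:
print(D / A)  # numerator grew by 0.05 (5%)
print(1 / B)  # denominator grew by 0.10 (10%)
```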

Applying this to the standard deviation formula: if the $x_i$ are not all identical, then the mean of the $x_i^2$ (which is what $\frac{\sum_{i=1}^N x_i^2}{N}$ represents) is strictly greater than the square of the mean, $\bar x^2$. That is exactly why $\sigma > 0$.

Now add one more observation whose value equals $\bar x$. This adds $\bar x^2$ to the sum $\sum_{i=1}^N x_i^2$ while adding $1$ to $N$. To keep the average of the squares unchanged, you would have to add its current value, which is $\bar x^2 + \sigma^2$, to the top. Adding only $\bar x^2$ falls short of that by $\sigma^2$, so the new observation "drags down the average" of the squares, and $\sigma$ decreases — whether or not $\bar x > 1$.
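The break-even value here comes from rearranging formula 2: $\frac{\sum_{i=1}^N x_i^2}{N} = \bar x^2 + \sigma^2$, so the mean of the squares exceeds $\bar x^2$ by exactly $\sigma^2$. A small numeric check of the "drags down the average" effect, using the same arbitrary sample as before (mean 5, $\sigma = 2$, so the mean of squares is 29):

```python
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(xs)
mean = sum(xs) / N                    # 5.0
sum_sq = sum(x * x for x in xs)       # 232.0
mean_sq = sum_sq / N                  # 29.0 = mean**2 + sigma**2

# Adding the *current mean of squares* to the top keeps the average fixed:
unchanged = (sum_sq + mean_sq) / (N + 1)
print(unchanged)  # 29.0 again

# Adding only mean**2 = 25 (< 29) drags the average of squares down:
dragged = (sum_sq + mean ** 2) / (N + 1)
print(dragged)  # about 28.556, less than 29
```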