Why are there two formulas for the sample variance?


I'm using an introductory statistics textbook and it mentioned these two formulas for the sample variance:

$s^2 = \frac{\sum(x - \bar{x})^2}{n - 1}$

and

$s^2 = \frac{\sum{}x^2 - \frac{(\sum{}x)^2}{n}}{n-1}$

Can someone explain why this is?


There are 3 best solutions below

0
On BEST ANSWER

There is a general "translation" formula: $$\sum(x-a)^2=\sum x^2-2a\sum x+\sum a^2=\sum x^2-2na\overline x+na^2.$$

Now with $a=\overline x$,

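$$\sum(x-\overline x)^2=\sum x^2-n\overline x^2=\sum x^2-\frac{(\sum x)^2}{n},$$

where the last step uses $\overline x=\frac1n\sum x$. Dividing both sides by $n-1$ turns the first formula into the second.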

0
On

The two formulas are equivalent: $$\sum(x-\bar x)^2=\sum(x^2-2x\bar x+(\bar x)^2)=\sum x^2-2\bar x\sum x+(\bar x)^2\sum 1$$ We can now use $\sum 1=n$ and $\bar x=\frac 1n \sum x$ to get $$\sum x^2-2\bar x n\bar x+(\bar x)^2 n=\sum x^2-n(\bar x)^2=\sum x^2-\frac 1n(\sum x)^2$$
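
As a quick sanity check, here is the identity on a small made-up sample (the values $1, 2, 3, 4$ are only an illustration, not from the question). With $n=4$, $\sum x=10$, $\sum x^2=30$ and $\bar x=2.5$:

$$\sum(x-\bar x)^2=(-1.5)^2+(-0.5)^2+(0.5)^2+(1.5)^2=5,\qquad \sum x^2-\frac 1n\Big(\sum x\Big)^2=30-\frac{100}{4}=5.$$

Either way, $s^2=\frac{5}{3}$.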

0
On

They are the same once you apply a few basic algebra rules. Notice that both formulas are fractions with $n-1$ as the denominator, so let's focus on the numerators:

$$\sum_{i=1}^n(x_i-\bar{x})^2 = \sum_{i=1}^n(x_i^2-2x_i\bar{x}+\bar{x}^2)=\sum_{i=1}^nx_i^2 -\sum_{i=1}^n(2x_i\bar{x})+\sum_{i=1}^n\bar{x}^2$$

In the middle sum, the factor $2\bar{x}$ is a constant: it doesn't change with $i$, so it can be pulled out in front of the sum. The last sum adds the same number $\bar{x}^2$ a total of $n$ times, so we get

$$\sum_{i=1}^nx_i^2 -\sum_{i=1}^n(2x_i\bar{x})+\sum_{i=1}^n\bar{x}^2 = \sum_{i=1}^nx_i^2 -2\bar{x}\sum_{i=1}^nx_i+n\bar{x}^2$$

Now is a good time to remember the definition of $\bar{x}$ (the arithmetic mean/average of the sample): $$\bar{x} = \frac{\sum_{i=1}^nx_i}n$$

and put this into our formula:

$$ \begin{eqnarray} \sum_{i=1}^nx_i^2 -2\bar{x}\sum_{i=1}^nx_i+n\bar{x}^2 & = & \sum_{i=1}^nx_i^2 -2\frac{\sum_{i=1}^nx_i}n\sum_{i=1}^nx_i+n\left(\frac{\sum_{i=1}^nx_i}n\right)^2\\ & = &\sum_{i=1}^nx_i^2 -\frac2n \left(\sum_{i=1}^nx_i\right)^2 +\frac1n\left(\sum_{i=1}^nx_i\right)^2\\ & = & \sum_{i=1}^nx_i^2 -\frac1n \left(\sum_{i=1}^nx_i\right)^2 \\ \end{eqnarray} $$

Now, at the end, we have the numerator of the second formula you gave!

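Both formulas are useful at different times. The first shows the relation to the variance of a random variable (or of a whole population), which is defined very similarly as the mean squared deviation from the mean, so with $n$ rather than $n-1$ as the denominator for a finite population. The second is useful for actually calculating the sample variance without having to compute the average $\bar{x}$ first: you only need the running totals $\sum x$ and $\sum x^2$.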
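
To make the practical difference concrete, here is a minimal Python sketch of both formulas. The function names `variance_definition` and `variance_shortcut` and the sample data are my own choices for illustration; `statistics.variance` from the standard library is used only as a cross-check. The first function needs the mean before it can sum the squared deviations, while the second only accumulates $\sum x$ and $\sum x^2$ in a single pass.

```python
import math
import statistics


def variance_definition(xs):
    """First formula: sum of squared deviations from the mean, over n - 1."""
    n = len(xs)
    mean = sum(xs) / n  # the mean has to be known before the second sum
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Second formula: (sum x^2 - (sum x)^2 / n) / (n - 1), in one pass."""
    n = 0
    total = 0.0     # running sum of x
    total_sq = 0.0  # running sum of x^2
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total ** 2 / n) / (n - 1)


data = [1.0, 2.0, 3.0, 4.0]  # illustrative sample, not from the question
print(variance_definition(data))
print(variance_shortcut(data))
print(statistics.variance(data))  # standard-library cross-check
```

All three printed values should agree up to floating-point rounding. One caveat when using floating-point numbers: the shortcut subtracts two potentially large, nearly equal quantities, so it can lose precision when the mean is large compared to the spread of the data. For hand calculation or small, well-scaled samples this usually doesn't matter.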