Proving that two formulas for the sample variance are equal


A course I'm taking defines the mean of a batch of data of size $n$ as follows:

$$\overline{x} = \frac{\sum_{i=1}^n x_i}{n}$$

And the unbiased sample variance:

$$s^2 = \frac{\sum_{i=1}^n (x_i - \overline{x})^2}{(n-1)}$$
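A minimal Python sketch of these two definitions, cross-checked against `statistics.variance` from the standard library (which also divides by $n-1$). The function names `sample_mean` and `sample_variance` and the data are made up for illustration.

```python
from statistics import variance  # standard library; divides by n - 1


def sample_mean(xs):
    """Mean as defined in the course: sum of the data divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example data
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32/7, about 4.5714
print(variance(data))         # same value from the standard library
```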

And a second, equivalent way of calculating $s^2$:

$$s^2 = \left(\sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n-1}\right) \cdot \frac{1}{n-1}$$

Now, they didn't show any proof for this, so I thought I'd try my luck. Let's start with the first version and see what we can rewrite:

$$\begin{align} s^2 &= \frac{\sum_{i=1}^n (x_i - \overline{x})^2}{n-1} \\ &= \frac{\sum_{i=1}^n (x_i^2 - 2\overline{x}x_i + \overline{x}^2)}{n-1} \\ &= \frac{\sum_{i=1}^n x_i^2}{n-1} - 2\overline{x} \frac{\sum_{i=1}^n x_i}{n-1} + \frac{\sum_{i=1}^n \overline{x}^2}{n-1} \end{align}$$

Well, this is as far as I got.

Now, the middle term:

$$2\overline{x} \frac{\sum_{i=1}^nx_i}{(n -1)}$$

I could almost rewrite it as $2\overline{x}^2$, but that doesn't work because the definition of $\overline{x}$ divides by $n$, not by $(n-1)$.

So, in my opinion, the proof can only work if we define:

$$\overline{x} = \frac{\sum_{i=1}^n x_i}{\color{red}{(n - 1)}}$$

Or am I missing something?



BEST ANSWER

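The unbiased sample variance should be as follows. The key step is $\sum_{i=1}^n x_i = n\overline{x}$, which follows directly from the definition of $\overline{x}$ and turns the middle term into a multiple of $\overline{x}^2$ even though its denominator is $n-1$: $$ \begin{align}s^2 &= \frac{\sum_{i=1}^n (x_i - \overline{x})^2}{n-1} =\frac{\sum_{i=1}^n x_i^2}{n-1}-2\overline{x}\frac{\sum_{i=1}^n x_i}{n-1}+\overline{x}^2\frac{\sum_{i=1}^n 1}{n-1}\\ &=\frac{\sum_{i=1}^n x_i^2}{n-1}-2\overline{x}\frac{n\overline{x}}{n-1}+\overline{x}^2\frac{n}{n-1}\\ &=\frac{\sum_{i=1}^n x_i^2-n\overline{x}^2}{n-1}\\ &=\frac{\sum_{i=1}^n x_i^2-\frac{1}{n}(\sum_{i=1}^n x_i)^2}{n-1}. \end{align} $$ So the inner denominator in the second formula is $n$, not $n-1$.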
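A quick numerical check of this identity, as a Python sketch with made-up data and hypothetical function names: the definition and the final line of the derivation agree, while putting $n-1$ in the inner denominator gives a different number.

```python
def definitional(xs):
    # s^2 = sum((x_i - xbar)^2) / (n - 1)
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def shortcut(xs):
    # s^2 = (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1), the last line of the derivation
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def shortcut_inner_n_minus_1(xs):
    # same shape, but with (n - 1) in the inner denominator, as in the question
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / (n - 1)) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example data
print(definitional(data))              # 32/7, about 4.5714
print(shortcut(data))                  # 32/7 again
print(shortcut_inner_n_minus_1(data))  # 24/49, about 0.4898: not equal
```

The shortcut needs only one pass over the data, but when the mean is large compared with the spread, subtracting two large, nearly equal quantities can lose precision in floating point, so the definitional form is usually the safer one numerically.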

ANSWER

I think you are missing the fact that the sample variance with denominator $n$, written $S_n^2$, is a biased estimator of the population variance. This is why we define a new, unbiased estimator $S_{n-1}^2$, which divides by $n-1$ instead.

$$S_{n-1}^2=\frac{n}{n-1}S_n^2$$

This perhaps explains why you think the mean should be redefined, when in fact you are working with the unbiased estimator rather than the ordinary (divide-by-$n$) sample variance.

So, since all of your denominators have $n-1$ instead of $n$ (which I assume is what you intend), perhaps the problem is that you are confusing $S_{n-1}$ with $S_n$.
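Python's standard `statistics` module exposes both denominators, which makes the relationship between the two estimators easy to check numerically: `pvariance` divides by $n$ (playing the role of $S_n^2$ here) and `variance` divides by $n-1$ (playing the role of $S_{n-1}^2$). A minimal sketch with made-up data:

```python
from statistics import pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example data
n = len(data)

s2_n = pvariance(data)          # divides by n:     32/8 = 4.0
s2_n_minus_1 = variance(data)   # divides by n - 1: 32/7, about 4.5714

# The divide-by-(n - 1) estimate is the divide-by-n one scaled up by n / (n - 1).
print(s2_n_minus_1, s2_n * n / (n - 1))  # both equal 32/7
```

Because $n/(n-1) > 1$, the unbiased estimate is always at least as large as the divide-by-$n$ one.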