Why is sample variance divided by $n-1$ and not $n$?

The sample variance, customarily denoted $s^2$, is the average of the squared deviations from the sample mean, except that we divide by $n-1$ instead of $n$, as in the formula below.

$$s^2 = \frac{1}{n-1} \sum\limits_{i=1}^{n} (X_i - \overline{X})^2$$

My question is: Why do we divide by $n-1$ instead of $n$?

Does it have anything to do with having a "binary" operation, meaning two (bi-nary) operands?

This may be a duplicate question.

There are 3 best solutions below

1
On BEST ANSWER

It is natural to wonder why the sum of the squared deviations is divided by $n - 1$ rather than $n$. The purpose of computing the sample standard deviation is to estimate the amount of spread in the population from which the sample was drawn.

Ideally, therefore, we would compute deviations from the mean of all the items in the population, rather than the deviations from the sample mean.

However, the population mean is in general unknown, so the sample mean is used in its place.

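It is a mathematical fact that the squared deviations around the sample mean are, in total, never larger than those around the population mean (the sample mean is the value that minimizes the sum of squared deviations for that sample), so dividing their sum by $n$ underestimates the spread on average. Dividing by $n - 1$ rather than $n$ provides exactly the right correction, in the sense that the resulting estimate equals the population variance on average.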
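A small simulation can make this visible. The sketch below is only an illustration: the normal population, the parameter values, and the function name `average_estimates` are assumptions chosen for the example. It draws many samples of size $n$, computes the sum of squared deviations around each sample mean, and averages the results of dividing by $n$ and by $n - 1$.

```python
import random


def average_estimates(n=5, mu=10.0, sigma=2.0, trials=50_000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_by_n = 0.0
    total_by_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sum of squared deviations around the *sample* mean
        ss = sum((x - xbar) ** 2 for x in sample)
        total_by_n += ss / n
        total_by_n_minus_1 += ss / (n - 1)
    return total_by_n / trials, total_by_n_minus_1 / trials


by_n, by_n_minus_1 = average_estimates()
print("true variance:            ", 2.0 ** 2)
print("average when dividing by n:  ", by_n)
print("average when dividing by n-1:", by_n_minus_1)
```

With these settings the true variance is $4$. The divide-by-$n$ average should come out near $\frac{n-1}{n}\cdot 4 = 3.2$, while the divide-by-$(n-1)$ average should come out near $4$, up to simulation noise.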

2
On

The reason the sample variance divides by $n-1$ instead of $n$ is that we want the sample variance to be an unbiased estimator of the true variance when the data come from a random sample. For a derivation of this result, check a standard textbook.

Update: It seems that neither the answers in this post nor the answers in the earlier duplicate post give a derivation of the result, so for completeness I put it here in case someone asks again in the future: $$ E\left(\frac{1}{n-1}\sum(X_{i}-\overline{X})^{2}\right)=\frac{1}{n-1}E\left(\sum X_{i}^{2}-n\overline{X}^{2}\right)=\frac{1}{n-1}\left(n(\sigma^{2}+\mu^{2})-\sigma^{2}-n\mu^{2}\right)=\sigma^{2} $$ where we used the fact that the $X_{i}$ are assumed to be a random sample (independent, with common mean $\mu$ and variance $\sigma^{2}$), as well as the elementary equality $Var(aX+b)=a^{2}Var(X)$.
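The two expectations used in that derivation can be spelled out as follows. This is a sketch assuming the $X_{i}$ are independent with common mean $\mu$ and finite variance $\sigma^{2}$:

$$ E(X_{i}^{2}) = Var(X_{i}) + (E X_{i})^{2} = \sigma^{2}+\mu^{2} $$

$$ Var(\overline{X}) = \frac{1}{n^{2}}\sum Var(X_{i}) = \frac{\sigma^{2}}{n}, \qquad E(n\overline{X}^{2}) = n\left(Var(\overline{X}) + \mu^{2}\right) = \sigma^{2}+n\mu^{2} $$

Subtracting the second from $n(\sigma^{2}+\mu^{2})$ leaves $(n-1)\sigma^{2}$, which is why the factor $\frac{1}{n-1}$ gives exactly $\sigma^{2}$.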

0
On

Biased sample variance:

$$ s_n^2 = \frac{1}{n} \sum\limits_{i=1}^{n} (X_i - \overline{X})^2 $$

Unbiased sample variance:

$$ s^2 = \frac{1}{n-1} \sum\limits_{i=1}^{n} (X_i - \overline{X})^2 $$

Dividing by $n-1$ is necessary if you want the unbiased sample variance. The use of $n-1$ in place of $n$ is known as Bessel's correction.
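To see the two formulas side by side on concrete numbers, here is a minimal sketch using Python's standard `statistics` module, where `pvariance` divides by $n$ and `variance` divides by $n-1$. The data values are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the question
n = len(data)
xbar = sum(data) / n

# Sum of squared deviations around the sample mean
ss = sum((x - xbar) ** 2 for x in data)

biased = ss / n          # divide by n
unbiased = ss / (n - 1)  # divide by n - 1 (Bessel's correction)

print("by hand, divide by n:    ", biased)
print("by hand, divide by n - 1:", unbiased)
print("statistics.pvariance:    ", statistics.pvariance(data))
print("statistics.variance:     ", statistics.variance(data))
```

The hand-computed values should agree with the library functions, and the two estimates differ by the factor $\frac{n}{n-1}$, which shrinks toward $1$ as $n$ grows.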