Why do statisticians like "$n-1$" instead of "$n$"?

Does anyone have an intuitive explanation (no formulas, just words! :D) for why we divide by "$n-1$" instead of "$n$" in the unbiased variance estimator $$S_n^2 = \dfrac{\sum\limits_{i = 1}^n \left(X_i-\bar{X}\right)^2}{n-1}?$$

Best answer

(Too long for a comment:)

I can offer an explanation showing that dividing by $n$ would underestimate the variance. The sum of squares $\sum (X_i - \overline{X})^2$, where $\overline{X}$ is the sample mean, is never larger than the sum $\sum (X_i - \mu)^2$, where $\mu$ is the true mean, and it is strictly smaller unless $\overline{X}$ happens to equal $\mu$. This is because $\overline{X}$ is computed from the data, so it is "closer" to the data points than the true mean. In fact, $\overline{X}$ is the value of $t$ that minimizes the sum $\sum (X_i - t)^2$. Dividing $\sum (X_i - \mu)^2$ by $n$ would give an unbiased estimate of the variance, so dividing the smaller sum $\sum (X_i - \overline{X})^2$ by $n$ underestimates the variance on average, and we should divide by something smaller than $n$. To put it even less formally: you try to determine how spread out your data are by measuring the deviations from the sample mean, which can only make the spread look smaller than measuring it from the true mean would. The sample mean is as close to the data as possible, whereas the true mean will generally be farther away.
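
A quick numerical check of the minimization claim, using a made-up sample of five values and assuming (purely for illustration) that the true mean is $\mu = 5$; `sum_sq` is a hypothetical helper, not a library function. It compares the spread measured around the sample mean with the spread measured around the assumed true mean, and scans nearby centres to confirm none beats $\overline{X}$.

```python
# Hypothetical sample; we *assume* the true population mean is mu = 5.0.
data = [4.0, 7.0, 5.5, 8.0, 6.5]
mu = 5.0
n = len(data)
xbar = sum(data) / n  # sample mean, 6.2


def sum_sq(values, t):
    """Sum of squared deviations of values from the point t."""
    return sum((x - t) ** 2 for x in values)


ss_xbar = sum_sq(data, xbar)  # spread measured around the sample mean
ss_mu = sum_sq(data, mu)      # spread measured around the (assumed) true mean

# Decomposition: sum (x - t)^2 = sum (x - xbar)^2 + n * (xbar - t)^2,
# so the sum is smallest exactly when t = xbar.
assert abs(ss_mu - (ss_xbar + n * (xbar - mu) ** 2)) < 1e-9

# Scan centres t from xbar - 2 to xbar + 2 in steps of 0.1: none beats xbar itself.
assert all(sum_sq(data, xbar + k / 10) >= ss_xbar for k in range(-20, 21))

print(f"around sample mean: {ss_xbar:.2f}, around true mean: {ss_mu:.2f}")
```

With this data the two sums are 9.3 and 16.5; the gap is exactly $n(\overline{X} - \mu)^2 = 5 \cdot 1.2^2 = 7.2$.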

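The reason we divide by precisely $n-1$, rather than some other number smaller than $n$, is that this choice makes the estimator unbiased: its expected value is exactly the true variance (as pointed out in the comments).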
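
For readers who do want the formulas, here is a short sketch of why $n-1$ is exactly right, assuming $X_1, \dots, X_n$ are independent with common mean $\mu$ and variance $\sigma^2$. The first line is the decomposition behind the fact that $\overline{X}$ minimizes $\sum (X_i - t)^2$:

$$\begin{aligned}
\sum_{i=1}^n \left(X_i-\overline{X}\right)^2 &= \sum_{i=1}^n (X_i-\mu)^2 - n\left(\overline{X}-\mu\right)^2,\\
\mathbb{E}\Big[\sum_{i=1}^n (X_i-\mu)^2\Big] &= n\sigma^2, \qquad \mathbb{E}\Big[n\left(\overline{X}-\mu\right)^2\Big] = n\operatorname{Var}\left(\overline{X}\right) = n\cdot\frac{\sigma^2}{n} = \sigma^2,\\
\mathbb{E}\Big[\sum_{i=1}^n \left(X_i-\overline{X}\right)^2\Big] &= n\sigma^2 - \sigma^2 = (n-1)\sigma^2.
\end{aligned}$$

Dividing by $n-1$ therefore gives $\mathbb{E}\left[S_n^2\right] = \sigma^2$, while dividing by $n$ gives an expected value of $\frac{n-1}{n}\sigma^2 < \sigma^2$.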

Another answer

If you knew the true mean of your distribution, you would divide the sum of squared deviations from it by the number of samples $n$. If, on the other hand, you estimate the mean from your data, you impose a constraint on your $n$ samples (their sum must be $n\bar X$, or equivalently the deviations $X_i - \bar X$ must sum to zero), so you are left with the equivalent of $n-1$ free samples, i.e. $n-1$ degrees of freedom.
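
A minimal sketch of the constraint described above, using a hypothetical five-value sample: once the mean is estimated from the data, the deviations from it sum to zero, so the last deviation can be recovered from the other $n-1$ and carries no new information.

```python
# Hypothetical sample, used only to illustrate the constraint.
data = [4.0, 7.0, 5.5, 8.0, 6.5]
n = len(data)
xbar = sum(data) / n
deviations = [x - xbar for x in data]

# Estimating the mean forces the deviations to sum to zero (up to float rounding).
print(f"sum of deviations: {sum(deviations):.1e}")

# Hence the last deviation is fixed by the other n - 1.
recovered_last = -sum(deviations[:-1])
print(f"last deviation: {deviations[-1]:.2f}, recovered from the others: {recovered_last:.2f}")
```

Measured from a known true mean instead, the $n$ deviations would satisfy no such equation, which is why dividing by $n$ is appropriate in that case.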

Another answer

Using a sample, we try to estimate the population mean. But since it is a sample, it will not include the full spectrum of the data; it only contains a subset. So the sum of squared distances from each reading to the sample mean, divided by the sample size, tends to be lower than the population variance. We compensate for this effect by dividing by a smaller denominator, $n-1$, when computing the sample variance.
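
To see this effect in numbers, here is a small Monte Carlo sketch. It assumes normally distributed data with mean 10 and variance $\sigma^2 = 4$ (arbitrary choices for illustration) and samples of size $n = 5$; the helper `spread_estimates` is hypothetical. Averaged over many samples, dividing by $n$ should come out near $\frac{n-1}{n}\sigma^2 = 3.2$, while dividing by $n-1$ should come out near $\sigma^2 = 4$.

```python
import random


def spread_estimates(sample):
    """Return (sum of squares / n, sum of squares / (n - 1)) around the sample mean."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


rng = random.Random(0)  # fixed seed so the run is reproducible
sigma2 = 4.0            # assumed population variance (standard deviation 2)
n, trials = 5, 100_000

total_n = total_n1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(10.0, 2.0) for _ in range(n)]
    v_n, v_n1 = spread_estimates(sample)
    total_n += v_n
    total_n1 += v_n1

mean_n = total_n / trials    # should be near (n - 1) / n * sigma2 = 3.2
mean_n1 = total_n1 / trials  # should be near sigma2 = 4.0
print(f"average when dividing by n:     {mean_n:.3f}")
print(f"average when dividing by n - 1: {mean_n1:.3f}")
```

Python's standard `statistics` module exposes both conventions: `statistics.pvariance` divides by $n$ and `statistics.variance` divides by $n-1$.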