Intuitive Explanation of Bessel's Correction


When calculating a sample variance, a factor of $N-1$ appears in the denominator instead of $N$ (see this link). Does anybody have an intuitive way of explaining this to students who need to use this fact but maybe haven't taken a statistics course?

4 Answers

Answer (score 5)

http://en.wikipedia.org/wiki/Bessel%27s_correction

The Wikipedia article linked to above has a section (written by me) titled "The source of the bias". It explains it via a concrete example.

But note also that correcting for bias, when it can be done, is not always a good idea. I wrote this paper about that: http://arxiv.org/pdf/math/0206006.pdf

Answer (score 10)

To me, the main idea is that the sample mean is not the distribution (or population) mean. The sample mean is "closer" to the sample data than the distribution mean, so the variance computed is smaller.

Suppose the distribution mean is $m_d$ and the distribution variance is $v_d$. The sum of the $n$ variates in the sample is $n m_s$, where $m_s$ is the sample mean. Recall that the mean of a sum of variates is the sum of the means, and, for independent variates, the variance of the sum is the sum of the variances. That is, the distribution mean of the sum of $n$ variates is $n m_d$ and its distribution variance is $n v_d$. In other words, $$ \mathrm{E}[(n m_s-n m_d)^2]=n v_d $$ or equivalently, $$ \mathrm{E}[(m_s-m_d)^2]=\frac{1}{n}v_d $$

Now compute the expected sample variance: $$ \begin{align} \mathrm{E}[v_s] &=\mathrm{E}\left[\frac{1}{n}\sum_{k=1}^n(x_k-m_s)^2\right]\\ &=\mathrm{E}\left[\frac{1}{n}\sum_{k=1}^n\left((x_k-m_d)^2+2(x_k-m_d)(m_d-m_s)+(m_d-m_s)^2\right)\right]\\ &=\mathrm{E}\left[\frac{1}{n}\sum_{k=1}^n\left((x_k-m_d)^2+2(m_s-m_d)(m_d-m_s)+(m_d-m_s)^2\right)\right]\\ &=\mathrm{E}\left[\frac{1}{n}\sum_{k=1}^n\left((x_k-m_d)^2-(m_d-m_s)^2\right)\right]\\ &=v_d-\frac{1}{n}v_d\\ &=\frac{n{-}1}{n}v_d \end{align} $$ (In the third line, $x_k-m_d$ was replaced by its average over $k$, namely $m_s-m_d$; this is valid inside the expectation because the factor $2(m_d-m_s)$ does not depend on $k$.) Thus, $$ v_d=\frac{n}{n{-}1}\mathrm{E}[v_s] $$ This is why, to estimate the distribution variance, we multiply the sample variance by $\frac{n}{n{-}1}$; equivalently, it appears as if we are dividing by $n{-}1$ instead of $n$.
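As a sanity check (not part of the original argument), a short Monte Carlo simulation in Python confirms that the $1/n$ sample variance averages to roughly $\frac{n-1}{n}v_d$. The variable names mirror the notation above; the sample size and trial count are arbitrary choices.

```python
import random

random.seed(42)
n = 5              # sample size (arbitrary)
v_d = 1.0          # distribution variance of a standard normal
trials = 200_000

total = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    m_s = sum(sample) / n
    v_s = sum((x - m_s) ** 2 for x in sample) / n   # note: divide by n, not n-1
    total += v_s

mean_v_s = total / trials
print(mean_v_s)    # should be close to (n - 1) / n * v_d = 0.8
```

Multiplying `mean_v_s` by $\frac{n}{n-1}$ recovers an essentially unbiased estimate of $v_d$, which is exactly Bessel's correction.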

Answer (score 0)

The sum $\sum (x_i - a)^2$ is minimized when $a$ is the average of the $x_i$'s. The proof is a simple exercise in algebra (write the sum as a quadratic in $a$) or calculus (differentiate and set $\sum (x_i - a) = 0$).

Therefore, $\sum (x_i - \overline{x})^2 \leq \sum (x_i - \mu)^2$. The same inequality holds with averages in place of sums.

The average of the squared $(x_i - \mu)$'s is an unbiased estimate of the variance, since each term has that variance as its expected value. Replacing $\mu$ by $\overline{x}$ can only shrink the sum of squares, so the expected value of the resulting estimate is lower than that of the unbiased estimator built from $\mu$. The latter expected value is the variance, so the expression with $1/N$ will (on average) underestimate the variance.

To summarize: the use of $\overline{x}$ as a surrogate for $\mu$ causes a downward bias in estimating variance by $\sum (x_i - \overline{x})^2 / N$.
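The key claim above — that $\sum (x_i - a)^2$ is smallest at $a = \overline{x}$ — is easy to spot-check numerically. A minimal Python sketch (the helper name `sum_sq` and the test points are mine, chosen for illustration):

```python
import random

random.seed(1)
x = [random.uniform(0.0, 10.0) for _ in range(20)]
xbar = sum(x) / len(x)

def sum_sq(a):
    """Sum of squared deviations of the data about the point a."""
    return sum((xi - a) ** 2 for xi in x)

# The sum about the sample mean is never larger than the sum about
# any other candidate point (here, a few arbitrary offsets from xbar).
candidates = [xbar - 3.0, xbar - 0.1, xbar + 0.1, xbar + 3.0]
smallest_is_mean = all(sum_sq(xbar) <= sum_sq(a) for a in candidates)
print(smallest_is_mean)
```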

Answer (score 2)

The question refers to "explaining this to students who need to use this fact but maybe haven't taken a statistics course". For students more advanced than those served by the concrete example I mentioned earlier (which requires no algebra beyond expanding $(a+b)^2$), a couple of other points of view are worth looking at.

We can write $$ \begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} = \begin{bmatrix}\overline{x} \\ \vdots \\ \overline{x} \end{bmatrix} + \begin{bmatrix} x_1 - \overline{x} \\ \vdots \\ x_n - \overline{x}\end{bmatrix}, $$ and notice that the two vectors being added are the orthogonal projections of the vector on the left onto subspaces of dimensions $1$ and $n-1$. The expected value of the first summand is $\mu$ times a column of $1$s, and the expected value of the second summand is $0$. Now rotate the coordinate system so that the one-dimensional subspace becomes the first coordinate axis; the decomposition becomes $$ \begin{bmatrix}u_1 \\ \vdots \\ u_n\end{bmatrix} = \begin{bmatrix} u_1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}. $$ The expected value of the first entry $u_1$ is $\mu\sqrt{n}$, and the expected value of every entry in the second summand is $0$. Since the rotated coordinates $u_2,\dots,u_n$ are uncorrelated with a common variance, the expected value of the square of the norm of the second vector is $n-1$ times the expected value of the square of any one of its entries. That's where the $n-1$ comes from. Notice that $$ \underbrace{\sum_{i=1}^n (x_i - \overline{x})^2}_{n\text{ terms}} = \underbrace{\sum_{i=2}^n u_i^2}_{n-1\text{ terms}}. $$
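The rotation can be made concrete in a few lines of Python. This sketch uses a Householder reflection (one particular choice of orthogonal map, not the only one) that swaps the direction $(1,\dots,1)/\sqrt{n}$ with the first coordinate axis; the helper name `rotate` is mine.

```python
import math
import random

random.seed(0)
n = 5
x = [random.gauss(0.0, 1.0) for _ in range(n)]
xbar = sum(x) / n

# Householder reflection H = I - 2 v v^T / (v . v) with v = e1 - w,
# where w = (1/sqrt(n), ..., 1/sqrt(n)).  H is orthogonal and symmetric,
# and it swaps the direction w with the first coordinate axis e1.
w = [1.0 / math.sqrt(n)] * n
v = [(1.0 if i == 0 else 0.0) - w[i] for i in range(n)]
vv = sum(c * c for c in v)

def rotate(y):
    """Apply the reflection H to the vector y."""
    vy = sum(v[i] * y[i] for i in range(n))
    return [y[i] - 2.0 * v[i] * vy / vv for i in range(n)]

u = rotate(x)

u1 = u[0]                                # equals sqrt(n) * xbar
tail = sum(ui ** 2 for ui in u[1:])      # sum of the other n-1 squares
centered = sum((xi - xbar) ** 2 for xi in x)
print(u1, tail, centered)
```

Because the reflection preserves lengths, `tail` reproduces the centered sum of squares exactly, with $n-1$ coordinates doing the work of $n$ centered terms.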

If students do know some probability theory, the above can also explain why $\sum_{i=1}^n (X_i - \overline{X})^2/\sigma^2$ has a chi-square distribution with $n-1$ degrees of freedom, under the usual assumptions of normality and independence. (I use a capital $X$ this time since it's a random variable.) It can also explain why $\overline{X}$ is actually independent of that chi-square random variable.
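A quick Monte Carlo check (a sketch; the choices of $n$, $\sigma$, and trial count are arbitrary) is consistent with this: a chi-square variable with $n-1$ degrees of freedom has mean $n-1$, and the simulated average of $\sum (X_i-\overline{X})^2/\sigma^2$ lands there rather than at $n$.

```python
import random

random.seed(7)
n, sigma, trials = 6, 2.0, 100_000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    total += sum((x - xbar) ** 2 for x in xs) / sigma ** 2

mean_stat = total / trials
print(mean_stat)   # close to n - 1 = 5, the chi-square mean
```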

Another thing that is sometimes useful in thinking about this topic is the algebraic identity $$ \sum_{i=1}^n (x_i - \mu)^2 = n(\overline{x} - \mu)^2 + \sum_{i=1}^n (x_i - \overline{x})^2 \text{ where } \overline{x} = \frac{x_1+\cdots+x_n}n. $$ Clearly this implies that $$ \sum_{i=1}^n (x_i - \mu)^2 \ge \sum_{i=1}^n (x_i - \overline{x})^2 $$ with equality if and only if $\overline{x}=\mu$. This is of course the same thing as what was used in the concrete example in the Wikipedia article linked in my earlier answer, but stated in a way that will be understood by students who know more algebra than the expansion of $(a+b)^2$.
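The identity itself can be spot-checked numerically. A minimal Python sketch (the data and the choice of $\mu$ are arbitrary; any fixed reference point works):

```python
import math
import random

random.seed(3)
n = 12
mu = 5.0                        # an arbitrary fixed reference point
x = [random.gauss(mu, 2.0) for _ in range(n)]
xbar = sum(x) / n

lhs = sum((xi - mu) ** 2 for xi in x)
rhs = n * (xbar - mu) ** 2 + sum((xi - xbar) ** 2 for xi in x)
print(lhs, rhs)                 # the two sides agree up to rounding
```

Since $n(\overline{x}-\mu)^2 \ge 0$, the identity makes the inequality above immediate, along with the equality condition $\overline{x}=\mu$.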