Sample variance bias and degrees of freedom


I have been researching the reason why a sample variance should be divided by $n-1$ rather than $n$ in order to compute an unbiased sample variance (i.e. Bessel's Correction):

$$ s^2 = \frac 1 {n-1}\sum\limits_{i=1}^n(x_i - \bar{x})^2 $$

Algebraically, I understand the proof in the Wikipedia article on Bessel's correction. However, while researching the topic I keep seeing people reference $n-1$ degrees of freedom in connection with Bessel's correction. Can anyone explain to me how these two concepts are related?
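Before the answers, here is a quick way to see the bias numerically. This is an illustrative Monte Carlo sketch (the sample size, seed, and true variance $\sigma^2 = 4$ are assumptions, not from the question): averaging many divide-by-$n$ estimates lands near $\sigma^2 (n-1)/n$, while the divide-by-$(n-1)$ estimator is unbiased.

```python
import random

# Monte Carlo check of Bessel's correction (illustrative parameters assumed).
# Draw many samples of size n from N(0, 2^2), so the true variance is 4.0,
# and average the biased (divide-by-n) and unbiased (divide-by-(n-1)) estimates.
random.seed(0)
n, trials = 5, 200_000
sigma2 = 4.0  # true variance

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)

print(biased_sum / trials)    # close to sigma2 * (n-1)/n = 3.2
print(unbiased_sum / trials)  # close to sigma2 = 4.0
```

The biased average comes out near $3.2 = 4 \cdot \frac{4}{5}$, matching the known factor $\frac{n-1}{n}$ in the bias.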


$$ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \bar x \\ \vdots \\ \bar x \end{bmatrix} + \begin{bmatrix} x_1-\bar x \\ \vdots \\ x_n - \bar x \end{bmatrix} \tag 1 $$

The first term on the right in $(1)$ satisfies the linear constraint that all of its entries are equal; the second satisfies the linear constraint that its entries sum to $0.$ Thus if you know one entry of the first term and you know its constraint, then you know all of its entries; and if you know all but one entry of the second term and you know its constraint, then you know all of its entries. Hence the first term has $1$ degree of freedom and the second has $n-1.$
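The decomposition $(1)$ and its two constraints can be checked on a small example. The data values below are made up for illustration; the point is that the two parts add back to the data, the residuals sum to $0$, and the two parts are orthogonal.

```python
# Numeric check of decomposition (1) with made-up data.
xs = [2.0, 5.0, 1.0, 4.0]
n = len(xs)
xbar = sum(xs) / n  # 3.0 for this data

mean_part = [xbar] * n               # all entries equal: 1 degree of freedom
resid_part = [x - xbar for x in xs]  # entries sum to 0: n-1 degrees of freedom

# The two parts reconstruct the data exactly.
assert all(abs(m + r - x) < 1e-12 for m, r, x in zip(mean_part, resid_part, xs))
# The residuals satisfy their linear constraint.
assert abs(sum(resid_part)) < 1e-12
# The parts are orthogonal: the residuals are orthogonal to [1, ..., 1].
assert abs(sum(m * r for m, r in zip(mean_part, resid_part))) < 1e-12
```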

The first term on the right in $(1)$ is $(\bar x \sqrt n\,)\cdot [1,\ldots,1]^T/\sqrt n,$ and $[1,\ldots,1]^T/\sqrt n$ is a unit vector. The second term lies in the space orthogonal to that vector. Consider an orthonormal basis of $\mathbb R^n$ with that unit vector as its first member. With respect to that basis the equality $(1)$ becomes $$ \begin{bmatrix} u_1 \\ \vdots \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} u_1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix} $$ where $u_1 = \bar x \sqrt n$ and $$ (x_1-\bar x)^2 + \cdots + (x_n - \bar x)^2 = u_2^2 + \cdots + u_n^2. $$ Now suppose the $x_i$ are i.i.d. $N(\mu, \sigma^2).$ Because of the spherical symmetry of the resulting $n$-dimensional normal distribution, the entries of the vector $[u_2,\ldots,u_n]^T$ are i.i.d. normal with expected value $0$ and variance $\sigma^2.$ Therefore $$ \frac 1 {\sigma^2} (u_2^2+ \cdots + u_n^2) \sim \chi^2_{n-1}, $$ a chi-squared distribution with $n-1$ degrees of freedom, one for each of the coordinates $u_2, \ldots, u_n.$
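The chi-squared claim above can be sanity-checked by simulation: $\frac{1}{\sigma^2}\sum_i (x_i - \bar x)^2$ should have mean $n-1$, the mean of a $\chi^2_{n-1}$ distribution. This is a sketch with assumed parameters ($n = 6$, $\sigma = 3$, seed $1$), not part of the answer itself.

```python
import random

# Simulation check that (1/sigma^2) * sum (x_i - xbar)^2 has mean n-1,
# the mean of a chi-squared distribution with n-1 degrees of freedom.
# (Sample size, sigma, and seed are illustrative assumptions.)
random.seed(1)
n, sigma, trials = 6, 3.0, 200_000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    total += sum((x - xbar) ** 2 for x in xs) / sigma ** 2

print(total / trials)  # close to n - 1 = 5
```

The simulated mean lands near $5 = n-1$ rather than $n = 6$ or $n-2 = 4$, which is exactly the degrees-of-freedom accounting behind Bessel's correction.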