Degrees of Freedom - the intuition behind the concept


What are degrees of freedom? I have some general information about this concept, but I would like to know how it originated theoretically and why it is so important. Why is there a need for this concept (e.g., in Student's t distribution or in the χ2 distribution)?

Thanks in advance! :)


BEST ANSWER

Technically, degrees of freedom (DF) is an integer parameter of the chi-squared and Student t distributions. For example, it tells you on what row of a printed probability table you should look for a value that is required for a particular problem. Because the t distribution is defined in terms of the chi-squared distribution, DF means the same thing in both cases. As for intuition, there might be two levels, depending on your mathematical level.

(1) If you know linear algebra, the idea is something like this. View $n$ observations $X_1, X_2, \dots, X_n$ as forming a vector in $n$-dimensional space. The sample mean $\bar X$ imposes one linear constraint, which leaves $n-1$ dimensions "free" to express the sample variance $s^2.$

Then $(n-1)s^2/\sigma^2 \sim Chisq(DF=n-1).$ From there one derives that the one-sample t statistic has Student's t distribution with $DF = n - 1.$
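The distributional claim above can be checked numerically. A minimal sketch (the sample size, σ, and repetition count are arbitrary choices): the chi-squared distribution with $n-1$ degrees of freedom has mean $n-1$, so averaging $(n-1)s^2/\sigma^2$ over many simulated samples should land near $n-1$.

```python
import random
import statistics

# Simulation sketch: the average of (n-1)*s^2/sigma^2 over many normal
# samples should be close to n - 1, the mean of Chisq(DF = n-1).
random.seed(0)
n, sigma, reps = 5, 2.0, 100_000

total = 0.0
for _ in range(reps):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)        # sample variance, divides by n - 1
    total += (n - 1) * s2 / sigma**2

mean_stat = total / reps
print(round(mean_stat, 1))   # should be close to n - 1 = 4
```

This only checks the mean, not the whole distribution, but it is enough to see that the DF parameter is doing real work.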

(2) At the most elementary level, notice that $s^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}$ is used for one-sample inference with both chi-squared and t distributions. The DF parameter is simply the denominator of $s^2.$

One way to explain why the denominator is $n-1$ is as follows. The $n$ differences $(X_i - \bar X)$ must sum to 0. Once $n - 1$ of them are known, the value of the last one is already determined (not "free" to assume an independent value of its own).
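That constraint is easy to see with toy data (the four numbers below are just an illustrative example): the deviations from the sample mean sum to zero, so the last deviation is fully determined by the other $n-1$.

```python
import statistics

# The n deviations from the sample mean must sum to 0, so once
# n - 1 of them are known, the last one is not "free".
data = [3.0, 7.0, 4.0, 6.0]
xbar = statistics.mean(data)            # 5.0
devs = [x - xbar for x in data]         # [-2.0, 2.0, -1.0, 1.0]

print(sum(devs))                        # 0 (up to rounding)
print(devs[-1] == -sum(devs[:-1]))      # the last deviation is minus the sum of the rest
```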

If, for some reason, you happen to know the population mean $\mu$ instead of having to estimate it with $\bar X,$ then you would estimate the variance with $v = \frac{\sum_{i=1}^n (X_i - \mu)^2}{n}.$ Then $nv/\sigma^2 \sim Chisq(n).$ Some textbooks say that "a degree of freedom is lost" in the process of estimating $\mu$ by $\bar X.$ That is intended as an informal statement, not a mathematically rigorous one.
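The "lost degree of freedom" can also be seen as a bias correction. A simulation sketch (parameter values are arbitrary): dividing by $n$ with the known $\mu$ is unbiased, dividing by $n$ after substituting $\bar X$ underestimates $\sigma^2$ by the factor $(n-1)/n$, and dividing by $n-1$ restores unbiasedness.

```python
import random

# Compare three variance estimators over many simulated samples:
#   v_known:     divide by n, deviations from the known mu   (unbiased)
#   v_naive:     divide by n, deviations from the sample mean (biased low)
#   v_corrected: divide by n - 1, deviations from the sample mean (unbiased)
random.seed(1)
mu, sigma, n, reps = 0.0, 1.0, 5, 100_000

v_known = v_naive = v_corrected = 0.0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    v_known += sum((xi - mu) ** 2 for xi in x) / n
    v_naive += sum((xi - xbar) ** 2 for xi in x) / n
    v_corrected += sum((xi - xbar) ** 2 for xi in x) / (n - 1)

print(round(v_known / reps, 2))      # near 1.0 = sigma^2
print(round(v_naive / reps, 2))      # near 0.8 = (n-1)/n * sigma^2
print(round(v_corrected / reps, 2))  # near 1.0 = sigma^2
```

The naive estimator is biased because $\bar X$ sits closer to the data than $\mu$ does, making the squared deviations systematically too small.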

When you are considering more than one sample, the formulas for DF are a little more complicated. For example, in a pooled 2-sample t test, DF $= (n_1 - 1) + (n_2 - 1).$ In this case you can consider that you are summing the DF parameters for the two samples.
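A small sketch of the pooled case (the two data lists are made-up toy values): each sample contributes its own $n_i - 1$ degrees of freedom, and the pooled variance weights each sample variance by its own DF.

```python
import statistics

# Pooled two-sample DF: (n1 - 1) + (n2 - 1) = n1 + n2 - 2.
# The pooled variance weights each s^2 by that sample's own DF.
x = [5.1, 4.9, 5.4, 5.0, 4.8]
y = [6.2, 5.9, 6.1, 6.4]
n1, n2 = len(x), len(y)

df = (n1 - 1) + (n2 - 1)
s2_pooled = ((n1 - 1) * statistics.variance(x) +
             (n2 - 1) * statistics.variance(y)) / df

print(df)   # 7
```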