How can $n$ variables have $2n$ degrees of freedom?


Formally, if $Y_1,\ldots,Y_n$ are iid $\mathrm{Exp}(\lambda)$ (rate $\lambda$), then $2\lambda\sum_{i=1}^n Y_i \sim \Gamma(n,2)$ (shape $n$, scale $2$), which is the chi-squared distribution with $2n$ degrees of freedom.
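This identity is easy to check numerically. A small sketch (the values of `lam` and `n` are arbitrary illustrative choices): the $\Gamma(n,2)$ and $\chi^2(2n)$ CDFs agree exactly, and the scaled sum of simulated exponentials has the matching moments $2n$ and $4n$.

```python
# Check that 2*lambda*sum(Y_i) ~ chi^2(2n) for Y_i ~ Exp(rate lambda).
# lam and n are arbitrary choices for illustration.
import numpy as np
from scipy import stats

lam, n = 1.5, 5
x = np.linspace(0.1, 30, 50)

# Gamma(shape=n, scale=2) and chi^2(2n) have identical CDFs.
assert np.allclose(stats.gamma(a=n, scale=2).cdf(x),
                   stats.chi2(2 * n).cdf(x))

# Monte Carlo: the scaled sum matches the chi^2(2n) mean and variance.
rng = np.random.default_rng(0)
s = 2 * lam * rng.exponential(scale=1 / lam, size=(200_000, n)).sum(axis=1)
print(s.mean(), s.var())   # close to 2n = 10 and 4n = 20
```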

Intuitively, however, I think of degrees of freedom as the number of variables that are free to vary minus the number of constraints. In other words, the degrees of freedom of a problem, such as a hypothesis test, are like the dimension of the space that the data live in. And this is, in fact, how most of the webpages I have visited suggest we think about degrees of freedom. But in many tests $2n$ degrees of freedom are used for sample size $n$ without much explanation beyond the formal one above. An example is a one-sided test of the mean of $n$ independent exponential variables, or confidence intervals on failure times.

So I'm having a hard time trying to attach a meaning to the extra $n$ degrees. I'm thinking that even though there are $n$ variables there is some underlying geometry that is higher dimensional. There has to be an intuitive explanation for the extra "wiggle room" in these problems.

This question arose from a specific question on a review sheet in my class. The question is: if $X_1, X_2, \ldots, X_n$ are iid $N(0,\sigma^2)$, find a Uniformly Most Powerful test of size $\alpha$ for $H_0: \sigma^2=\sigma_0^2$ vs. $H_1: \sigma^2>\sigma_0^2$. I found that the family of distributions has a monotone likelihood ratio in $Y(X)=\sum X_i^2$, so that a UMP test is \begin{align*} T(X)=\begin{cases} 1 & Y(X)>c, \\ 0 & Y(X)<c. \end{cases} \end{align*} Thus, $\alpha=\mathrm{P}_0\left(\frac{Y(X)}{\sigma_0^2}>c/\sigma_0^2\right)$, and since the sum of squares of $n$ standard normal variables has a $\chi^2$ distribution with $n$ degrees of freedom, I would have said $c=\sigma_0^2\chi^2_\alpha(n)$. The professor, however, says that the answer is $c = \sigma_0^2 \chi^2_\alpha(2n)$. The professor is unavailable for comment, but she's used the same review sheet for years, so I think she means what she says.
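For what it's worth, the null distribution here can be simulated. A sketch (the values of `sigma0`, `n`, and `alpha` are arbitrary choices): under $H_0$, each $X_i/\sigma_0$ is standard normal, so $Y(X)/\sigma_0^2 \sim \chi^2(n)$, and the test that rejects when $Y(X) > \sigma_0^2\,\chi^2_\alpha(n)$ (upper-$\alpha$ quantile) should reject with probability close to $\alpha$.

```python
# Simulate the null distribution of Y(X) = sum(X_i^2) when
# sigma^2 = sigma0^2, and check the rejection rate of the test that
# rejects when Y(X) > sigma0^2 * (upper-alpha quantile of chi^2(n)).
# sigma0, n, alpha are arbitrary illustrative choices.
import numpy as np
from scipy import stats

sigma0, n, alpha = 2.0, 10, 0.05
rng = np.random.default_rng(1)
X = rng.normal(0, sigma0, size=(100_000, n))
Y = (X ** 2).sum(axis=1)

c = sigma0 ** 2 * stats.chi2(n).ppf(1 - alpha)   # upper-alpha quantile
print((Y > c).mean())   # rejection rate under H0, close to alpha = 0.05
```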

Thanks!

On BEST ANSWER

Unfortunately the term "degrees of freedom" has more than one meaning. If you have a bunch of data points $(x_i,y_i),\ i=1,\ldots,n$ and you fit a line $y = \hat a +\hat b x$ by least squares (where the "hats" indicate that these are estimates), then the residuals $\hat\varepsilon_i=y_i - \hat y_i$, where $\hat y_i=\hat a + \hat b x_i$, will satisfy two linear constraints: $\sum_{i=1}^n\hat\varepsilon_i = 0$ and $\sum_{i=1}^n\hat\varepsilon_i x_i = 0$, and one then says the vector of residuals has $n-2$ degrees of freedom.
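The two constraints are easy to see numerically. A sketch (the data here are arbitrary, generated from a line plus noise):

```python
# Fit a line by least squares and verify the two linear constraints
# on the residuals: sum(e_i) = 0 and sum(e_i * x_i) = 0.
# The data are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=20)
y = 1.0 + 3.0 * x + rng.normal(size=20)

b_hat, a_hat = np.polyfit(x, y, deg=1)      # slope, intercept
resid = y - (a_hat + b_hat * x)

print(resid.sum())          # ~ 0
print((resid * x).sum())    # ~ 0
```

So the $n$ residuals actually live in an $(n-2)$-dimensional subspace, which is what that sense of "degrees of freedom" counts.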

When one speaks of the chi-square distribution with $k$ degrees of freedom, it means the distribution of $Z_1^2+\cdots+Z_k^2$ where $Z_1, \ldots, Z_k \sim \mathrm{i.i.d.}\ N(0,1)$. That is a different concept. There is a relationship between the two concepts: often a test statistic based on some vector that has $k$ degrees of freedom in the sense defined in the first paragraph above has asymptotically (i.e. as the sample size approaches $\infty$) a chi-square distribution with $k$ degrees of freedom, in the sense defined in this present paragraph.
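A small simulation can illustrate that asymptotic link (a sketch; the fair six-sided die, the number of rolls, and the replication count are arbitrary choices). The six category counts must sum to $N$, so the count vector has $6-1=5$ degrees of freedom in the first sense, and Pearson's goodness-of-fit statistic is approximately $\chi^2(5)$ in the second sense:

```python
# Pearson's goodness-of-fit statistic for a fair six-sided die:
# the counts satisfy one constraint (they sum to N), giving 5 df,
# and the statistic is asymptotically chi^2(5).
# N, reps are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
N, reps, k = 1000, 20_000, 6
expected = N / k

counts = rng.multinomial(N, np.full(k, 1 / k), size=reps)  # (reps, 6)
stat = ((counts - expected) ** 2 / expected).sum(axis=1)

print(stat.mean(), stat.var())   # close to 5 and 10, the chi^2(5) moments
```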

It can be shown that an exponential distribution with expected value $2$ is a chi-square distribution with $2$ degrees of freedom. That doesn't mean it's a sum of two things; it means it has the same probability distribution as a certain sum of two things.
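This equality of distributions can also be verified directly. A sketch: the CDF of the exponential distribution with mean $2$ agrees exactly with the $\chi^2(2)$ CDF, while $Z_1^2 + Z_2^2$ is merely one way to realize that common distribution.

```python
# The Exp distribution with mean 2 and chi^2(2) have the same CDF,
# i.e. the same probability distribution; Z1^2 + Z2^2 is one way to
# construct a variable with that distribution.
import numpy as np
from scipy import stats

x = np.linspace(0, 20, 200)
assert np.allclose(stats.expon(scale=2).cdf(x), stats.chi2(2).cdf(x))

# Z1^2 + Z2^2 has the mean of Exp(mean 2), namely 2.
rng = np.random.default_rng(4)
z = rng.normal(size=(100_000, 2))
print((z ** 2).sum(axis=1).mean())   # close to 2
```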