Let $r$ be the sample correlation for two random variables $X,Y$ based on a random sample $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$. According to Wikipedia, under the null hypothesis of zero correlation, the test statistic $t=r \sqrt{\frac{n-2}{1-r ^2}}$ approximately follows a t-distribution with $n-2$ degrees of freedom when the number of observations $n$ is large enough. Is there an easy way to prove this? So far, I tried to rewrite the formula for $r$ in such a way that I can apply the Central Limit Theorem, but I was unable to make something out of it.
Why is the statistic $t=r \sqrt{\frac{n-2}{1-r ^2}} \approx t (n-2)$?
1.2k views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 best solutions below.
The original result states that if \begin{align*} \begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N\left(\mathbf{0}, \begin{pmatrix} \sigma^2_x & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma^2_y \end{pmatrix}\right) \end{align*} and defining $r = \frac{\sum_{i=1}^{n}(X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \overline{X})^2\sum_{i=1}^{n}(Y_i - \overline{Y})^2}}$, then under the null hypothesis $\rho = 0$ the statistic $T = \sqrt{n-2}\,\frac{r}{\sqrt{1-r^2}}$ has an exact $t_{n-2}$ distribution (not merely approximately). For non-normally distributed $(X_i, Y_i)$ there is no guarantee of how close $T$ is to a $t$ distribution in finite samples, but by a CLT argument it is approximately standard normal, which in turn is approximately $t$-distributed for sufficiently large degrees of freedom.
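Before diving into the proof, here is a quick Monte Carlo sanity check of that claim (my own sketch, not part of the original answer; the sample size, replication count, and seed are arbitrary choices): simulate independent normal pairs, form $t = r\sqrt{\frac{n-2}{1-r^2}}$, and compare the simulated distribution to $t_{n-2}$ with a Kolmogorov-Smirnov test.

```python
# Monte Carlo sketch: under the null (rho = 0, bivariate normal),
# T = r * sqrt((n-2)/(1-r^2)) should follow t_{n-2} exactly.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 10, 20000
T = np.empty(reps)
for k in range(reps):
    x = rng.normal(size=n)
    y = rng.normal(size=n)          # independent of x, so rho = 0
    r = np.corrcoef(x, y)[0, 1]
    T[k] = r * np.sqrt((n - 2) / (1 - r**2))

# Kolmogorov-Smirnov test against t with n-2 degrees of freedom;
# a non-small p-value is consistent with an exact t_{n-2} law.
pvalue = stats.kstest(T, stats.t(df=n - 2).cdf).pvalue
print(pvalue)
```

Deliberately small $n$ makes the point that the $t_{n-2}$ fit is exact under normality, not just a large-sample approximation.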
Define $S_{XY} = \sum_{i=1}^{n}X_iY_i$, and similarly $S_{XX}, S_{YY}$; also define $S_{YY}^\perp = S_{YY}- \frac{S_{YX}S_{XY}}{S_{XX}}$. A lemma we need is
$S_{YX} \perp S_{YY}^\perp | S_{XX}$
The proof is standard, involving normal-distribution manipulations and properties of projection matrices. Using this fact, we write \begin{align*} r = \frac{\mathbf{X}^\intercal \mathbf{H} \mathbf{Y}}{\sqrt{\mathbf{X}^\intercal \mathbf{H}\mathbf{X}\,\mathbf{Y}^\intercal\mathbf{H}\mathbf{Y}}} \end{align*} where $\mathbf{H} = \mathbf{I}_{n\times n} - \frac{1}{n}\mathbf{1}\mathbf{1}^\intercal$ is the centering matrix. Letting $\Gamma$ be a square root of $\mathbf{H}$ (that is, $\Gamma^\intercal \Gamma =\mathbf{H}$; since $\mathbf{H}$ is idempotent, $\Gamma = \mathbf{H}$ itself works), and setting $\mathbf{W} = \Gamma\mathbf{X}$ and $\mathbf{Z} = \Gamma \mathbf{Y}$, we have \begin{align*} r = \frac{\mathbf{W}^\intercal\mathbf{Z}}{\sqrt{\mathbf{W}^\intercal\mathbf{W}\,\mathbf{Z}^\intercal\mathbf{Z}}} \end{align*} and so \begin{align*} T^2 = (n-2) \frac{r^2}{1-r^2} = (n-2) \frac{\mathbf{Z}^\intercal \mathbf{W}\mathbf{W}^\intercal\mathbf{Z}}{\mathbf{W}^\intercal\mathbf{W}\,\mathbf{Z}^\intercal\mathbf{Z} - \mathbf{Z}^\intercal \mathbf{W}\mathbf{W}^\intercal\mathbf{Z}} = (n-2) \frac{S^2_{WZ}}{S_{WW}S^\perp_{ZZ}}. \end{align*} So we now know that
- $S_{WZ} \perp S_{ZZ}^\perp \mid S_{WW}$
- $S_{ZZ}^\perp \mid S_{WW} \sim \sigma_y^2\,\chi^2_{n-2}$
- $S_{WZ} \mid S_{WW} \sim N(0,\, \sigma_y^2 S_{WW})$

And so we end up with \begin{align*} T^2 \mid S_{WW} \sim (n-2)\, \frac{\sigma_y^2 S_{WW}\, N(0, 1)^2}{\sigma_y^2 S_{WW}\, \chi^2_{n-2}} = (n-2)\,\frac{N(0,1)^2}{\chi^2_{n-2}} = t^2_{n-2}, \end{align*} which does not depend on $S_{WW}$, so the conditional distribution is also the unconditional one. Therefore $T \sim t_{n-2}$.
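As a small numerical check of the quadratic-form representation used above (my own sketch, not part of the answer): with the centering matrix $\mathbf{H} = \mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^\intercal$, which is idempotent, the expression $\mathbf{W}^\intercal\mathbf{Z}/\sqrt{\mathbf{W}^\intercal\mathbf{W}\,\mathbf{Z}^\intercal\mathbf{Z}}$ reproduces the ordinary sample correlation.

```python
# Check that the centered quadratic-form expression for r equals the
# usual sample correlation coefficient.
import numpy as np

rng = np.random.default_rng(1)
n = 12
X, Y = rng.normal(size=n), rng.normal(size=n)

H = np.eye(n) - np.ones((n, n)) / n      # centering matrix, H @ H == H
W, Z = H @ X, H @ Y                      # centered (mean-zero) vectors
r_quad = (W @ Z) / np.sqrt((W @ W) * (Z @ Z))
r_ref = np.corrcoef(X, Y)[0, 1]
print(np.isclose(r_quad, r_ref))         # the two expressions agree
```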
In Wackerly et al., this is given as Problem 11.55:
Testing the null hypothesis $H_0:\beta_1 = 0$, the statistic
$$T = \frac{\hat \beta_1 - 0}{\frac{S}{\sqrt{S_{xx}}}}$$ possesses a $t$ distribution with $n-2$ degrees of freedom if the null hypothesis is true. Show that the equation for T can also be written as $$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
So you want to start from the first equation and convert it to the second.
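A sketch of that algebra, using the standard simple-linear-regression identities $\hat\beta_1 = S_{xy}/S_{xx}$, $(n-2)S^2 = S_{yy} - \hat\beta_1 S_{xy}$, and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ (these come from the least-squares setup, not from the problem statement itself): \begin{align*} (n-2)S^2 &= S_{yy} - \frac{S_{xy}^2}{S_{xx}} = S_{yy}\left(1 - r^2\right), \\ T &= \frac{\hat\beta_1 \sqrt{S_{xx}}}{S} = \frac{S_{xy}/\sqrt{S_{xx}}}{\sqrt{S_{yy}(1-r^2)/(n-2)}} = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\cdot\frac{\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}. \end{align*}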