Calculating the correlation coefficient between least squares estimates


PROBLEM STATEMENT: Consider the following two-variable linear regression model, where the errors $e_i$ are independently and identically distributed with mean $0$ and variance $1$:

$$y_i = α + β(x_i − \bar x) + e_i ,\ i = 1,2,...,n.$$

Let $\hat α$ and $ \hat β$ be ordinary least squares estimates of $α$ and $β$ respectively. What is the correlation coefficient between $\hat α$ and $\hat β$?


MY ATTEMPT: I use the standard optimization technique and minimize the sum of squared errors. Differentiating with respect to $\alpha$ and $\beta$, I find $$\hat \alpha = \bar y,\qquad \hat \beta = \frac{\sum x_iy_i-n\bar x\bar y}{\sum x_i^2 - n\bar x^2}.$$ I am stuck here. How do I use the fact that the $e_i$ are i.i.d. to find the correlation coefficient between $\hat \alpha$ and $\hat \beta$?

I am not sure I understand the problem correctly. To calculate a correlation coefficient, I would need a whole collection of values of $\hat \alpha$ and $\hat \beta$. The $e_i$ are i.i.d. random variables, each with mean $0$ and variance $1$. But whatever values the $e_i$ take, solving the minimization problem gives $\hat \alpha$ and $\hat \beta$ that depend only on the $x_i$ and $y_i$ as above, and so they seem to come out the same every time. How then do I find the correlation coefficient?
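If I instead interpret the $y_i$ as random, regenerated from fresh errors each time, the estimates do vary and I can estimate their correlation empirically. Here is a quick Monte Carlo sketch of that interpretation (assuming NumPy; the design points, true parameters, and replication count are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 100_000
alpha, beta = 2.0, 3.0                 # arbitrary "true" parameters
x = np.linspace(0.0, 1.0, n)           # fixed design points
xc = x - x.mean()                      # centered predictor, as in the model

a_hat = np.empty(reps)
b_hat = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n)         # iid errors with mean 0, variance 1
    y = alpha + beta * xc + e          # fresh response for this replication
    a_hat[r] = y.mean()                # OLS estimate of alpha
    b_hat[r] = xc @ y / (xc @ xc)      # OLS estimate of beta

# Empirical correlation over the sampling distribution: ~ 0
print(np.corrcoef(a_hat, b_hat)[0, 1])
```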

I know only the definitions of elementary terms in regression, and I am self-studying this material. I am sure the problem must have a very simple solution, since it is meant to be solved in a few minutes with elementary statistics.


ANSWER 1:

(Typically in linear regression we condition on $x_i$ and only regard $Y_i$ as random, so in what follows we treat each $x_i$ as constant.)

For this problem it's important to remember that $\text{Cov}(Y_i, Y_j) = 0$ whenever $i \neq j$ (the errors, and hence the responses, are independent), while $\text{Cov}(Y_i, Y_i) = \text{Var}(Y_i) = 1$. Also recall that covariance is bilinear, so for random variables $X$, $Y$ and $Z$ and constants $a$ and $b$ we can write $\text{Cov}(X, aY + bZ) = a \,\text{Cov}(X, Y) + b \,\text{Cov}(X, Z)$. Here is the calculation:
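If the bilinearity rule looks unfamiliar, a quick numerical check (a sketch assuming NumPy; the data and constants are arbitrary) confirms it for sample covariances, which satisfy the same identity exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y, Z = rng.standard_normal((3, 1000))   # arbitrary data vectors
a, b = 2.5, -1.5                           # arbitrary constants

def cov(u, v):
    """Sample covariance between two vectors."""
    return np.cov(u, v)[0, 1]

lhs = cov(X, a * Y + b * Z)
rhs = a * cov(X, Y) + b * cov(X, Z)
print(np.isclose(lhs, rhs))                # True: covariance is bilinear
```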

$$ \begin{align} \text{Cov} ( \hat{\alpha}, \hat{\beta}) &= \text{Cov} \left (\bar{Y}, \frac{ \sum_{i=1}^{n} x_i Y_i - n \bar{x} \bar{Y}}{ \sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right ) \\ &= \frac{1}{n (\sum_{i=1}^{n} x_i^2 - n \bar{x}^2)} \sum_{j=1}^{n} \text{Cov} \left (Y_j, \sum_{i=1}^{n} x_i Y_i - n \bar{x} \bar{Y} \right ) \\ &= \frac{1}{n (\sum_{i=1}^{n} x_i^2 - n \bar{x}^2)} \sum_{j=1}^{n} \left [ \sum_{i=1}^{n} x_i \text{Cov} \left ( Y_j, Y_i \right ) - n \bar{x} \text{Cov} \left ( Y_j, \bar{Y} \right ) \right ] \\ &= \frac{1}{n(\sum_{i=1}^{n} x_i^2 - n \bar{x}^2)} \sum_{j=1}^{n} \left [ x_j - \bar{x} \sum_{i=1}^{n} \text{Cov}(Y_j, Y_i) \right ] \\ &= \frac{1}{n(\sum_{i=1}^{n} x_i^2 - n \bar{x}^2)} \sum_{j=1}^{n} \left ( x_j - \bar{x} \right ) \\ &= 0 . \end{align} $$

So the covariance is $0$, which means the regression coefficients are uncorrelated: the correlation coefficient between $\hat{\alpha}$ and $\hat{\beta}$ is $0$. (This happens whenever the predictor has been centered by subtracting off its mean.)
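To see the role of centering in that last remark, here is a simulation sketch (assuming NumPy; the design, true parameters, and replication count are arbitrary choices) that fits the model once with the raw $x_i$ and once with the centered $x_i - \bar{x}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 50_000
x = np.linspace(1, 5, n)                   # arbitrary design with nonzero mean

def ols_sim(xcol):
    """Simulate OLS intercept/slope estimates for the given predictor column."""
    A = np.column_stack([np.ones(n), xcol])
    ests = np.empty((reps, 2))
    for r in range(reps):
        y = 1.0 + 2.0 * xcol + rng.standard_normal(n)   # arbitrary truth
        ests[r] = np.linalg.lstsq(A, y, rcond=None)[0]
    return ests

raw = ols_sim(x)                 # raw predictor: estimates correlated
cen = ols_sim(x - x.mean())      # centered predictor: uncorrelated
print(np.corrcoef(raw.T)[0, 1])  # clearly nonzero for the uncentered design
print(np.corrcoef(cen.T)[0, 1])  # ~ 0 for the centered design
```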

ANSWER 2:

Let's see if some cleaner notation can capture the gist of it. Define:

$$ Y=\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_n\\ \end{bmatrix},\quad J=\begin{bmatrix} 1\\ 1\\ \vdots \\ 1\\ \end{bmatrix},\quad X=\begin{bmatrix} x_1-\bar{x}\\ x_2-\bar{x}\\ \vdots \\ x_n-\bar{x}\\ \end{bmatrix} $$

The key here is that $J\perp X$, since $J^\intercal X=\sum_{i=1}^{n}(x_i-\bar{x})=0$; this orthogonality drives everything that follows.

Because the design matrix $[J \ \ X]$ has orthogonal columns, the least squares problem decouples into two one-dimensional projections, giving $\hat{\alpha}=(J^\intercal J)^{-1}J^{\intercal}Y$ and $\hat{\beta}=(X^\intercal X)^{-1}X^{\intercal}Y$. So,

$$ \begin{align*} \operatorname{Cov}(\hat{\alpha},\hat{\beta}) &=(J^\intercal J)^{-1}J^{\intercal}\operatorname{Cov}(Y)\,X(X^\intercal X)^{-1}\\ &=(J^\intercal J)^{-1}J^{\intercal}\,\sigma^2I\,X(X^\intercal X)^{-1}\\ &=\sigma^2(J^\intercal J)^{-1}(J^\intercal X)(X^\intercal X)^{-1}\\ &=0 \qquad \text{(again, } J\perp X\text{)} \end{align*} $$
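A direct numerical check of this argument (a sketch assuming NumPy; the data are arbitrary) builds $J$ and $X$, confirms $J^\intercal X \approx 0$, and verifies that the decoupled closed-form estimates agree with a full two-column least squares fit:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = rng.uniform(0, 10, n)
J = np.ones(n)                                # column of ones
X = x - x.mean()                              # centered predictor column
y = 2.0 + 3.0 * X + rng.standard_normal(n)    # arbitrary truth

print(J @ X)                                  # ~ 0: J is orthogonal to X

# Closed-form estimates from the two decoupled projections
alpha_hat = (J @ y) / (J @ J)                 # equals y.mean()
beta_hat = (X @ y) / (X @ X)

# Joint two-column OLS gives the same answers because the columns are orthogonal
A = np.column_stack([J, X])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose([alpha_hat, beta_hat], coef))   # True
```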

From this, we can draw a more general conclusion:

In the linear regression $Y=X_1\beta_1+X_2\beta_2+\epsilon$, the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are uncorrelated whenever the column blocks are orthogonal, i.e. $X_1^\intercal X_2=0$.
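A simulation sketch of this block-orthogonal case (assuming NumPy; the design blocks, true coefficients, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 40, 20_000
# Block-orthogonal design: X1 lives on the first 20 rows, X2 on the last 20
X1 = np.vstack([rng.standard_normal((20, 2)), np.zeros((20, 2))])
X2 = np.vstack([np.zeros((20, 2)), rng.standard_normal((20, 2))])
A = np.hstack([X1, X2])
assert np.allclose(X1.T @ X2, 0)              # X1 is orthogonal to X2 by construction

betas = np.empty((reps, 4))
for r in range(reps):
    y = A @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.standard_normal(n)
    betas[r] = np.linalg.lstsq(A, y, rcond=None)[0]

# Cross-covariances between the beta1 block and the beta2 block: ~ 0
# up to Monte Carlo noise
print(np.cov(betas.T)[:2, 2:])
```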

Even more generally, let $\Sigma=\operatorname{Cov}(\epsilon)$ be a general positive definite matrix. In the inner product space defined by $\langle u,v\rangle=u^\intercal\Sigma v$, the above conclusion still stands, i.e. we have $\hat{\beta}_1^{\intercal}\Sigma \hat{\beta}_2=0$ if $X_1^{\intercal}\Sigma X_2=0$.