Identifying the distribution of the least squares estimate with linear regression


The model is set up as follows: Random variables $Y_1,Y_2,...,Y_n$ are given by $Y_i=\alpha+\beta x_i+\epsilon_i$ where $\alpha,\beta$ and $x_1,...,x_n$ are constants with $\sum_{i=1}^n x_i=0$ and $\epsilon_1,...,\epsilon_n$ are independent $N(0,\sigma^2)$ random variables. We define $$\hat{\alpha}=\frac{1}{n}\sum_{i=1}^n Y_i$$ and $$\hat{\beta}=\Big(\sum_{i=1}^n x_iY_i\Big)/\Big(\sum_{i=1}^nx_i^2\Big)$$

The question is then to choose the first and second rows of an $n\times n$ orthogonal matrix $A$ appropriately so that $Z=AY$ satisfies $$R^2=\sum_{i=1}^n (Y_i-\hat{\alpha}-\hat{\beta}x_i)^2=\sum_{i=3}^n Z_i^2$$ and hence deduce the joint distribution of $\hat{\alpha},\hat{\beta}$ and $R^2/\sigma^2$.

So far, I have found the joint distribution of $\hat{\alpha}$ and $\hat{\beta}$, and computed their covariance matrix.
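(As a numerical sanity check of that covariance computation — not part of the derivation — here is a short NumPy simulation I put together; the seed, sample size, and parameter values are arbitrary choices. With $\sum x_i = 0$ one expects $\operatorname{Var}(\hat{\alpha})=\sigma^2/n$, $\operatorname{Var}(\hat{\beta})=\sigma^2/\sum x_i^2$, and $\operatorname{Cov}(\hat{\alpha},\hat{\beta})=0$.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha, beta, sigma = 5, 1.0, 3.0, 1.0
x = rng.normal(size=n)
x -= x.mean()                      # enforce sum(x_i) = 0

reps = 300_000
# Each row of Y is one realisation of (Y_1, ..., Y_n)
Y = alpha + beta * x + rng.normal(scale=sigma, size=(reps, n))
a_hat = Y.mean(axis=1)             # alpha-hat for each realisation
b_hat = (Y @ x) / (x @ x)          # beta-hat for each realisation

C = np.cov(a_hat, b_hat)
print(C)   # approx [[sigma^2/n, 0], [0, sigma^2/sum(x^2)]]
```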

I then chose the first two rows of $A$ to be $\big(\frac{1}{\sqrt{n}},..., \frac{1}{\sqrt{n}}\big)$ and $\Big(\frac{x_1}{\sqrt{\sum x_i^2}},..., \frac{x_n}{\sqrt{\sum x_i^2}}\Big)$, respectively, which gave me the desired relation $R^2=\sum_{i=1}^n (Y_i-\hat{\alpha}-\hat{\beta}x_i)^2=\sum_{i=3}^n Z_i^2$.
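(The identity can also be checked numerically. The sketch below — my own illustrative construction, not part of the question — completes the two given rows to an orthogonal $A$ via a QR factorisation and verifies that $R^2=\sum_{i=3}^n Z_i^2$, and also that $Z_1=\sqrt{n}\,\hat{\alpha}$ and $Z_2=\sqrt{\sum x_i^2}\,\hat{\beta}$.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)
x -= x.mean()                          # enforce sum(x_i) = 0

# The two given rows, as in the question
u1 = np.ones(n) / np.sqrt(n)
u2 = x / np.sqrt(x @ x)

# Complete to an orthogonal matrix: QR of [u1, u2, random columns].
# Since u1, u2 are already orthonormal, the first two columns of Q are +/- u1, u2.
M = np.column_stack([u1, u2, rng.normal(size=(n, n - 2))])
Q, _ = np.linalg.qr(M)
A = Q.T
A[0] *= np.sign(A[0] @ u1)             # fix signs so rows 1 and 2 equal u1, u2
A[1] *= np.sign(A[1] @ u2)

# One realisation of Y, and the corresponding estimates
alpha, beta, sigma = 1.5, -2.0, 0.7
Y = alpha + beta * x + rng.normal(scale=sigma, size=n)
a_hat = Y.mean()
b_hat = (x @ Y) / (x @ x)
R2 = np.sum((Y - a_hat - b_hat * x) ** 2)

Z = A @ Y
print(np.isclose(R2, np.sum(Z[2:] ** 2)))   # the claimed identity
```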

This is where I get confused. Because $A$ is orthogonal, $Z_1,...,Z_n$ are independent and identically distributed $N(0,\sigma^2)$ random variables. Then $R^2/\sigma^2$ is $\chi_{n-2}^2$, and since it is independent of $\hat{\alpha}$ and $\hat{\beta}$ (the $Z_i$ being independent), we're done. But multiplying out the first row of $A$ with $Y$ in fact gives $Z_1=\sqrt{n}\hat{\alpha}$, which has mean $\sqrt{n}\alpha$, not $0$, which obviously does not make sense.

Should I choose the rows of $A$ slightly differently? The trouble is that the choice above makes it clear that $A$ is indeed orthogonal, and makes it straightforward to deduce the relation $R^2=\sum_{i=1}^n (Y_i-\hat{\alpha}-\hat{\beta}x_i)^2=\sum_{i=3}^n Z_i^2$.



A year late, but: Your error is in concluding that

$Z_1,\ldots,Z_n$ are independent and identically distributed $N(0,\sigma^2)$ random variables.

Given that $A$ is orthogonal and that $\operatorname{Cov}(Y)=\sigma^2I$, you can assert that $Z:=AY$ has covariance matrix $$\operatorname{Cov}(Z)=\operatorname{Cov}(AY)=A\operatorname{Cov}(Y)A^T=A(\sigma^2I)A^T=\sigma^2I.$$ Since the joint distribution of the $Y$'s is multivariate normal, it follows that the $Z$'s are independent normal with common variance $\sigma^2$. However, this says nothing about their means: $E[Z]=A\,E[Y]$, which need not be the zero vector.
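(A Monte Carlo sketch of this point — my own illustration, with arbitrary seed and parameter values. The $Z_i$ all come out with variance $\sigma^2$ and zero sample covariances, but $E[Z_1]=\sqrt{n}\,\alpha$ and $E[Z_2]=\sqrt{\sum x_i^2}\,\beta$ are nonzero, while $Z_3,\dots,Z_n$ do have mean $0$.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, sigma = 6, 2.0, -1.0, 0.5
x = rng.normal(size=n)
x -= x.mean()                          # enforce sum(x_i) = 0

# Build an orthogonal A with the two prescribed rows (via QR completion)
u1 = np.ones(n) / np.sqrt(n)
u2 = x / np.sqrt(x @ x)
M = np.column_stack([u1, u2, rng.normal(size=(n, n - 2))])
Q, _ = np.linalg.qr(M)
A = Q.T
A[0] *= np.sign(A[0] @ u1)
A[1] *= np.sign(A[1] @ u2)

reps = 200_000
Y = alpha + beta * x[None, :] + rng.normal(scale=sigma, size=(reps, n))
Z = Y @ A.T                            # each row is one realisation of Z = AY

m = Z.mean(axis=0)
v = Z.var(axis=0)
print(m.round(2))   # ~ [sqrt(n)*alpha, sqrt(sum x^2)*beta, 0, ..., 0]
print(v.round(2))   # ~ sigma^2 in every coordinate
```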