Matrix regression proof that $\hat \beta = (X' X)^{-1} X' Y = {\hat \beta_0 \choose \hat \beta_1}$



where $\hat\beta$ is the least squares estimator of $\beta$.

attempt

So I know ${\hat \beta_0 \choose \hat \beta_1} = {\overline{Y} - \hat \beta_1 \overline{X} \choose \frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^{n}(X_i - \overline{X})^2}}$

Not really sure how to start, as I don't know which formulas I could use to reduce any of this. If this has been answered elsewhere, please mark it as a duplicate; I tried searching but couldn't find it.
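For concreteness, the scalar formulas above can be checked numerically against the matrix formula $(X'X)^{-1}X'Y$. A sketch with NumPy on simulated data (the sample size and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
Y = 1.5 + 2.0 * x + rng.normal(size=20)

# Matrix form: design matrix with a column of ones and a column of x values
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Scalar formulas for simple linear regression
b1 = np.sum((x - x.mean()) * (Y - Y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = Y.mean() - b1 * x.mean()

# Both routes give the same (beta_0, beta_1)
assert np.allclose(beta_hat, [b0, b1])
```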


There are 2 best solutions below

BEST ANSWER

In a slight variant on @MinusOne-Twelfth's comment,$$\frac{\partial}{\partial\beta_i}(y-X\beta)_j=-X_{ji}\implies\frac{\partial}{\partial\beta_i}\sum_j(y-X\beta)_j^2=2\sum_j(y-X\beta)_j\cdot(-X_{ji})=2(X^\prime X\beta-X^\prime y)_i.$$Setting this to $0$ for all $i$ gives the normal equations, and assuming $X^\prime X$ is invertible,$$X^\prime X\beta=X^\prime y\implies\beta=(X^\prime X)^{-1}X^\prime y.$$
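As a numerical check of this answer (a sketch with NumPy; the data is simulated), the normal-equations solution matches NumPy's least-squares routine, and the gradient $2(X'X\beta - X'y)$ vanishes at that solution:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = rng.normal(size=30)

# Normal-equations solution derived above
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Reference least-squares solution
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta, beta_ref)

# The gradient 2(X'X beta - X'y) vanishes at the minimizer
grad = 2 * (X.T @ X @ beta - X.T @ y)
assert np.allclose(grad, 0)
```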


Our goal is to minimize $$ f(\beta) = \frac12 \| X \beta - Y \|^2. $$ Notice that $f = g \circ h$, where $h(\beta) = X \beta - Y$ and $g(u) = \frac12 \| u \|^2$. The derivatives of $g$ and $h$ are given by $$ g'(u) = u^T, \quad h'(\beta) = X. $$ By the chain rule, we have \begin{align} f'(\beta) &= g'(h(\beta)) h'(\beta) \\ &= (X \beta - Y)^T X. \end{align} The gradient of $f$ is $$ \nabla f(\beta) = f'(\beta)^T = X^T( X \beta - Y). $$ Setting the gradient of $f$ equal to $0$, we discover that $$ X^T X \beta = X^T Y. $$ Assuming $X^T X$ is invertible, this yields $\hat\beta = (X^T X)^{-1} X^T Y$.
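The gradient formula from this chain-rule derivation can be verified numerically with a central-difference check (a sketch with NumPy; the dimensions and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 2))
Y = rng.normal(size=15)
beta = rng.normal(size=2)

def f(b):
    # Objective: f(beta) = (1/2) * ||X beta - Y||^2
    return 0.5 * np.sum((X @ b - Y) ** 2)

# Analytic gradient from the derivation: X^T (X beta - Y)
grad = X.T @ (X @ beta - Y)

# Central-difference numerical gradient, one coordinate at a time
eps = 1e-6
num = np.array([(f(beta + eps * e) - f(beta - eps * e)) / (2 * eps)
                for e in np.eye(2)])
assert np.allclose(grad, num, atol=1e-5)
```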