Why is $\hat{\beta}$ equal to $$\left(\sum_{i=1}^n x_i x_i' \right)^{-1} \sum_{i=1}^n x_i y_i = \left(\textbf{X}'\textbf{X} \right)^{-1} \textbf X'\textbf y \ ? $$ I am rather confused.
OLS $\hat{\beta}$ definition
110 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 3 best solutions below.
Taking a different approach from calculus, one can proceed directly from your orthogonality assumptions.
By hypothesis (i.e. our exogeneity assumption), $\mathbb{E}[X'u] = 0$. But $$ u = y - X\beta, $$ hence $$ \mathbb{E}[X'u] = \mathbb{E}[X'y - X'X\beta] = 0, $$ or, rearranging, $$ \mathbb{E}[X'y] = \mathbb{E}[X'X]\beta, $$ and hence our 'true' $\beta$ satisfies $$ \beta = \mathbb{E}[X'X]^{-1}\mathbb{E}[X'y]. $$ Now, by what's called the 'analogy principle' (which is a nice way of saying 'familiarity with the weak law of large numbers and the continuous mapping theorem'), we can operationalize this definition by the following estimator: $$ \hat{\beta} = \bigg[\sum_{i=1}^N X_i' X_i\bigg]^{-1}\sum_{i=1}^N X_i' y_i, $$ where $X_i$ denotes the $i$-th row of $X$.
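A quick numerical sketch of this sample-analogue estimator, on hypothetical simulated data (the variable names and the simulated design are assumptions, not from the question): the sum-of-outer-products form and the matrix form $(X'X)^{-1}X'y$ give the same $\hat{\beta}$.

```python
import numpy as np

# Hypothetical data: n = 50 observations, intercept plus k = 2 regressors.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Sum-of-outer-products form: (sum_i x_i x_i')^{-1} sum_i x_i y_i
S_xx = sum(np.outer(x_i, x_i) for x_i in X)
S_xy = sum(x_i * y_i for x_i, y_i in zip(X, y))
beta_sum = np.linalg.solve(S_xx, S_xy)

# Matrix form: (X'X)^{-1} X'y
beta_mat = np.linalg.solve(X.T @ X, X.T @ y)

# Both forms agree, and both are close to the true coefficients.
assert np.allclose(beta_sum, beta_mat)
```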
As @caradonna states, by Pythagoras the error vector $y - X\hat{\beta}$ must be orthogonal to the subspace spanned by the covariates, $\operatorname{range}(X)$, since otherwise we could improve our fit by adjusting $\hat{\beta}$ to take advantage of this unused correlation.
This implies that the least squares fit has the following property: $$ X' (y - X\hat{\beta}) = 0,$$ which is equivalent to saying that $$X'y = X'X\hat{\beta},$$ or alternatively that $\hat{\beta} = (X'X)^{-1}X'y$.
This of course assumes that $X'X$ is invertible. If it is not, then there are infinitely many $\hat{\beta}$ which minimize the squared error. People might then use the pseudo-inverse $(X'X)^+$ to pick the least-norm solution, i.e., the $\hat{\beta}$ for which $\|\hat{\beta}\|$ is smallest.
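To illustrate the singular case, here is a small sketch with a deliberately rank-deficient design (the data are made up for illustration): `np.linalg.pinv` returns the least-norm minimizer, and any other minimizer achieves the same fit with a larger norm.

```python
import numpy as np

# Hypothetical rank-deficient design: the third column duplicates the
# second, so X'X is singular and infinitely many beta minimize the error.
rng = np.random.default_rng(1)
n = 30
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z, z])   # rank 2, not 3
y = 3.0 + 2.0 * z + 0.05 * rng.normal(size=n)

# The pseudo-inverse picks the least-norm minimizer of ||y - X beta||.
beta_pinv = np.linalg.pinv(X) @ y

# Shifting weight between the duplicate columns leaves the fit
# unchanged but increases the norm of the coefficient vector.
beta_other = beta_pinv + np.array([0.0, 1.0, -1.0])
assert np.allclose(X @ beta_pinv, X @ beta_other)
assert np.linalg.norm(beta_pinv) < np.linalg.norm(beta_other)
```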
What's interesting here is that the prediction vector $\hat{y} = X \, \hat{\beta} = X\,(X'X)^{-1}X'y$ turns out to be the orthogonal projection of $y$ onto the space spanned by the columns of $X$. The matrix representing this projection is $X(X'X)^{-1}X'$.
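The defining properties of an orthogonal projection (symmetry and idempotence) are easy to verify numerically for the hat matrix $H = X(X'X)^{-1}X'$; the data below are arbitrary and only serve as a sanity check.

```python
import numpy as np

# Hat matrix H = X (X'X)^{-1} X' projects onto range(X).
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.normal(size=20)

H = X @ np.linalg.inv(X.T @ X) @ X.T

# Orthogonal projection: symmetric and idempotent.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)

# The residual y - Hy is orthogonal to every column of X.
y_hat = H @ y
assert np.allclose(X.T @ (y - y_hat), 0)
```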
First, note that $\textbf X$ is an $n \times (k+1)$ matrix, where $k$ is the number of predictor variables.
The columns of the $X$ matrix are
$x_j=\begin{pmatrix}{} x_{ 1j} \\ x_{ 2j} \\ \vdots \\ x_{ nj} \end{pmatrix} \quad \forall \ \ 0\leq j\leq k$
where $x_0=\begin{pmatrix}{} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$
That means that $\textbf X$ is equal to

$\textbf X=\begin{pmatrix} 1 & x_{ 11} & x_{ 12} &\ldots & x_{ 1k}\\ 1& x_{ 21} & x_{ 22} &\ldots & x_{ 2k}\\ \vdots& \vdots & \vdots & \ddots & \vdots \\ 1& x_{ n1} & x_{ n2} &\ldots & x_{ nk} \end{pmatrix} $

where the row index satisfies $1\leq i \leq n$.
Therefore you have to show, for instance, that for $n=2$ and $k=2$
$$\sum_{i=1}^2 x_i\cdot x_i^T=\sum_{i=1}^2 \begin{pmatrix} 1 \\ x_{ i1} \\ x_{ i2} \\ \end{pmatrix}\cdot \begin{pmatrix} 1 & x_{ i1} & x_{ i2} \end{pmatrix}\qquad (\color{blue}I)$$ is equal to $$ \begin{pmatrix} 1 & 1 \\ x_{ 11}& x_{ 21} \\ x_{ 12}& x_{ 22} \end{pmatrix}\cdot \begin{pmatrix} 1 & x_{ 11} & x_{ 12} \\ 1& x_{ 21} & x_{ 22} \end{pmatrix} \qquad (\color{blue}{II})$$ Hint: For the summand $i=1$ you should get in $(\color{blue}I)$
$\begin{pmatrix} 1 & x_{ 11} & x_{ 12} \\ x_{ 11}& x_{ 11}^2 & x_{ 11}\cdot x_{ 12}\\ x_{ 12}& x_{ 12}\cdot x_{ 11}& x_{ 12}^2\end{pmatrix} $
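The hint can also be checked numerically. The sketch below uses arbitrary made-up values for the $n=2$, $k=2$ case and confirms that the sum of outer products $(\color{blue}I)$ equals the matrix product $X'X$ of $(\color{blue}{II})$:

```python
import numpy as np

# n = 2, k = 2: each row of X is (1, x_{i1}, x_{i2}); values are arbitrary.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])

I_sum = sum(np.outer(row, row) for row in X)   # (I): sum of x_i x_i'
II_mat = X.T @ X                               # (II): X'X
assert np.allclose(I_sum, II_mat)
```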
$\small{\text{I assume your equality appears in context of ordinary least squares. }}$