Why is $\hat{\beta}$ equal to $$\left(\sum_{i=1}^n x_i x_i' \right)^{-1} \sum_{i=1}^n x_i y_i = \left(\textbf{X}'\textbf{X} \right)^{-1} \textbf X'\textbf y \ ? $$ I am rather confused.
OLS $\hat{\beta}$ definition
110 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 3 best solutions below.
Taking a different approach from calculus, one can proceed directly from your orthogonality assumptions.
By hypothesis (i.e. our exogeneity assumption), $\mathbb{E}[X'u] = 0$. But $$ u = y - X\beta, $$ hence $$ \mathbb{E}[X'u] = \mathbb{E}[X'y - X'X\beta] = 0, $$ or, rearranging, $$ \mathbb{E}[X'y] = \mathbb{E}[X'X]\beta, $$ and hence our 'true' $\beta$ satisfies $$ \beta = \mathbb{E}[X'X]^{-1}\mathbb{E}[X'y]. $$ Now, by what's called the 'analogy principle' (which is a nice way of saying 'familiarity with the weak law of large numbers and the continuous mapping theorem'), we can operationalize this definition by the following estimator: $$ \hat{\beta} = \bigg[\sum_{i=1}^N X_i' X_i\bigg]^{-1}\sum_{i=1}^N X_i' y_i, $$ where $X_i$ denotes the $i$-th row of $X$.
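A quick numerical sketch of this sample-analogue estimator, on hypothetical simulated data (the variable names and the simulated design are assumptions, not from the question): the sum-of-outer-products form and the matrix form $(X'X)^{-1}X'y$ give the same $\hat{\beta}$.

```python
import numpy as np

# Hypothetical data: n = 50 observations, intercept plus k = 2 regressors.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Sum-of-outer-products form: (sum_i x_i x_i')^{-1} sum_i x_i y_i
S_xx = sum(np.outer(x_i, x_i) for x_i in X)
S_xy = sum(x_i * y_i for x_i, y_i in zip(X, y))
beta_sum = np.linalg.solve(S_xx, S_xy)

# Matrix form: (X'X)^{-1} X'y
beta_mat = np.linalg.solve(X.T @ X, X.T @ y)

# Both forms agree, and both are close to the true coefficients.
assert np.allclose(beta_sum, beta_mat)
```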
As @caradonna states, by Pythagoras the error vector $y - X\hat{\beta}$ must be orthogonal to the subspace spanned by the covariates, $\operatorname{range}(X)$, since otherwise we could improve our fit by adjusting $\hat{\beta}$ to take advantage of this unused correlation.
This implies that the least squares fit has the following property: $$ X' (y - X\hat{\beta}) = 0,$$ which is equivalent to saying that $$X'y = X'X\hat{\beta},$$ or alternatively that $\hat{\beta} = (X'X)^{-1}X'y$.
This of course assumes that $X'X$ is invertible. If it is not, then there are infinitely many $\hat{\beta}$ which minimize the squared error. People might then use the pseudo-inverse $(X'X)^+$ to pick the least-norm solution, i.e., the $\hat{\beta}$ for which $\|\hat{\beta}\|$ is smallest.
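To illustrate the singular case, here is a small sketch with a deliberately rank-deficient design (the data are made up for illustration): `np.linalg.pinv` returns the least-norm minimizer, and any other minimizer achieves the same fit with a larger norm.

```python
import numpy as np

# Hypothetical rank-deficient design: the third column duplicates the
# second, so X'X is singular and infinitely many beta minimize the error.
rng = np.random.default_rng(1)
n = 30
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z, z])   # rank 2, not 3
y = 3.0 + 2.0 * z + 0.05 * rng.normal(size=n)

# The pseudo-inverse picks the least-norm minimizer of ||y - X beta||.
beta_pinv = np.linalg.pinv(X) @ y

# Shifting weight between the duplicate columns leaves the fit
# unchanged but increases the norm of the coefficient vector.
beta_other = beta_pinv + np.array([0.0, 1.0, -1.0])
assert np.allclose(X @ beta_pinv, X @ beta_other)
assert np.linalg.norm(beta_pinv) < np.linalg.norm(beta_other)
```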
What's interesting here is that the prediction vector $\hat{y} = X \, \hat{\beta} = X\,(X'X)^{-1}X'y$ turns out to be the orthogonal projection of $y$ onto the space spanned by the columns of $X$. The matrix representing this projection is $X(X'X)^{-1}X'$.
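The defining properties of an orthogonal projection (symmetry and idempotence) are easy to verify numerically for the hat matrix $H = X(X'X)^{-1}X'$; the data below are arbitrary and only serve as a sanity check.

```python
import numpy as np

# Hat matrix H = X (X'X)^{-1} X' projects onto range(X).
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.normal(size=20)

H = X @ np.linalg.inv(X.T @ X) @ X.T

# Orthogonal projection: symmetric and idempotent.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)

# The residual y - Hy is orthogonal to every column of X.
y_hat = H @ y
assert np.allclose(X.T @ (y - y_hat), 0)
```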
First, note that $\textbf X$ is an $n \times (k+1)$ matrix, where $k$ is the number of predictor variables.
The columns of the $X$ matrix are
$x_j=\begin{pmatrix}{} x_{ 1j} \\ x_{ 2j} \\ \vdots \\ x_{ nj} \end{pmatrix} \quad \forall \ \ 0\leq j\leq k$
where $x_0=\begin{pmatrix}{} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$
That means that $\textbf X$ is equal to

$\textbf X=\begin{pmatrix} 1 & x_{ 11} & x_{ 12} &\ldots & x_{ 1k}\\ 1& x_{ 21} & x_{ 22} &\ldots & x_{ 2k}\\ \vdots& \vdots & \vdots & \ddots & \vdots \\ 1& x_{ n1} & x_{ n2} &\ldots & x_{ nk} \end{pmatrix} $

where the row index satisfies $1\leq i \leq n$.
Therefore you have to show, for instance, that for $n=2$ and $k=2$
$$\sum_{i=1}^2 x_i\cdot x_i^T=\sum_{i=1}^2 \begin{pmatrix} 1 \\ x_{ i1} \\ x_{ i2} \\ \end{pmatrix}\cdot \begin{pmatrix} 1 & x_{ i1} & x_{ i2} \end{pmatrix}\qquad (\color{blue}I)$$ is equal to $$ \begin{pmatrix} 1 & 1 \\ x_{ 11}& x_{ 21} \\ x_{ 12}& x_{ 22} \end{pmatrix}\cdot \begin{pmatrix} 1 & x_{ 11} & x_{ 12} \\ 1& x_{ 21} & x_{ 22} \end{pmatrix} \qquad (\color{blue}{II})$$ Hint: For the summand $i=1$ you should get in $(\color{blue}I)$
$\begin{pmatrix} 1 & x_{ 11} & x_{ 12} \\ x_{ 11}& x_{ 11}^2 & x_{ 11}\cdot x_{ 12}\\ x_{ 12}& x_{ 12}\cdot x_{ 11}& x_{ 12}^2\end{pmatrix} $
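The hint can also be checked numerically. The sketch below uses arbitrary made-up values for the $n=2$, $k=2$ case and confirms that the sum of outer products $(\color{blue}I)$ equals the matrix product $X'X$ of $(\color{blue}{II})$:

```python
import numpy as np

# n = 2, k = 2: each row of X is (1, x_{i1}, x_{i2}); values are arbitrary.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])

I_sum = sum(np.outer(row, row) for row in X)   # (I): sum of x_i x_i'
II_mat = X.T @ X                               # (II): X'X
assert np.allclose(I_sum, II_mat)
```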
$\small{\text{I assume your equality appears in context of ordinary least squares. }}$