Equivalence between OLS estimators in matrix and summation form


I am struggling to reconcile the OLS estimator as it is commonly expressed in matrix form and in summation form. In matrix form, it is written as:

$\hat{\beta} = (X'X)^{-1}X'y$

In summation form, it typically looks like the following:

$ \hat{\beta} = \frac{\sum(X_i - \bar{X}) (Y_i - \bar{Y})} {\sum(X_i - \bar{X})^2}$

I cannot see how these agree; in my mind, the summation form should look more like this:

$ \hat{\beta} = \frac{\sum(X_iY_i)} {\sum(X_i)^2}$

I am not sure where the means emerge in the matrix notation.

Best answer

Assuming $X_i$ is scalar, and $X=(X_1,...,X_n)'$,

$$\hat \beta=(X'X)^{-1}X'Y=\frac{\sum_i X_iY_i}{\sum_i X_i^2}$$

is the OLS estimator for $\beta$ in the equation without the intercept, $Y_i=X_i\beta+\epsilon_i,$ while

$$\hat \beta=\frac{\sum_i (X_i-\bar X)(Y_i-\bar Y)}{\sum_i (X_i-\bar X)^2}$$

is the OLS estimator for $\beta$ in the equation with the intercept, $Y_i=\alpha+X_i\beta+\epsilon_i.$
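A quick numerical check may make the distinction concrete. The sketch below assumes NumPy is available and uses simulated data (the sample size, coefficients, and seed are arbitrary choices for illustration, not part of the original question). It compares each summation formula with a least-squares fit of the corresponding model.

```python
import numpy as np

# Simulated data with a nonzero intercept (all values here are illustrative).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(loc=2.0, scale=1.5, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

# Model without intercept: Y_i = X_i * beta + eps_i
beta_no_intercept = np.sum(x * y) / np.sum(x ** 2)
# Same model fitted by least squares on the single-column design matrix.
beta_no_intercept_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

# Model with intercept: Y_i = alpha + X_i * beta + eps_i
xc = x - x.mean()
yc = y - y.mean()
beta_with_intercept = np.sum(xc * yc) / np.sum(xc ** 2)
# np.polyfit with deg=1 fits a line with intercept; the slope comes first.
slope_polyfit = np.polyfit(x, y, deg=1)[0]

print(beta_no_intercept, beta_no_intercept_lstsq)
print(beta_with_intercept, slope_polyfit)
```

The two numbers on each printed line should agree up to floating-point error, while the no-intercept and with-intercept slopes generally differ whenever the true intercept is nonzero and $\bar X \neq 0$.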


Update: Note that $(X'X)^{-1}X'Y$ is more general than it appears at first sight; indeed, it subsumes the second formula if we allow the regressor to be a column vector that includes a "$1$" for an intercept term. To see this, let $\tilde X_i=(1,X_i)'$ for $X_i$ scalar. Let $X=(\tilde X_1,...,\tilde X_n)'$, which is an $n\times 2$ matrix. Then we have

$$\begin{aligned}
(X'X)^{-1}X'Y
&=\left(\sum_i \tilde X_i\tilde X_i'\right)^{-1}\sum_i \tilde X_iY_i\\
&=\left[\sum_{i}\left(\begin{array}{cc} 1 & X_{i}\\ X_{i} & X_{i}^{2} \end{array}\right)\right]^{-1}\sum_{i}\left(\begin{array}{c} 1\\ X_{i} \end{array}\right)Y_{i}\\
&=\left[n\left(\begin{array}{cc} 1 & \bar{X}\\ \bar{X} & \frac{1}{n}\sum_{i}X_{i}^{2} \end{array}\right)\right]^{-1}n\left(\begin{array}{c} \bar{Y}\\ \frac{1}{n}\sum_{i}X_{i}Y_{i} \end{array}\right)\\
&=\frac{1}{\frac{1}{n}\sum_{i}X_{i}^{2}-\bar{X}^{2}}\left(\begin{array}{cc} \frac{1}{n}\sum_{i}X_{i}^{2} & -\bar{X}\\ -\bar{X} & 1 \end{array}\right)\left(\begin{array}{c} \bar{Y}\\ \frac{1}{n}\sum_{i}X_{i}Y_{i} \end{array}\right).
\end{aligned}$$

The second component is then the estimate of the slope:

$$\frac{\frac{1}{n}\sum_{i}X_{i}Y_{i}-\bar{X}\bar{Y}}{\frac{1}{n}\sum_{i}X_{i}^{2}-\bar{X}^{2}}=\frac{\sum_{i}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sum_{i}(X_{i}-\bar{X})^{2}}.$$

The last equality follows by multiplying the numerator and denominator by $n$ and using $\sum_i (X_i-\bar X)(Y_i-\bar Y)=\sum_i X_iY_i-n\bar X\bar Y$ and $\sum_i (X_i-\bar X)^2=\sum_i X_i^2-n\bar X^2$. This is where the means come from: they are produced by the column of ones in $X$.
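The derivation can also be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data (the coefficients and seed are arbitrary). It builds the $n\times 2$ design matrix with a column of ones, solves the normal equations, and compares the result with the closed-form $2\times 2$ inverse and with the centered summation formula.

```python
import numpy as np

# Simulated data (illustrative values only).
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(-1.0, 3.0, size=n)
y = 2.0 - 0.7 * x + rng.normal(scale=0.2, size=n)

# Design matrix whose rows are (1, X_i): an n x 2 matrix.
X = np.column_stack([np.ones(n), x])

# Solve the normal equations (X'X) b = X'Y rather than forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Closed form from the explicit 2 x 2 inverse in the derivation.
x_bar, y_bar = x.mean(), y.mean()
mean_xx = np.mean(x ** 2)
mean_xy = np.mean(x * y)
det = mean_xx - x_bar ** 2
closed_form = np.array([
    mean_xx * y_bar - x_bar * mean_xy,  # first component: intercept
    mean_xy - x_bar * y_bar,            # second component: slope
]) / det

# Centered summation formula for the slope, and the implied intercept.
slope_centered = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
intercept_centered = y_bar - slope_centered * x_bar

print(beta_hat)
print(closed_form)
print(intercept_centered, slope_centered)
```

All three printed results should match up to rounding, which illustrates that the matrix formula reproduces the centered summation formula once the intercept column is included.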