Matrix algebra in least squares regression


Consider this formula for least-squares regression:

\begin{aligned}L(D,{\vec {\beta }})&=||X{\vec {\beta }}-Y||^{2}\\&=(X{\vec {\beta }}-Y)^{T}(X{\vec {\beta }}-Y)\\&=\color{red}{Y^{T}Y-Y^{T}X{\vec {\beta }}-{\vec {\beta }}^{T}X^{T}Y+{\vec {\beta }}^{T}X^{T}X{\vec {\beta }}}\end{aligned}
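The red expansion follows from distributing the product and applying $(AB)^T = B^TA^T$; since $Y^TX\vec{\beta}$ is a scalar (a $1\times 1$ matrix), it equals its own transpose $\vec{\beta}^TX^TY$. A quick NumPy check of the expansion (the shapes below are arbitrary, chosen only for illustration):

```python
import numpy as np

# Illustrative sizes: 5 observations, 2 predictors
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
Y = rng.standard_normal((5, 1))
beta = rng.standard_normal((2, 1))

r = X @ beta - Y                 # residual vector X*beta - Y
lhs = (r.T @ r).item()           # ||X*beta - Y||^2

# Expanded form: Y'Y - Y'X b - b'X'Y + b'X'X b
rhs = (Y.T @ Y - Y.T @ X @ beta - beta.T @ X.T @ Y
       + beta.T @ X.T @ X @ beta).item()

assert np.isclose(lhs, rhs)
```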

and then, if we want to find the minima of $L(D,{\vec {\beta }})$: \begin{aligned}{\frac {\partial L(D,{\vec {\beta }})}{\partial {\vec {\beta }}}}&={\frac {\partial \left(Y^{T}Y-Y^{T}X{\vec {\beta }}-{\vec {\beta }}^{T}X^{T}Y+{\vec {\beta }}^{T}X^{T}X{\vec {\beta }}\right)}{\partial {\vec {\beta }}}}\\&=\color{red}{-2Y^{T}X+2{\vec {\beta }}^{T}X^{T}X}\end{aligned}
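The red gradient uses the standard matrix-calculus rules $\partial(a^T\vec{\beta})/\partial\vec{\beta} = a^T$ and $\partial(\vec{\beta}^TA\vec{\beta})/\partial\vec{\beta} = \vec{\beta}^T(A + A^T)$; here $A = X^TX$ is symmetric, which gives the $2\vec{\beta}^TX^TX$ term. A finite-difference sketch to verify the claimed gradient numerically (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))
Y = rng.standard_normal((6, 1))
beta = rng.standard_normal((3, 1))

def L(b):
    """Least-squares loss ||X b - Y||^2 as a Python float."""
    r = X @ b - Y
    return (r.T @ r).item()

# Claimed gradient (row vector): -2 Y'X + 2 b'X'X
grad = (-2 * Y.T @ X + 2 * beta.T @ X.T @ X).ravel()

# Central finite differences, one coordinate at a time
eps = 1e-6
num = np.empty(3)
for i in range(3):
    e = np.zeros((3, 1))
    e[i] = eps
    num[i] = (L(beta + e) - L(beta - e)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-4)
```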

Equating the gradient to the zero vector,

\begin{aligned}-2Y^{T}X+2{\vec {\beta }}^{T}X^{T}X=0\\\end{aligned}

we get a solution for the regression coefficients $\vec {\beta }$:

\begin{aligned} & Y^{T}X={\vec {\beta }}^{T}X^{T}X\\& X^{T}Y=\color{red}{X^{T}X{\vec {\beta }}}\\& {\vec {\hat {\beta }}}=(X^{T}X)^{-1}X^{T}Y \end{aligned}
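The red step transposes both sides of the first line (using $(AB)^T = B^TA^T$, and $X^TX$ being symmetric), yielding the normal equations $X^TY = X^TX\vec{\beta}$. A minimal numeric sketch (random data, purely for illustration) comparing this closed form with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 3))
Y = rng.standard_normal((20, 1))

# Closed form from the derivation: beta_hat = (X'X)^{-1} X'Y.
# Solving the normal equations directly avoids forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from NumPy's least-squares routine
beta_ref, *_ = np.linalg.lstsq(X, Y, rcond=None)

assert np.allclose(beta_hat, beta_ref)
```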

I'm familiar with scalar calculus and algebra. However, the above seems to rely on properties and results of matrix algebra and matrix calculus, specifically:

  • Multiplications that involve transposed matrices
  • The distributive property of matrix products when also involving transposed matrices
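Concretely, the properties I mean are the standard transpose rules:

```latex
% transpose distributes over sums
(A + B)^T = A^T + B^T
% transpose reverses the order of products
(AB)^T = B^T A^T
% a scalar (a 1x1 matrix) equals its own transpose, hence
% Y^T X \vec{\beta} = (Y^T X \vec{\beta})^T = \vec{\beta}^T X^T Y
```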

I can't find a good reference for these kinds of properties.

What's a good text that provides a summary of these useful properties of matrix algebra? I have highlighted in red the results that I fail to understand or follow.

There is 1 solution below.


Well, you may try the excellent lectures by Gilbert Strang at MIT OpenCourseWare. Look for the linear algebra course; you will find everything pretty clear.