Derivation of normal equation for linear regression parameters


I'm going through a derivation of the normal equation for the parameter vector $\beta$ of the linear regression model. Given the model $y = X\beta + \epsilon$, where $y$ is the vector of dependent variables, $X$ is the matrix of explanatory variables, $\beta$ is the vector of parameters, and $\epsilon$ is the vector of errors, we have $\epsilon = y - X\beta$. They set out to minimise the sum of squared errors, which can be written in vector form as $\langle \epsilon, \epsilon \rangle = \epsilon^T\epsilon$, namely:

$$ (y - X\beta)^T(y - X\beta) \\ = y^Ty - y^T(X\beta) - (X\beta)^Ty + (X\beta)^T(X\beta) \\ = y^Ty - (X\beta)^Ty - (X\beta)^Ty + (X\beta)^T(X\beta).$$

$(X\beta)^Ty$ is the transpose of $y^T(X\beta)$, so I'm wondering why they've taken the transpose of the second term.
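For what it's worth, here is a minimal numerical check (with made-up data; the dimensions are arbitrary) confirming that the two middle terms evaluate to the same scalar:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary example data
X = rng.normal(size=(5, 2))     # matrix of explanatory variables
y = rng.normal(size=5)          # vector of dependent variables
beta = rng.normal(size=2)       # parameter vector

# y^T (X beta) and (X beta)^T y are both 1x1, i.e. scalars,
# so each equals its own transpose and they coincide.
a = y.T @ (X @ beta)
b = (X @ beta).T @ y
print(np.isclose(a, b))
```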