Matrix calculus in multiple linear regression OLS estimate derivation


The steps of the following derivation are from here

Starting from $y= Xb +\epsilon $, which really is just the same as

$\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{bmatrix} = \begin{bmatrix} 1 & x_{21} & \cdots & x_{K1} \\ 1 & x_{22} & \cdots & x_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{2N} & \cdots & x_{KN} \end{bmatrix} \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{K} \end{bmatrix} + \begin{bmatrix} \epsilon_{1} \\ \epsilon_{2} \\ \vdots \\ \epsilon_{N} \end{bmatrix} $

it all comes down to minimizing $e'e$:

$e'e = \begin{bmatrix} e_{1} & e_{2} & \cdots & e_{N} \end{bmatrix} \begin{bmatrix} e_{1} \\ e_{2} \\ \vdots \\ e_{N} \end{bmatrix} = \sum_{i=1}^{N}e_{i}^{2} $

So minimizing $e'e$ gives us:

$\min_{b}\; e'e = (y-Xb)'(y-Xb)$

$\min_{b}\; e'e = y'y - 2b'X'y + b'X'Xb$

(*) $\frac{\partial(e'e)}{\partial b} = -2X'y + 2X'Xb \stackrel{!}{=} 0$

$X'Xb=X'y$

$b=(X'X)^{-1}X'y$
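As a numerical sanity check of the closed-form solution (an aside, not part of the original derivation, using made-up data), one can verify in NumPy that solving the normal equations $X'Xb = X'y$ reproduces the answer of a library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 3                      # N observations, K coefficients (incl. intercept)
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + 0.1 * rng.normal(size=N)

# b = (X'X)^{-1} X'y  -- solve the normal equations rather than forming the inverse
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Compare against NumPy's built-in least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b_hat, b_lstsq)
```

Solving the linear system with `np.linalg.solve` is preferred in practice over explicitly computing $(X'X)^{-1}$, which is less stable numerically.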

I'm pretty new to matrix calculus, so I was a bit confused about (*).

In step (*), $\frac{\partial(y'y)}{\partial b} = 0$, which makes sense. And then $\frac{\partial(-2b'X'y)}{\partial b} = -2X'y$, but why exactly is this true? If it were $\frac{\partial(-2b'X'y)}{\partial b'}$, then that would make perfect sense to me. Is taking the partial derivative with respect to $b$ the same as taking the partial derivative with respect to $b'$?

Similarly, $\frac{\partial(b'X'Xb)}{\partial b} = 2X'Xb$. Why is this true? Shouldn't it be $2b'X'X$?
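As an aside, both identities can be checked numerically with central finite differences on made-up data. In the denominator-layout convention used in the derivation, the gradient of a scalar with respect to $b$ has the same (column) shape as $b$, which is why $X'y$ rather than $y'X$ appears:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
b = rng.normal(size=3)
A = X.T @ X                       # symmetric matrix X'X

def num_grad(f, b, h=1e-6):
    """Central-difference gradient of a scalar function f at b."""
    g = np.zeros_like(b)
    for i in range(len(b)):
        e = np.zeros_like(b); e[i] = h
        g[i] = (f(b + e) - f(b - e)) / (2 * h)
    return g

# d(b'X'y)/db = X'y  -- same shape as b (denominator layout)
assert np.allclose(num_grad(lambda b: b @ X.T @ y, b), X.T @ y)

# d(b'Ab)/db = (A + A')b = 2Ab, since A = X'X is symmetric
assert np.allclose(num_grad(lambda b: b @ A @ b, b), 2 * A @ b)
```

The numerator-layout answer $b'X'X$ is simply the transpose of the same quantity; the two conventions differ only in whether the gradient is written as a column or a row.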

There are 2 answers below.

Best answer:

This is not exactly a proof but rather a way to think about it.

You are trying to minimize a scalar function $F(b)$. Now take the differential:

$$dF=d(y'y)-2d(b'X'y)+d(b'X'Xb)=-2db'X'y+db'X'Xb+b'X'Xdb.$$

Now transpose the last expression (which is a scalar) and factor $db'$.

$$dF=2db'(-X'y+X'Xb)$$

So the gradient of $F(b)$ is $2(-X'y+X'Xb)$. Set this to zero and solve for $b$. This procedure is sometimes also called the external definition of the gradient.
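To illustrate (an added aside with made-up data, not part of this answer), the claimed gradient $2(-X'y + X'Xb)$ of $F(b) = (y-Xb)'(y-Xb)$ can be confirmed against a central finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
b = rng.normal(size=4)

F = lambda b: (y - X @ b) @ (y - X @ b)      # scalar objective e'e
grad = 2 * (-X.T @ y + X.T @ X @ b)          # 2(-X'y + X'Xb)

# Central finite differences along each coordinate direction
h = 1e-6
g_num = np.array([(F(b + h * e) - F(b - h * e)) / (2 * h)
                  for e in np.eye(4)])
assert np.allclose(grad, g_num)
```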

Second answer:

Consider the full matrix case of the regression $$\eqalign{ Y &= XB+E \cr E &= Y-XB \cr }$$ In this case the function to be minimized is $$\eqalign{f &= \|E\|^2_F = E:E}$$ where the colon denotes the Frobenius inner product, $A:B = {\rm tr}(A^TB)$.

Now find the differential and gradient $$\eqalign{ df &= 2\,E:dE \cr &= -2\,E:X\,dB \cr &= 2\,(XB-Y):X\,dB \cr &= 2\,X^T(XB-Y):dB \cr\cr \frac{\partial f}{\partial B} &= 2\,X^T(XB-Y) \cr }$$ Set the gradient to zero and solve $$\eqalign{ X^TXB &= X^TY \cr B &= (X^TX)^{-1}X^TY \cr }$$ This result remains valid when $B$ is an $(N\times 1)$ matrix, i.e. a vector.
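A quick numerical check of the matrix-valued result (an added aside with made-up data): at $B = (X^TX)^{-1}X^TY$ the gradient $2X^T(XB-Y)$ vanishes, and the solution matches `np.linalg.lstsq`, which accepts a matrix right-hand side directly:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
Y = rng.normal(size=(30, 2))                 # multiple response columns

B = np.linalg.solve(X.T @ X, X.T @ Y)        # B = (X'X)^{-1} X'Y

# The gradient 2 X'(XB - Y) vanishes at the minimizer
assert np.allclose(X.T @ (X @ B - Y), 0)

# lstsq solves each response column simultaneously
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(B, B_lstsq)
```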

The problem is that, in the vector case, people tend to write the function in terms of the transpose product instead of the inner product, and then fall into rabbit holes concerning the details of the transpositions.