I am deriving the vector form of the linear regression normal equation; my current working is below.
$$ L(\mathbf{\theta}) = \sum_{n=0}^N(y_n - \hat{y}_n)^2 = (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) = (\mathbf{y} - \mathbf{X}\mathbf{\theta})^T(\mathbf{y} - \mathbf{X}\mathbf{\theta}) $$ $$ L(\mathbf{\theta}) = (\mathbf{X}\mathbf{\theta})^T\mathbf{X}\mathbf{\theta} - (\mathbf{X}\mathbf{\theta})^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\mathbf{\theta} + \mathbf{y}^T\mathbf{y} = \mathbf{\theta}^T\mathbf{X}^T\mathbf{X}\mathbf{\theta} - \mathbf{\theta}^T\mathbf{X}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\mathbf{\theta} + \mathbf{y}^T\mathbf{y} $$
I then need to differentiate this with respect to $\mathbf{\theta}$. I believe the linear and constant terms differentiate as follows:
$$ \frac{\mathrm{d}}{\mathrm{d}\mathbf{\theta}}\left( - \mathbf{\theta}^T\mathbf{X}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\mathbf{\theta} + \mathbf{y}^T\mathbf{y} \right) = -\mathbf{X}^T\mathbf{y} - (\mathbf{y}^T\mathbf{X})^T + 0 = -2\mathbf{X}^T\mathbf{y} $$
However, I am not sure of the logic involved in differentiating the term $\mathbf{\theta}^T\mathbf{X}^T\mathbf{X}\mathbf{\theta}$. My thought is that you can commute the factor $\mathbf{\theta}^T$ past $\mathbf{X}^T\mathbf{X}$ to obtain $\mathbf{X}^T\mathbf{X}\mathbf{\theta}^T\mathbf{\theta}$ and then differentiate to get $2\mathbf{X}^T\mathbf{X}\mathbf{\theta}$, although I am not sure how to show that this is valid. How would you differentiate this term?
$ \def\t{\theta}\def\p{\partial} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $You'd be better off doing the $\c{\rm differentiation}$ first and the substitution $(\hat y\to\t)$ last: $$\eqalign{ \hat y &= X\t \quad\qiq d\hat y = X\,d\t\\ w &= \hat y-y \qiq dw = d\hat y\\ \c{L} &\c{=} \c{w^Tw} \\ \c{dL} &\c{=} dw^Tw + w^Tdw \;\c{=}\; \c{2w^Tdw} \;= 2w^TX\,d\t \;= (2X^Tw)^T d\t \\ \grad{L}{\t} &= 2X^Tw \;= 2X^T(X\t-y) \\ }$$ (the two middle terms merge because $dw^Tw$ is a scalar, hence equal to its own transpose $w^Tdw$). Setting this gradient to zero recovers the normal equation $X^TX\t = X^Ty$.
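As a sanity check (my addition, not part of the answer above), the gradient $2X^T(X\theta - y)$ can be compared against central finite differences, and the resulting normal-equation solution against `numpy.linalg.lstsq`; the sizes and data below are arbitrary:

```python
import numpy as np

# Sketch: verify dL/dtheta = 2 X^T (X theta - y) numerically.
# Any full-rank X works; the seed just makes the run reproducible.
rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
theta = rng.standard_normal(D)

def L(t):
    r = y - X @ t          # residual; L = ||y - X t||^2
    return r @ r

analytic = 2 * X.T @ (X @ theta - y)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (L(theta + eps * e) - L(theta - eps * e)) / (2 * eps)
    for e in np.eye(D)
])
print(np.max(np.abs(analytic - numeric)))   # tiny: L is exactly quadratic

# Setting the gradient to zero gives X^T X theta = X^T y,
# whose solution matches the least-squares fit.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))
```

Because $L$ is quadratic, the central-difference estimate has no truncation error, so the only discrepancy is floating-point rounding.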