Below is the least squares objective function in vector form. Assume $x$ is $m \times n$, $W$ is $n \times 1$, and $y$ is $m \times 1$, so $J$ is $1 \times 1$, i.e. a scalar. The superscript $T$ denotes the transpose.
$$ J=\frac{1}{m}(xW -y)^{T}(xW-y) $$
Let $A = xW - y$. Then $$ J=\frac{1}{m}A^{T}A, $$ $$ \frac{dJ}{dA}=\frac{2}{m}A, $$ $$ \frac{dA}{dW}=x. $$
By chain rule:
$$ \frac{dJ}{dW}=\frac{dJ}{dA}\frac{dA}{dW}=\frac{2}{m}Ax. $$
Obviously, if we do this the dimensions don't match up, since $A$ is $m \times 1$ and $x$ is $m \times n$. Instead, the answer should be $$\frac{dJ}{dW}=\frac{2}{m}x^{T}A. $$
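For what it's worth, a quick finite-difference check confirms this corrected formula (a minimal numpy sketch; the variable names are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
x = rng.standard_normal((m, n))   # m x n
W = rng.standard_normal((n, 1))   # n x 1
y = rng.standard_normal((m, 1))   # m x 1

def J(W):
    A = x @ W - y
    return (A.T @ A).item() / m   # 1 x 1 -> scalar

# Claimed gradient: (2/m) x^T (xW - y), shape n x 1 (same as W).
grad_claimed = (2.0 / m) * x.T @ (x @ W - y)

# Central finite differences, one coordinate of W at a time.
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(n):
    e = np.zeros_like(W)
    e[i, 0] = eps
    grad_numeric[i, 0] = (J(W + e) - J(W - e)) / (2 * eps)

print(np.allclose(grad_claimed, grad_numeric, atol=1e-6))  # True
```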
My question is: how do we know the convention in this case? Can someone show me how the chain rule should be applied here?
For differential calculus involving matrices it is more convenient to use differentials instead of derivatives. You can find the rules for differentials in *Practical Guide to Matrix Calculus for Deep Learning* by Andrew Delong or in the book *Matrix Algebra* by Abadir & Magnus.
I will show you how to get the derivative of $J$ using the notation and rules from the paper by Delong. Delong's paper doesn't include the chain rule for differentials, but it is $$d(f\circ g)(x, dx)=df(g(x),dg(x,dx)).$$ You can look at Composite function gradient for a proof sketch.
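For intuition, in the scalar case this reduces to the ordinary chain rule. For example, with $f(u)=u^2$ and $g(x)=\sin x$,
$$d(f\circ g)(x,dx)=df(g(x),dg(x,dx))=2g(x)\,dg(x,dx)=2\sin x\,\cos x\,dx,$$
which matches $\frac{d}{dx}\sin^2 x=2\sin x\cos x$.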
So, using rules (10) and (12) from the paper,
$$dJ(A,dA)=d\left(\frac{1}{m}A^TA\right)=\frac{1}{m}(dA)^TA+\frac{1}{m}A^T\,dA=\frac{2}{m}A^T\,dA,$$
where the last equality holds because $(dA)^TA$ is a scalar and therefore equals its own transpose $A^T\,dA$. Likewise,
$$dA(x,dx)=d(xW-y)=(dx)W,$$
$$dA(W,dW)=d(xW-y)=x\,dW.$$
By the chain rule, rule (6), and the fact that $v^Tu=v\cdot u$ for vectors $v,u$,
$$dJ(x,dx)=dJ(A(x),dA(x,dx))=\frac{2}{m}A(x)^T\,dA(x,dx)=\frac{2}{m}(xW-y)^T(dx)W$$
$$=\frac{2}{m}(xW-y)\cdot (dx)W=\frac{2}{m}(xW-y)W^T\cdot dx,$$
$$dJ(W,dW)=dJ(A(W),dA(W,dW))=\frac{2}{m}(xW-y)\cdot x\,dW=\frac{2}{m}x^T(xW-y)\cdot dW.$$
From these and rule (17),
$$\frac{\partial J}{\partial x}=\frac{2}{m}(xW-y)W^T,\qquad \frac{\partial J}{\partial W}=\frac{2}{m}x^T(xW-y).$$
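As a sanity check (a minimal numpy sketch, not from the paper; all names are illustrative), the formula for $\frac{\partial J}{\partial x}$ can be verified against central finite differences; the check for $\frac{\partial J}{\partial W}$ is shown in the question above and is analogous:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
x = rng.standard_normal((m, n))
W = rng.standard_normal((n, 1))
y = rng.standard_normal((m, 1))

def J(x):
    A = x @ W - y
    return (A.T @ A).item() / m   # scalar objective

# Claimed gradient with respect to the matrix x, shape m x n (same as x).
grad_claimed = (2.0 / m) * (x @ W - y) @ W.T

# Central differences, one entry of x at a time.
eps = 1e-6
grad_numeric = np.zeros_like(x)
for i in range(m):
    for j in range(n):
        e = np.zeros_like(x)
        e[i, j] = eps
        grad_numeric[i, j] = (J(x + e) - J(x - e)) / (2 * eps)

print(np.allclose(grad_claimed, grad_numeric, atol=1e-6))  # True
```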