I know that $\nabla_{\textbf{w}}\text{RSS}(\textbf{w})=\nabla((y-\text{H}\textbf{w})^{T}(y-\text{H}\textbf{w}))=-2\text{H}^{T}(y-\text{H}\textbf{w})$. However, I am having trouble deriving this via the product rule. (I know there are other ways to do this, but I am particularly interested in this method.)
Here is what I have so far:
$$\nabla((y-\text{H}\textbf{w})^{T}(y-\text{H}\textbf{w}))\\ =\nabla((y^{T}-\textbf{w}^T\text{H}^T)(y-\text{H}\textbf{w})) \\ =(-\text{H})(y-\text{H}\textbf{w}) +(y^{T}-\textbf{w}^T\text{H}^T)(-\text{H}) \\ =(-\text{H})(y-\text{H}\textbf{w}) +(y-\text{H}\textbf{w})^{T}(-\text{H}) $$
This is not equal to the proven gradient; in fact, the dimensions do not even match. Where have I gone wrong?
Multivariate gradients can be confusing, especially regarding when to transpose the vector(s). If you have just started learning, I recommend checking the result with index notation:
$$ RSS= \left( y_{i}-H_{ip}w_{p} \right) \left( y_{i}-H_{iq}w_{q} \right) $$
Here we use the Einstein summation convention: whenever an index appears twice in a product, we sum over all possible values of that index. Now the gradient:
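As a sanity check on this index expression, here is a small sketch (with randomly generated $\mathbf{y}$, $\mathbf{H}$, $\mathbf{w}$ as assumed test data) that evaluates the RSS via explicit Einstein sums with `np.einsum` and compares it to the matrix-vector form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
H = rng.standard_normal((n, p))  # design matrix H (assumed test data)
y = rng.standard_normal(n)
w = rng.standard_normal(p)

# Residual y_i - H_ip w_p, summing over the repeated index p
r = np.einsum('ip,p->i', H, w)
r = y - r

# RSS = (y_i - H_ip w_p)(y_i - H_iq w_q), summing over the repeated index i
rss_einsum = np.einsum('i,i->', r, r)

# The same quantity in matrix-vector form
rss_matrix = (y - H @ w) @ (y - H @ w)

assert np.isclose(rss_einsum, rss_matrix)
```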
$$ \begin{aligned} \left(\nabla RSS\right)_{j} &= \frac{\partial}{\partial w_{j}}RSS \\ &= -H_{ij}\left(y_{i}-H_{iq}w_{q}\right)- \left(y_{i}-H_{ip}w_{p}\right)H_{ij} \\ &= -2H_{ij}\left(y_{i}-H_{ip}w_{p}\right) \end{aligned} $$
Now we convert it back to matrix–vector notation:
$$ \nabla RSS=-2\mathbf{H}^{\top}\left(\mathbf{y}-\mathbf{H}\mathbf{w}\right) $$
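This final expression can also be verified numerically against central finite differences; a minimal sketch, again assuming small random test data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4
H = rng.standard_normal((n, p))  # design matrix H (assumed test data)
y = rng.standard_normal(n)
w = rng.standard_normal(p)

def rss(w):
    r = y - H @ w
    return r @ r

# Closed-form gradient: -2 H^T (y - H w)
grad_formula = -2 * H.T @ (y - H @ w)

# Central finite differences as an independent check
eps = 1e-6
grad_fd = np.array([
    (rss(w + eps * e) - rss(w - eps * e)) / (2 * eps)
    for e in np.eye(p)
])

assert np.allclose(grad_formula, grad_fd, atol=1e-5)
```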
So your mistake is in the line where you apply the product rule: both terms need a transpose. The first term should be $(-\mathbf{H})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})$ (otherwise the product is not even defined), and the second term $(\mathbf{y}-\mathbf{H}\mathbf{w})^{\top}(-\mathbf{H})$ is a row vector and must be transposed as well. Each term then equals $-\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})$, which gives the factor of 2.
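In matrix form, the product rule for the gradient of $u^{\top}v$ transposes each Jacobian, $\nabla_{\mathbf{w}}(u^{\top}v)=(\partial u/\partial\mathbf{w})^{\top}v+(\partial v/\partial\mathbf{w})^{\top}u$, so with $u=v=\mathbf{y}-\mathbf{H}\mathbf{w}$ the corrected derivation reads:

$$ \nabla\left((\mathbf{y}-\mathbf{H}\mathbf{w})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})\right) = (-\mathbf{H})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w}) + (-\mathbf{H})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w}) = -2\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w}) $$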