Calculating the gradient of residual sum of squares via product rule


I know that $\nabla_{\textbf{w}}\text{RSS}(\textbf{w})=\nabla((y-\text{H}\textbf{w})^{T}(y-\text{H}\textbf{w}))=-2\text{H}^{T}(y-\text{H}\textbf{w})$. However, I am having trouble deriving this via the product rule. (I know there are other ways to do this but I am particularly interested in this method)
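The claimed gradient can be sanity-checked numerically, e.g. by comparing it against central finite differences with NumPy (the shapes below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
H = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = rng.standard_normal(p)

def rss(w):
    # RSS(w) = (y - Hw)^T (y - Hw)
    r = y - H @ w
    return r @ r

# closed-form gradient: -2 H^T (y - Hw)
grad = -2 * H.T @ (y - H @ w)

# central finite-difference approximation of the gradient
eps = 1e-6
fd = np.array([
    (rss(w + eps * np.eye(p)[j]) - rss(w - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])

print(np.allclose(grad, fd, atol=1e-5))
```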

Here is what I have so far:

$$\nabla((y-\text{H}\textbf{w})^{T}(y-\text{H}\textbf{w}))\\ =\nabla((y^{T}-\textbf{w}^T\text{H}^T)(y-\text{H}\textbf{w})) \\ =(-\text{H})(y-\text{H}\textbf{w}) +(y^{T}-\textbf{w}^T\text{H}^T)(-\text{H}) \\ =(-\text{H})(y-\text{H}\textbf{w}) +(y-\text{H}\textbf{w})^{T}(-\text{H}) $$

This is not equal to the proven gradient, nor is it even defined. Where have I gone wrong?

There are 2 answers below.

BEST ANSWER

Multivariate gradients can be confusing, especially regarding when to transpose the vectors. If you have just started learning, I recommend checking the computation using index notation:

$$ RSS= \left( y_{i}-H_{ip}w_{p} \right) \left( y_{i}-H_{iq}w_{q} \right) $$

Here we use the Einstein summation convention: a repeated index in a product implies summation over all values of that index. Now the gradient:

$$ \begin{aligned} \left(\nabla RSS\right)_{j} &= \frac{\partial}{\partial w_{j}}RSS \\ &= -H_{ij}\left(y_{i}-H_{iq}w_{q}\right)- \left(y_{i}-H_{ip}w_{p}\right)H_{ij} \\ &= -2H_{ij}\left(y_{i}-H_{ip}w_{p}\right) \end{aligned} $$

Now we convert back to matrix-vector notation:

$$ \nabla RSS=-2\mathbf{H}^{\top}\left(\mathbf{y}-\mathbf{H}\mathbf{w}\right) $$

So your mistake is in applying the product rule: each term must carry the transpose of the Jacobian, $-\mathbf{H}^{\top}$, rather than $-\mathbf{H}$. Transposing the second term of your last line, $(\mathbf{y}-\mathbf{H}\mathbf{w})^{\top}(-\mathbf{H})$, gives $-\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})$; the same fix applies to the first term, and adding the two yields the correct expression.
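The index computation above can be mirrored directly with `np.einsum`, which makes the correspondence between $H_{ij}(y_i - H_{ip}w_p)$ and $\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})$ explicit (a minimal sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4
H = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = rng.standard_normal(p)

# (grad RSS)_j = -2 H_ij (y_i - H_ip w_p), summing over repeated indices
residual = y - np.einsum('ip,p->i', H, w)   # y_i - H_ip w_p
grad_index = -2 * np.einsum('ij,i->j', H, residual)

# matrix-vector form: -2 H^T (y - Hw)
grad_matrix = -2 * H.T @ (y - H @ w)

print(np.allclose(grad_index, grad_matrix))
```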


There is no need for the product rule; the function is quadratic, so the gradient can be read off directly from an expansion.

Let $f(w) = (y-Hw)^T (y-Hw)$, then $f(w+h) = f(w) + (-2H^T (y-Hw))^T h + h^T H^T H h$ from which it follows that $\nabla f(w) = -2H^T (y-Hw)$.
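The expansion $f(w+h) = f(w) + (-2H^{T}(y-Hw))^{T}h + h^{T}H^{T}Hh$ is an exact identity for this quadratic, so it can be verified numerically for any $h$ (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
H = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = rng.standard_normal(p)
h = rng.standard_normal(p)

def f(w):
    # f(w) = (y - Hw)^T (y - Hw)
    r = y - H @ w
    return r @ r

grad = -2 * H.T @ (y - H @ w)

# exact quadratic expansion: f(w+h) = f(w) + grad^T h + h^T H^T H h
lhs = f(w + h)
rhs = f(w) + grad @ h + h @ (H.T @ H) @ h
print(np.allclose(lhs, rhs))
```

Since the identity holds exactly, the linear-in-$h$ coefficient is the gradient by definition.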