I know that $\nabla_{\textbf{w}}\text{RSS}(\textbf{w})=\nabla((y-\text{H}\textbf{w})^{T}(y-\text{H}\textbf{w}))=-2\text{H}^{T}(y-\text{H}\textbf{w})$. However, I am having trouble deriving this via the product rule. (I know there are other ways to do this, but I am particularly interested in this method.)
Here is what I have so far:
$$\nabla((y-\text{H}\textbf{w})^{T}(y-\text{H}\textbf{w}))\\ =\nabla((y^{T}-\textbf{w}^T\text{H}^T)(y-\text{H}\textbf{w})) \\ =(-\text{H})(y-\text{H}\textbf{w}) +(y^{T}-\textbf{w}^T\text{H}^T)(-\text{H}) \\ =(-\text{H})(y-\text{H}\textbf{w}) +(y-\text{H}\textbf{w})^{T}(-\text{H}) $$
This is not equal to the proven gradient; in fact, the dimensions do not even match. Where have I gone wrong?
Multivariate gradients can be confusing, especially regarding when to transpose the vector(s). If you have just started learning, I recommend checking the result with index notation:
$$ RSS= \left( y_{i}-H_{ip}w_{p} \right) \left( y_{i}-H_{iq}w_{q} \right) $$
Here we use the Einstein summation convention: whenever an index appears twice in a product, we sum over all possible values of that index. Now the gradient:
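As a sanity check on this index expression, here is a small sketch (with randomly generated $\mathbf{y}$, $\mathbf{H}$, $\mathbf{w}$ as assumed test data) that evaluates the RSS via explicit Einstein sums with `np.einsum` and compares it to the matrix-vector form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
H = rng.standard_normal((n, p))  # design matrix H (assumed test data)
y = rng.standard_normal(n)
w = rng.standard_normal(p)

# Residual y_i - H_ip w_p, summing over the repeated index p
r = np.einsum('ip,p->i', H, w)
r = y - r

# RSS = (y_i - H_ip w_p)(y_i - H_iq w_q), summing over the repeated index i
rss_einsum = np.einsum('i,i->', r, r)

# The same quantity in matrix-vector form
rss_matrix = (y - H @ w) @ (y - H @ w)

assert np.isclose(rss_einsum, rss_matrix)
```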
$$ \begin{aligned} \left(\nabla RSS\right)_{j} &= \frac{\partial}{\partial w_{j}}RSS \\ &= -H_{ij}\left(y_{i}-H_{iq}w_{q}\right)- \left(y_{i}-H_{ip}w_{p}\right)H_{ij} \\ &= -2H_{ij}\left(y_{i}-H_{ip}w_{p}\right) \end{aligned} $$
Now we convert it back to matrix–vector notation:
$$ \nabla RSS=-2\mathbf{H}^{\top}\left(\mathbf{y}-\mathbf{H}\mathbf{w}\right) $$
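This final expression can also be verified numerically against central finite differences; a minimal sketch, again assuming small random test data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4
H = rng.standard_normal((n, p))  # design matrix H (assumed test data)
y = rng.standard_normal(n)
w = rng.standard_normal(p)

def rss(w):
    r = y - H @ w
    return r @ r

# Closed-form gradient: -2 H^T (y - H w)
grad_formula = -2 * H.T @ (y - H @ w)

# Central finite differences as an independent check
eps = 1e-6
grad_fd = np.array([
    (rss(w + eps * e) - rss(w - eps * e)) / (2 * eps)
    for e in np.eye(p)
])

assert np.allclose(grad_formula, grad_fd, atol=1e-5)
```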
So your mistake is in the line where you apply the product rule: both terms need a transpose. The first term should be $(-\mathbf{H})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})$ (otherwise the product is not even defined), and the second term $(\mathbf{y}-\mathbf{H}\mathbf{w})^{\top}(-\mathbf{H})$ is a row vector and must be transposed as well. Each term then equals $-\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})$, which gives the factor of 2.
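In matrix form, the product rule for the gradient of $u^{\top}v$ transposes each Jacobian, $\nabla_{\mathbf{w}}(u^{\top}v)=(\partial u/\partial\mathbf{w})^{\top}v+(\partial v/\partial\mathbf{w})^{\top}u$, so with $u=v=\mathbf{y}-\mathbf{H}\mathbf{w}$ the corrected derivation reads:

$$ \nabla\left((\mathbf{y}-\mathbf{H}\mathbf{w})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})\right) = (-\mathbf{H})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w}) + (-\mathbf{H})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w}) = -2\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w}) $$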