I am trying to understand how eq (2) follows from taking the partial derivative of eq (1) with respect to $\theta$.
$J(\theta) = \textbf{x}^T\textbf{x}- 2\textbf{x}^T\textbf{H}\theta+\theta^T \textbf{H}^T \textbf{H}\theta$ ---(1)
where $T$ denotes the transpose operator, $\textbf{H}$ is an $N\times p$ matrix, $\theta$ is a $p \times 1$ vector and $\textbf{x}$ is an $N \times 1$ vector.
$ \frac{\partial J(\theta)}{\partial \theta} = -2 (\textbf{x}^T\textbf{H})^T+ 2\textbf{H}^T\textbf{H}\theta $ ---(2)
My question is why, in eq (2), a transpose appears in the first term and a factor of 2 appears in the second term.
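Note that $\sf Eq(1)$ is really a squared Frobenius norm (for a vector, the squared Euclidean norm) $$\eqalign{ \def\L{\left} \def\R{\right} \def\t{\theta} \def\p{\partial} J &= \L\|H\t-x\R\|_F^2 \\ &= \L(H\t-x\R)^T\L(H\t-x\R) \\ }$$ Substituting $\,w=\L(H\t-x\R)\,$ creates an equation that's easy to differentiate $$\eqalign{ J &= w^Tw \\ dJ &= dw^Tw \;+\; w^Tdw \\ &= 2\,w^Tdw \\ &= 2\,w^T\L(H\,d\t\R) \\ &= 2\L(H^Tw\R)^T\,d\t \\ \frac{\p J}{\p \t} &= 2\,H^Tw \\ &= 2\L(H^TH\t-H^Tx\R) \\ }$$ Since $H^Tx=(x^TH)^T$, this is exactly $\sf Eq(2)$. The transpose in the first term appears because the gradient is written as a $p\times 1$ column vector, while $x^TH$ is a $1\times p$ row vector. The factor of $2$ in the second term appears because $\theta$ enters the quadratic form twice, which produces two equal terms (here $dw^Tw=w^Tdw$, both being scalars).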
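If a numerical sanity check helps, the gradient formula can be compared against central finite differences. The following is only a minimal sketch in plain Python: the helper names (`matvec`, `transpose`, `dot`, `numerical_grad`), the sizes $N=6$, $p=3$, and the random test data are all assumptions made for illustration, not part of the question.

```python
import random

def matvec(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def J(theta, H, x):
    """J = ||H theta - x||^2, the compact form of eq (1)."""
    w = [h - xi for h, xi in zip(matvec(H, theta), x)]
    return dot(w, w)

def J_expanded(theta, H, x):
    """Eq (1) as written: x^T x - 2 x^T H theta + theta^T H^T H theta."""
    Ht = matvec(H, theta)
    return dot(x, x) - 2 * dot(x, Ht) + dot(Ht, Ht)

def grad_J(theta, H, x):
    """Eq (2): -2 (x^T H)^T + 2 H^T H theta = 2 H^T (H theta - x)."""
    w = [h - xi for h, xi in zip(matvec(H, theta), x)]
    return [2 * g for g in matvec(transpose(H), w)]

def numerical_grad(f, theta, eps=1e-5):
    """Central finite differences, one coordinate of theta at a time."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Arbitrary test data (assumed sizes, fixed seed for repeatability).
random.seed(0)
N, p = 6, 3
H = [[random.gauss(0, 1) for _ in range(p)] for _ in range(N)]
x = [random.gauss(0, 1) for _ in range(N)]
theta = [random.gauss(0, 1) for _ in range(p)]

analytic = grad_J(theta, H, x)
numeric = numerical_grad(lambda t: J(t, H, x), theta)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))

print("analytic:", analytic)
print("numeric: ", numeric)
print("agree:", max_err < 1e-6)
```

Because $J$ is quadratic in $\theta$, central differences have no truncation error, so any remaining discrepancy is floating-point rounding. Dropping the transpose or the factor of 2 in `grad_J` makes the two gradients disagree visibly.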