I am trying to find the differential of $J$ below, in order to eventually find $\partial J/ \partial U$
$J = (I-Ur)^T(I-Ur)$
Let $a= I-Ur$
$\textbf{d}J = \textbf{d}(a^Ta) = \textbf{d}(a^T)a + a^T\textbf{d}a = (\textbf{d}a)^Ta + a^T\textbf{d}a = 2a^T\textbf{d}a$
Next, I try to find $\textbf{d}a$
$\textbf{d}a = \textbf{d}(I-Ur) =\textbf{d}(I) - \textbf{d}(Ur) = 0 - (\textbf{d}U)r - U\textbf{d}r$
Therefore, $\textbf{d}J = -2(I-Ur)^T\left((\textbf{d}U)r + U\textbf{d}r\right)$
Since I am interested in finding $\partial J/ \partial U$, I will treat $r$ as a constant, such that $\textbf{d}r=0$
So now I have:
$\textbf{d}J = -2(I-Ur)^T(\textbf{d}U)r$
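A finite-difference test is one way to sanity-check this differential, including its sign. The sketch below is pure Python and assumes shapes the post does not state: $I$ is an $m$-vector (not an identity matrix), $U$ is $m\times n$ and $r$ is an $n$-vector. The helper names `matvec`, `J` and `predicted_dJ` and the random test data are illustrative only. Because $J$ is quadratic in $U$, the central difference along a direction $\textbf{d}U$ should match $-2a^T(\textbf{d}U)r$ up to rounding error.

```python
import random

# Assumed shapes (not stated in the post): I is an m-vector, U is m x n, r is an n-vector.

def matvec(U, r):
    """Return the vector U r for a matrix U stored as a list of rows."""
    return [sum(u_ij * r_j for u_ij, r_j in zip(row, r)) for row in U]

def J(U, I, r):
    """J = (I - U r)^T (I - U r), the squared norm of the residual a = I - U r."""
    a = [i_k - ur_k for i_k, ur_k in zip(I, matvec(U, r))]
    return sum(a_k * a_k for a_k in a)

def predicted_dJ(U, I, r, dU):
    """First-order change -2 a^T (dU) r, with r held constant."""
    a = [i_k - ur_k for i_k, ur_k in zip(I, matvec(U, r))]
    dUr = matvec(dU, r)
    return -2.0 * sum(a_k * v_k for a_k, v_k in zip(a, dUr))

random.seed(0)
m, n = 3, 4
U = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
dU = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]  # perturbation direction
I = [random.uniform(-1, 1) for _ in range(m)]
r = [random.uniform(-1, 1) for _ in range(n)]

eps = 1e-6
U_plus = [[u + eps * d for u, d in zip(ur, dr)] for ur, dr in zip(U, dU)]
U_minus = [[u - eps * d for u, d in zip(ur, dr)] for ur, dr in zip(U, dU)]
# Central difference approximates the directional derivative of J along dU.
finite_diff = (J(U_plus, I, r) - J(U_minus, I, r)) / (2 * eps)
print(finite_diff, predicted_dJ(U, I, r, dU))
```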
I can't figure out how to bring the $\textbf{d}U$ term to the far right in order to turn this expression into the derivative that I want.
For example, if I had instead ended up with $\textbf{d}J = -2(I-Ur)^Tr(\textbf{d}U)$,
then I would have
$\partial J/ \partial U^T = -2(I-Ur)^Tr$ , and therefore:
$\partial J/ \partial U = -2\left((I-Ur)^Tr\right)^T = -2r^T(I-Ur)$
And I would be done! How do I turn the expression that I ended up with into the kind of expression that will give me the derivative I want?
Use the cyclic property of the trace, together with the fact that a scalar (viewed as a $1\times1$ matrix) equals its own trace:
$$\eqalign{ dJ &= -2\,a^TdU\,r \\ &= \operatorname{tr}(-2a^TdU\,r) \\ &= \operatorname{tr}(-2ra^TdU) \\ }$$
The gradient is then read off as the matrix multiplying $dU$ inside the trace:
$$\frac{\partial J}{\partial U} = -2ra^T$$
$\ldots$ or the transpose of this, depending on your preferred layout convention.
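Writing the same computation in index notation makes the layout question concrete. Here $a_i = I_i - \sum_j U_{ij} r_j$ and $r$ is held constant, as in the question:
$$\eqalign{
J &= \sum_i a_i^2 \\
\frac{\partial J}{\partial U_{kl}} &= \sum_i 2a_i\,\frac{\partial a_i}{\partial U_{kl}} = \sum_i 2a_i\,(-\delta_{ik}\,r_l) = -2a_k r_l \\
}$$
So the matrix whose $(k,l)$ entry is $\partial J/\partial U_{kl}$ is $-2ar^T$, which has the same shape as $U$; the $-2ra^T$ obtained from the trace form is its transpose. Both are consistent with $dJ = \operatorname{tr}(-2ra^T dU) = \sum_{k,l} (-2a_k r_l)\,dU_{kl}$.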