Matrix derivative of $\operatorname{tr}(\theta^TX^TX\theta)$


This is part of the derivation of the normal equation, and I am struggling with one step.

I don't see how the derivative of $\operatorname{tr}(\theta^TX^TX\theta)$ with respect to $\theta$ becomes $2X^TX\theta$.

I know that the derivative of $\operatorname{tr}(ABA^TC)$ with respect to $A$ is $CAB + C^TAB^T$, and the lecturer seems to want me to use this to derive the result, but I don't see how to apply it.
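
One way to convince yourself of the given identity is to compare it against central finite differences. The sketch below is plain Python with small hand-rolled helpers (`matmul`, `transpose`, `trace`, `formula_grad`, `numeric_grad` are illustrative names, not from the lecture notes), and the test matrices are arbitrary, with $A$ deliberately non-square. The gradient is laid out with the same shape as $A$, which is the convention the formula uses. Because $\operatorname{tr}(ABA^TC)$ is quadratic in $A$, central differences should match the formula up to floating-point rounding.

```python
# Finite-difference check of d/dA tr(A B A^T C) = C A B + C^T A B^T,
# using plain nested lists (no third-party libraries).

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def matadd(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def f(A, B, C):
    """tr(A B A^T C) for A (m x n), B (n x n), C (m x m)."""
    return trace(matmul(matmul(matmul(A, B), transpose(A)), C))

def formula_grad(A, B, C):
    """The given closed form C A B + C^T A B^T (same shape as A)."""
    return matadd(matmul(matmul(C, A), B),
                  matmul(matmul(transpose(C), A), transpose(B)))

def numeric_grad(A, B, C, h=1e-4):
    """Central differences, perturbing one entry of A at a time."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            G[i][j] = (f(Ap, B, C) - f(Am, B, C)) / (2 * h)
    return G

# Arbitrary test matrices: A is 2 x 3 (not square), so B is 3 x 3 and C is 2 x 2.
A = [[1.0, 2.0, -1.0], [0.5, -3.0, 2.0]]
B = [[2.0, 1.0, 0.0], [-1.0, 3.0, 4.0], [0.0, 2.0, 1.0]]
C = [[1.0, -2.0], [3.0, 0.5]]

exact = formula_grad(A, B, C)
approx = numeric_grad(A, B, C)
max_err = max(abs(e - a) for re, ra in zip(exact, approx) for e, a in zip(re, ra))
print(max_err)  # close to 0 if the formula holds
```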

The picture is the part of the lecture notes that I'm struggling with.

[Image: The part I am struggling with]



Accepted answer

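\begin{align} & \frac d {d\theta^T} \operatorname{tr}(\theta^TX^TX\theta) \\[12pt] = {} & \frac d {dA}\operatorname{tr}(ABA^TC) \\[4pt] & \text{with $\theta^T$ in the role of } A, \\ & \text{$X^TX$ in the role of } B, \\ & \text{and } I \text{ in the role of } C \\[12pt] = {} & CAB + C^TAB^T \quad \text{(This was given.)} \\[10pt] = {} & \theta^T X^TX + \theta^T (X^TX)^T \\[4pt] = {} & 2\theta^T X^TX \quad \text{(since $X^TX$ is symmetric).} \end{align} This is the derivative with respect to the row vector $\theta^T$; transposing it gives the derivative with respect to $\theta$: $$\frac d {d\theta} \operatorname{tr}(\theta^TX^TX\theta) = \left(2\theta^TX^TX\right)^T = 2X^TX\theta.$$ Here, $B$ and $C$ must be square matrices, and their sizes differ if $A$ is not a square matrix: with $\theta$ an $n \times 1$ column vector, $B = X^TX$ is $n \times n$ while $C = I$ is $1 \times 1$.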
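
An alternative assignment of roles, different from the one in the answer and not taken from the lecture notes, lands directly on the column-vector form without a final transpose. It uses only the cyclic property $\operatorname{tr}(PQ) = \operatorname{tr}(QP)$ and assumes $\theta$ is an $n \times 1$ column vector:

\begin{align}
\operatorname{tr}(\theta^TX^TX\theta) &= \operatorname{tr}(\theta\,I\,\theta^TX^TX) && \text{(cyclic property; here $I$ is $1 \times 1$)} \\
\frac{d}{d\theta}\operatorname{tr}(\theta^TX^TX\theta) &= CAB + C^TAB^T && \text{with $A = \theta$, $B = I$, $C = X^TX$} \\
&= X^TX\theta + (X^TX)^T\theta \\
&= 2X^TX\theta
\end{align}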

Another answer

Let's rewrite it using the trace/Frobenius product notation (colon):

\begin{align} F & = \theta^TX^TX\theta\\ \implies F = Tr(F) &= Tr((X\theta)^TX\theta) = X\theta:X\theta \quad \text{(since $F$ is $1 \times 1$)}\\ \implies dF & = d(X\theta):X\theta + X\theta:d(X\theta) \\ & = 2X\theta:d(X\theta) \quad \text{(since $A:B = B:A$)}\\ & = 2X\theta:(dX\,\theta + X\,d\theta)\\ & = 2X\theta:X\,d\theta \quad \text{(since $dX = 0$: $X$ is constant)}\\ & = 2X^TX\theta:d\theta\\ \implies \frac{dF}{d\theta} &= 2X^TX\theta \end{align}
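
The end result can be spot-checked numerically with central finite differences. This is a plain-Python sketch with arbitrary example data; `mat_vec`, `F`, `closed_form_grad`, and `numeric_grad` are illustrative names, and $\theta$ is treated as a column vector stored as a flat list.

```python
# Finite-difference check that d/dtheta (theta^T X^T X theta) = 2 X^T X theta.
# X is stored as a list of rows (m x n); theta as a flat list of length n.

def mat_vec(X, v):
    """Return X v as a flat list."""
    return [sum(x * t for x, t in zip(row, v)) for row in X]

def F(X, theta):
    """theta^T X^T X theta, i.e. the squared length of X theta."""
    return sum(v * v for v in mat_vec(X, theta))

def closed_form_grad(X, theta):
    """2 X^T X theta, computed as 2 X^T (X theta)."""
    Xtheta = mat_vec(X, theta)
    return [2 * sum(X[i][j] * Xtheta[i] for i in range(len(X)))
            for j in range(len(theta))]

def numeric_grad(X, theta, h=1e-4):
    """Central differences, perturbing one component of theta at a time."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grad.append((F(X, up) - F(X, down)) / (2 * h))
    return grad

# Arbitrary example data: X is 4 x 3, theta has 3 components.
X = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0],
     [2.0, 1.0, 1.0],
     [-1.0, 0.5, 2.0]]
theta = [0.3, -1.2, 2.0]

exact = closed_form_grad(X, theta)
approx = numeric_grad(X, theta)
max_err = max(abs(e - a) for e, a in zip(exact, approx))
print(exact)
print(max_err)
```

For this data the closed form evaluates to approximately $(-4.8, -16.9, 58.4)$, and the finite-difference estimate agrees with it up to rounding.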

=========================================

NB: I used the following properties of the trace function and the Frobenius product: $$Tr(A^TB) = A:B$$ $$Tr(AB) = Tr(BA)$$ $$A:B = B:A$$ $$A:BC=AC^T:B=B^TA:C$$
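
These identities can be spot-checked on small integer matrices, where every quantity is computed exactly. This is a plain-Python sketch: the matrices are chosen arbitrarily, and the helper names (`matmul`, `transpose`, `trace`, `frob`) are illustrative. It also checks the symmetry $A:B = B:A$, which is what lets the two terms of $dF$ merge into one.

```python
# Exact check of the trace / Frobenius-product identities on small integer matrices.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def frob(P, Q):
    """Frobenius product P:Q, the sum of elementwise products (same shapes)."""
    assert len(P) == len(Q) and len(P[0]) == len(Q[0]), "shape mismatch"
    return sum(p * q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1, 2], [0, -1], [3, 1]]                       # 3 x 2
E = [[2, -1], [1, 0], [0, 4]]                       # 3 x 2, same shape as A
D = [[1, 0, 2], [4, -1, 1]]                         # 2 x 3
B = [[2, -1, 0, 1], [1, 1, 2, 0], [0, 3, -2, 1]]    # 3 x 4
C = [[1, 0], [2, -1], [0, 1], [-3, 2]]              # 4 x 2, so B C is 3 x 2 like A

# Tr(A^T E) = A:E
lhs1, rhs1 = trace(matmul(transpose(A), E)), frob(A, E)
# Tr(A D) = Tr(D A), even though A D is 3 x 3 and D A is 2 x 2
lhs2, rhs2 = trace(matmul(A, D)), trace(matmul(D, A))
# A:BC = AC^T:B = B^TA:C
v1 = frob(A, matmul(B, C))
v2 = frob(matmul(A, transpose(C)), B)
v3 = frob(matmul(transpose(B), A), C)
# A:E = E:A
sym = (frob(A, E), frob(E, A))

print(lhs1, rhs1, lhs2, rhs2, v1, v2, v3, sym)
```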