proof of normal equations for ordinary least squares using matrix trace

I follow the proof on Wikipedia. However, I first encountered this result in these notes from Stanford CS229. Section 2.2 contains a proof that uses matrix traces, including this part from page 11:

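$$
\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \tfrac{1}{2}(X\theta - y)^T(X\theta - y) \\
&= \tfrac{1}{2}\nabla_\theta\left(\theta^TX^TX\theta - \theta^TX^Ty - y^TX\theta + y^Ty\right) \\
&= \tfrac{1}{2}\nabla_\theta\,\text{tr}\left(\theta^TX^TX\theta - \theta^TX^Ty - y^TX\theta + y^Ty\right) \\
&= \tfrac{1}{2}\nabla_\theta\left(\text{tr}\,\theta^TX^TX\theta - 2\,\text{tr}\,y^TX\theta\right) \\
&= \tfrac{1}{2}\left(X^TX\theta + X^TX\theta - 2X^Ty\right) \\
&= X^TX\theta - X^Ty
\end{aligned}
$$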

I'm not clear on the fourth step. I understand that $\text{tr}(A+B) = \text{tr}\,A + \text{tr}\,B$, so the first term in the parentheses is clear. However, I don't follow how the traces of the other three terms are combined into:

$$-2\,\text{tr}\,y^TX\theta$$

Best answer

The last term in the parentheses, $y^Ty$, does not involve $\theta$, so its derivative with respect to $\theta$ is zero. This is why it disappears on the fourth line.
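As a quick numerical sanity check, here is a minimal pure-Python sketch using a small, made-up $X$, $y$ and $\theta$ (hypothetical values, not taken from the notes). It estimates the gradient of the expanded expression $\theta^TX^TX\theta - \theta^TX^Ty - y^TX\theta + y^Ty$ by central finite differences, once with the constant $y^Ty$ and once without; the helper names are chosen for this example only.

```python
# Hypothetical example data (not from the CS229 notes): X is 3 x 2, y and theta are vectors.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, -1.0, 2.0]
theta = [0.5, -0.25]

def matvec(A, v):
    """Matrix-vector product A v for A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def expanded(t, include_const=True):
    """theta^T X^T X theta - theta^T X^T y - y^T X theta, plus y^T y if requested."""
    Xt = matvec(X, t)  # X theta
    value = dot(Xt, Xt) - dot(Xt, y) - dot(y, Xt)
    if include_const:
        value += dot(y, y)  # the constant term y^T y
    return value

def numerical_gradient(f, t, h=1e-6):
    """Central finite-difference estimate of the gradient of f at t."""
    grad = []
    for i in range(len(t)):
        plus, minus = list(t), list(t)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

g_with = numerical_gradient(lambda t: expanded(t, True), theta)
g_without = numerical_gradient(lambda t: expanded(t, False), theta)
print(g_with)
print(g_without)
```

The two estimates agree up to floating-point rounding, because adding a constant does not change the gradient. For these values both come out at approximately $(-3, -4)$, which equals $2(X^TX\theta - X^Ty)$ since the factor $\tfrac{1}{2}$ is left out here.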

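The two terms before it have the same trace because, for any square matrix $A$, $\text{tr}\,A^T=\text{tr}\,A$, and $\theta^TX^Ty = (y^TX\theta)^T$. Hence $\text{tr}\,\theta^TX^Ty = \text{tr}\,y^TX\theta$, and the two middle terms combine into $-2\,\text{tr}\,y^TX\theta$.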
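Both facts can be checked on small hypothetical data with a minimal pure-Python sketch. The helpers `transpose`, `matmul` and `trace` are written for this example only, and column vectors are stored as $n \times 1$ nested lists so that $\theta^TX^Ty$ and $y^TX\theta$ come out as $1 \times 1$ matrices.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical data; y and theta are stored as n x 1 column vectors.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [[1.0], [-1.0], [2.0]]
theta = [[0.5], [-0.25]]

# theta^T X^T y and y^T X theta are both 1 x 1, and each is the transpose of the other.
a = matmul(matmul(transpose(theta), transpose(X)), y)
b = matmul(matmul(transpose(y), X), theta)
print(a, b)  # [[1.5]] [[1.5]]

# tr(A^T) = tr(A) for any square matrix, e.g. this non-symmetric one.
M = [[1.0, 2.0], [3.0, 4.0]]
print(trace(M), trace(transpose(M)))  # 5.0 5.0

# Hence the two middle terms of the expansion combine into -2 tr(y^T X theta).
middle = -trace(a) - trace(b)
print(middle, -2 * trace(b))  # -3.0 -3.0
```

Since both middle terms are scalars, their trace is just their value; the trace notation matters only because it lets the notes apply general trace-derivative identities on the next line.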