I follow the proof on Wikipedia. However, I first encountered this while reading these notes from Stanford CS229. Section 2.2 contains a proof that uses matrix traces, including this part from page 11:
I'm not clear on the fourth step. I understand that $\text{tr}(A+B) = \text{tr}\,A + \text{tr}\,B$, so the first term in the parentheses is clear. However, I don't follow how the traces of the other three terms are combined into:
$$-2\,\text{tr}\, y^T X \theta$$

The last term in the parentheses, $y^Ty$, does not involve $\theta$, so its gradient with respect to $\theta$ is zero. This is why it disappears on the fourth line.
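The two terms before it, $\theta^T X^T y$ and $y^T X \theta$, have the same trace because each is the transpose of the other, $(y^T X \theta)^T = \theta^T X^T y$, and for any square matrix $A$ we have $\text{tr}\, A^T = \text{tr}\, A$. Hence $\text{tr}(-\theta^T X^T y - y^T X \theta) = -2\,\text{tr}\, y^T X \theta$.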
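As a quick numerical sanity check, here is a minimal Python/NumPy sketch. It assumes random data (the shapes, seed, and helper names `f`, `g`, and `num_grad` are hypothetical, not from the notes). It confirms that the two middle terms have equal traces and that dropping $y^Ty$ and merging those terms leaves the gradient unchanged.

```python
import numpy as np

# Hypothetical random instance: X is m x n, y is m x 1, theta is n x 1.
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal((m, 1))
theta = rng.standard_normal((n, 1))

# The two middle terms are 1x1 matrices, each the transpose of the other.
term_a = theta.T @ X.T @ y  # theta^T X^T y
term_b = y.T @ X @ theta    # y^T X theta
assert np.allclose(term_a, term_b.T)
assert np.isclose(np.trace(term_a), np.trace(term_b))


def f(t):
    """Trace of the full expression inside the parentheses (third line)."""
    t = t.reshape(n, 1)
    return np.trace(t.T @ X.T @ X @ t - t.T @ X.T @ y - y.T @ X @ t + y.T @ y)


def g(t):
    """Fourth-line form: y^T y dropped, middle terms merged into -2 tr(y^T X t)."""
    t = t.reshape(n, 1)
    return np.trace(t.T @ X.T @ X @ t) - 2 * np.trace(y.T @ X @ t)


def num_grad(func, t, h=1e-6):
    """Central-difference gradient of a scalar function of t."""
    grad = np.zeros_like(t)
    for i in range(t.size):
        e = np.zeros_like(t)
        e.flat[i] = h
        grad.flat[i] = (func(t + e) - func(t - e)) / (2 * h)
    return grad


# f and g differ only by the constant y^T y ...
print(f(theta) - g(theta), (y.T @ y).item())
# ... so their gradients with respect to theta agree.
print(np.allclose(num_grad(f, theta), num_grad(g, theta), atol=1e-6))
```

A finite-difference check on one random instance only illustrates the identities; the transpose and trace argument is what proves them in general.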