In Andrew Ng's notes on Machine Learning, I found this equality that I don't understand why it's true. $$\nabla_\theta J(\theta) = \frac{1}{2}\,\nabla_\theta \left(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y\right)$$ $$= \frac{1}{2}\,\nabla_\theta \operatorname{tr}\left(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y\right)$$ $X$, $\theta$, and $y$ have the following dimensions: $$X \in \mathbb{R}^{m \times n},\quad \theta \in \mathbb{R}^{n},\quad y \in \mathbb{R}^{m}$$
He doesn't mention why this step is true, so I wonder why it holds. Can anyone help? Thanks!
(The full development of the equation is on page 11 of http://cs229.stanford.edu/notes/cs229-notes1.pdf)
The quantity inside the trace is a real number / scalar, so its trace is equal to itself. To see this, check the dimensions: $X\theta \in \mathbb{R}^m$, so $\theta^T X^T X\theta = (X\theta)^T(X\theta)$ is $1 \times 1$, and the same holds for each of the other three terms. The trace of a $1 \times 1$ matrix is just its single entry, so inserting $\operatorname{tr}$ changes nothing.
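A quick numerical sanity check of this (a sketch with arbitrary random data; the names `X`, `theta`, `y` and the sizes `m = 5`, `n = 3` are chosen here for illustration):

```python
import numpy as np

# Arbitrary small example: m = 5 samples, n = 3 features.
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.standard_normal((m, n))
theta = rng.standard_normal(n)
y = rng.standard_normal(m)

# The quantity inside the trace, computed as a plain scalar
# (with 1-D arrays, numpy contracts the products to a float).
inner = theta @ X.T @ X @ theta - theta @ X.T @ y - y @ X @ theta + y @ y

# The same quantity computed with explicit column vectors: the result
# is a 1x1 matrix, and its trace is its single entry.
Theta = theta.reshape(-1, 1)  # n x 1
Y = y.reshape(-1, 1)          # m x 1
M = Theta.T @ X.T @ X @ Theta - Theta.T @ X.T @ Y - Y.T @ X @ Theta + Y.T @ Y

assert M.shape == (1, 1)
assert np.isclose(np.trace(M), inner)
print(inner, np.trace(M))
```

The assertion confirms that wrapping the scalar expression in $\operatorname{tr}(\cdot)$ leaves its value unchanged, which is all the step in the notes uses.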