While deriving the closed-form solution for linear regression, my professor stated that
$\frac{\partial \theta^\intercal X^\intercal X\theta}{\partial \theta} = 2X^\intercal X \theta$
What are the rules for deciding to write $\theta^\intercal X^\intercal X\theta$
and not $\theta^2 X^\intercal X$,
or to write $2 X^\intercal X \theta$
and not $2 \theta X^\intercal X$?
If the reason is to make sure the dimensions match for matrix multiplication, is it possible to reorder any terms in a product to get a result with the dimensions you want, or are there rules?
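
To make the question concrete, here is a rough sketch of the same derivative written out entry by entry. The shapes are my assumption, not something stated in the lecture: $X$ is $n \times d$, $\theta$ is a $d \times 1$ column vector, and $A = X^\intercal X$ is shorthand I am introducing for the symmetric $d \times d$ matrix in the middle.

$$
\begin{aligned}
\theta^\intercal X^\intercal X \theta &= \theta^\intercal A \theta = \sum_{i=1}^{d} \sum_{j=1}^{d} \theta_i A_{ij} \theta_j, \\
\frac{\partial}{\partial \theta_k} \sum_{i=1}^{d} \sum_{j=1}^{d} \theta_i A_{ij} \theta_j &= \sum_{j=1}^{d} A_{kj} \theta_j + \sum_{i=1}^{d} \theta_i A_{ik} = (A\theta)_k + (A^\intercal \theta)_k = 2 (A\theta)_k .
\end{aligned}
$$

Stacking the $d$ components gives $2 A \theta = 2 X^\intercal X \theta$, where the last step uses $A^\intercal = A$. Under these assumed shapes, $\theta X^\intercal X$ would be a $(d \times 1)(d \times d)$ product, which is not defined for $d > 1$, and $\theta^2$ has no meaning for a column vector. So the ordering seems to come out of the sums themselves rather than from a free choice of arrangement, given that matrix multiplication is not commutative.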