I am beginning to study normal equations and am looking at the cost functions.
I want to get from this equation, where $w$ is the weight vector and $x_i$ is the $i$-th feature vector,
$g(w) = \frac{1}{N} \sum_{i=1}^N (y_i-w^Tx_i)^2$
to this one:
$g(w) = \frac{1}{N}(y-Xw)^T (y-Xw)$
I can see that writing out the square is the first step, but I can't figure out what to do next:
$ (y_i-w^Tx_i)(y_i-w^Tx_i)$
What rule am I missing to reach the second equation, and why is the $w$ term not transposed in the second bracket?
Let $z$ be an arbitrary vector $z \in \mathbb{R}^n$. Notice that $\sum_{i=1}^n z_i^2=z^T z$.
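Written out, the row vector $z^T$ times the column vector $z$ is just the sum of the componentwise products:

$$
z^T z =
\begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix}
= z_1 z_1 + z_2 z_2 + \cdots + z_n z_n
= \sum_{i=1}^n z_i^2 .
$$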
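Now what you are looking for follows from choosing $z = y - Xw$, where $X$ is the matrix whose $i$-th row is $x_i^T$. The $i$-th component of $Xw$ is $x_i^T w$, which is a scalar and therefore equal to its own transpose $w^T x_i$. Hence $z_i = y_i - w^T x_i$, and $\sum_{i=1}^N (y_i - w^T x_i)^2 = (y - Xw)^T (y - Xw)$. This is also why $w$ is not transposed in the matrix form: the transpose in $w^T x_i$ only expresses a dot product, and stacking the rows $x_i^T$ into $X$ writes all $N$ of those dot products at once as $Xw$.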
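As a quick sanity check, here is a minimal pure-Python sketch that evaluates both forms of $g(w)$ on small made-up data and confirms they agree. The helper names (`dot`, `matvec`, `cost_sum`, `cost_matrix`) and the random data are illustrative assumptions, not part of the original question; `X` stores each $x_i^T$ as a row.

```python
# Numerical check (hypothetical data) that the summation form and the
# matrix form of the cost g(w) give the same value.
import random


def dot(a, b):
    """Inner product of two equal-length vectors, a^T b."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(X, w):
    """Matrix-vector product Xw; each row of X is a feature vector x_i^T."""
    return [dot(row, w) for row in X]


def cost_sum(X, y, w):
    """g(w) = (1/N) * sum_i (y_i - w^T x_i)^2."""
    N = len(y)
    return sum((y_i - dot(w, x_i)) ** 2 for x_i, y_i in zip(X, y)) / N


def cost_matrix(X, y, w):
    """g(w) = (1/N) * (y - Xw)^T (y - Xw)."""
    N = len(y)
    z = [y_i - p_i for y_i, p_i in zip(y, matvec(X, w))]  # z = y - Xw
    return dot(z, z) / N  # z^T z = sum of squared residuals


random.seed(0)
N, d = 5, 3
X = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
y = [random.uniform(-1, 1) for _ in range(N)]
w = [random.uniform(-1, 1) for _ in range(d)]

print(cost_sum(X, y, w), cost_matrix(X, y, w))
```

Up to floating-point rounding the two printed values match. For a hand-checkable case, with `X = [[1, 0], [0, 1], [1, 1]]`, `y = [1, 2, 3]` and `w = [1, 1]`, we get $y - Xw = (0, 1, 1)^T$, so both functions return $2/3$.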