I'm studying Stanford's CS229 Machine Learning course, and in the lecture notes (page 11) I don't see how step 2 follows from step 1.
Prof. Andrew Ng says it is the expansion of the quadratic $(X \theta - \vec{y})^T (X \theta - \vec{y})$, taken from the derivation on page 10.
Can anyone explain why the expansion of the quadratic $(X \theta - \vec{y})^T (X \theta - \vec{y})$ equals $\theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}$?
Prove $(X \theta - \vec{y})^T (X \theta - \vec{y}) = \theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}$
Asked by user371331 (https://math.techqa.club/user/user371331/detail)
The expression follows from the Distributive Law and the Transpose Rules of matrix algebra.
These are -
$(A+B)(C+D)= AC+AD+BC+BD$
$(AB)^T = B^T A^T$ and $(A+B)^T = A^T + B^T$
The first term expands as -
$(X\theta - \vec{y})^T = \theta^T X^T - \vec{y}^T$
The rest is simple multiplication.
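Spelling out that multiplication with the distributive law:

$$(X\theta - \vec{y})^T (X\theta - \vec{y}) = (\theta^T X^T - \vec{y}^T)(X\theta - \vec{y}) = \theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}.$$

Note also that $\theta^T X^T \vec{y}$ and $\vec{y}^T X \theta$ are $1 \times 1$ matrices and are transposes of each other, so they are equal; that is why the notes can later combine the two middle terms into $-2\theta^T X^T \vec{y}$.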
The proofs of the properties used -
The Distributive Law is an axiom.
For $(A+B)^T = A^T + B^T$, compare the $(i,j)^{\text{th}}$ elements -
$\big((A+B)^T\big)_{ij} = (A+B)_{ji} = a_{ji} + b_{ji} = (A^T)_{ij} + (B^T)_{ij}$
so $(A+B)^T = A^T + B^T$.
For the product rule -
By definition, the $(i,j)^{\text{th}}$ element of $AB$ is
$(AB)_{ij} = \Sigma_{k=1}^{n} \, a_{ik} b_{kj}$
Now,
$\big((AB)^T\big)_{ij} = (AB)_{ji} = \Sigma_{k=1}^{n} \, a_{jk} b_{ki} = \Sigma_{k=1}^{n} \, b_{ki} a_{jk} = \Sigma_{k=1}^{n} \, (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}$
so $(AB)^T = B^T A^T$.
The above uses the fact that-
If $A=(a_{ij})$, then $A^T = (a_{ji})$.
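As a numerical sanity check (not part of the proof), the identity can be verified on small concrete matrices. The sketch below uses pure-Python list-of-rows matrices with hypothetical helper functions `transpose`, `matmul`, and `sub`; the particular values of `X`, `theta`, and `y` are arbitrary.

```python
# Numerical sanity check of
# (X theta - y)^T (X theta - y) = theta^T X^T X theta - theta^T X^T y - y^T X theta + y^T y
# Matrices are lists of rows; column vectors are n x 1 matrices.

def transpose(A):
    # Swap rows and columns.
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    # (AB)_{ij} = sum_k A_{ik} B_{kj}; zip(*B) iterates columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def sub(A, B):
    # Entrywise difference A - B.
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Arbitrary small example: X is 3x2, theta is 2x1, y is 3x1.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
theta = [[0.5], [-1.5]]
y = [[1.0], [0.0], [2.0]]

r = sub(matmul(X, theta), y)           # the residual X theta - y
lhs = matmul(transpose(r), r)[0][0]    # (X theta - y)^T (X theta - y), a scalar

Xt, tt, yt = transpose(X), transpose(theta), transpose(y)
rhs = (matmul(matmul(matmul(tt, Xt), X), theta)[0][0]   # theta^T X^T X theta
       - matmul(matmul(tt, Xt), y)[0][0]                # - theta^T X^T y
       - matmul(matmul(yt, X), theta)[0][0]             # - y^T X theta
       + matmul(yt, y)[0][0])                           # + y^T y

assert abs(lhs - rhs) < 1e-9
print("identity holds:", lhs, "==", rhs)
```

Running it on these values gives `lhs == rhs == 104.75`; any other shapes with compatible dimensions work the same way.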