Prove $(X \theta - \vec{y})^T (X \theta - \vec{y}) = \theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}$


I'm studying Stanford's CS229 Machine Learning course, and on page 11 of the lecture notes I don't see how step 2 follows from step 1.
Prof. Andrew Ng says it is the expansion of the quadratic $(X \theta - \vec{y})^T (X \theta - \vec{y})$, which comes from the derivation on page 10.
Can anyone explain how the expansion of the quadratic $(X \theta - \vec{y})^T (X \theta - \vec{y})$ equals $\theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}$?


BEST ANSWER

The expression follows from the Distributive Law and the Transpose Rules of matrix algebra.

These are -

  1. $(A+B)(C+D)= AC+AD+BC+BD$

  2. $(AB)^T = B^T A^T$ and $(A+B)^T = A^T + B^T$

The first factor transposes as -

$(X\theta -\vec{y})^T = \theta^T X^T - \vec{y}^T$

The rest is simple multiplication.
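Spelling that multiplication out term by term:

$$(\theta^T X^T - \vec{y}^T)(X\theta - \vec{y}) = \theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}$$

It is also worth noting that every term here is a scalar, and $\vec{y}^T X \theta = (\theta^T X^T \vec{y})^T = \theta^T X^T \vec{y}$, so the two middle terms are equal and are often combined into $-2\theta^T X^T \vec{y}$.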


The proofs of the properties used -

The Distributive Law for matrices follows entrywise from the distributive law for scalars.

For $(A+B)^T = A^T +B^T$ , compare $(i,j)^{\text{th}}$ elements -

The $(i,j)^{\text{th}}$ element of $(A+B)^T$ is the $(j,i)^{\text{th}}$ element of $A+B$, namely $a_{ji}+b_{ji}$, which is exactly the $(i,j)^{\text{th}}$ element of $A^T+B^T$.

For the product rule-

By definition, the $(i,j)^{\text{th}}$ element of $AB$ is

$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

Now, the $(i,j)^{\text{th}}$ element of $(AB)^T$ is the $(j,i)^{\text{th}}$ element of $AB$:

$((AB)^T)_{ij} = (AB)_{ji} = \sum_{k=1}^{n} a_{jk} b_{ki} = \sum_{k=1}^{n} b_{ki}\, a_{jk} = \sum_{k=1}^{n} (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}$

Hence $(AB)^T = B^T A^T$.

The above uses the fact that-

If $A=(a_{ij})$, then $A^T = (a_{ji})$.
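As a quick numerical sanity check (not part of the course notes; the shapes below are arbitrary illustrations), NumPy confirms both transpose rules on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # arbitrary 3x4 matrix
B = rng.standard_normal((3, 4))  # same shape as A, so A + B is defined
C = rng.standard_normal((4, 2))  # 4x2, so the product A C is defined

# (A + B)^T = A^T + B^T
assert np.allclose((A + B).T, A.T + B.T)

# (A C)^T = C^T A^T  -- note that the order of the factors reverses
assert np.allclose((A @ C).T, C.T @ A.T)

print("both transpose rules check out")
```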

ANOTHER ANSWER

Since $(X\theta-\vec{y})^T=\theta^TX^T-\vec{y}^T$, the product is $$(\theta^TX^T-\vec{y}^T)(X\theta-\vec{y})=\theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y}.$$
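The whole identity is also easy to verify numerically. Below is a minimal sketch with NumPy, assuming an arbitrary $5 \times 3$ design matrix $X$, a $3 \times 1$ parameter vector $\theta$, and a $5 \times 1$ target vector $\vec{y}$ (the shapes are illustrative; any compatible shapes work):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((5, 3))      # design matrix: 5 examples, 3 features
theta = rng.standard_normal((3, 1))  # parameter vector
y = rng.standard_normal((5, 1))      # target vector

# Left-hand side: the quadratic form (X theta - y)^T (X theta - y)
lhs = (X @ theta - y).T @ (X @ theta - y)

# Right-hand side: the four-term expansion from the answers above
rhs = (theta.T @ X.T @ X @ theta
       - theta.T @ X.T @ y
       - y.T @ X @ theta
       + y.T @ y)

assert np.allclose(lhs, rhs)
```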