How I can reduce this matricial expression?

30 Views Asked by At

$\frac{1}{N}((w - (X^TX)^{-1}X^Ty)^T(X^TX)(w-(X^TX)^{-1}X^Ty) + y^T(I - X(X^TX)^{-1}X^T)y) $

I have to achieve this new expression:

$\frac{1}{N}(w^TX^TXw -2w^TX^Ty +y^Ty) $

Additional information:

  • $\ X^TX$ is invertible and positive definite
  • $w$ dimensions (d + 1) x 1
  • $y$ dimensions N x 1
  • $X$ dimensions N x (d + 1)
2

There are 2 best solutions below

4
On BEST ANSWER

You have $$ \eqalign{ & {\bf F} = {1 \over N}\; \cdot \cr & \cdot \;\left( {\left( {{\bf w} - \left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y}} \right)^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right)\left( {{\bf w} - \left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y}} \right) + {\bf y}^{\bf T} \left( {{\bf I} - {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} } \right){\bf y}} \right) \cr} $$

Let's take out the $N$ factor to simplify brackets, then $$ \eqalign{ & N\,{\bf F} = \left( {{\bf w} - \left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y}} \right)^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right)\left( {{\bf w} - \left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y}} \right) + {\bf y}^{\bf T} \left( {{\bf I} - {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} } \right){\bf y} = \cr & = \left( {{\bf w}^{\bf T} - {\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} } \right)\left( {{\bf X}^{\bf T} {\bf X}} \right)\left( {{\bf w} - \left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y}} \right) + {\bf y}^{\bf T} \left( {{\bf I} - {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} } \right){\bf y} = \cr & = \left( {{\bf w}^{\bf T} - {\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} } \right)\left( {\left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf X}^{\bf T} {\bf y}} \right) + {\bf y}^{\bf T} \left( {{\bf I} - {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} } \right){\bf y} = \cr & = \left( {{\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y}} \right) - \left( {{\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y}} \right) + {\bf y}^{\bf T} \left( {{\bf I} - {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} } \right){\bf y} = \cr & = {\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y} - {\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} + {\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y} + {\bf y}^{\bf T} {\bf y} - {\bf y}^{\bf T} {\bf X}\left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y} = \cr & = {\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y} - {\bf y}^{\bf T} {\bf Xw} + {\bf y}^{\bf T} {\bf y} = \cr & = {\bf w}^{\bf T} {\bf X}^{\bf T} \left( {{\bf Xw} - {\bf y}} \right) - {\bf y}^{\bf T} \left( {{\bf Xw} - {\bf y}} \right) = \cr & = \left( {{\bf w}^{\bf T} {\bf X}^{\bf T} - {\bf y}^{\bf T} } \right)\left( {{\bf Xw} - {\bf y}} \right) = \cr & = \left( {{\bf Xw} - {\bf y}} \right)^{\bf T} \left( {{\bf Xw} - {\bf y}} \right) \cr} $$ which is a scalar.

Note that $$ {\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y} = {\rm scalar} = \left( {{\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y}} \right)^{\bf T} = {\bf y}^{\bf T} {\bf Xw} $$ so in the last but third step you can also write $$ \eqalign{ & N\,{\bf F} = {\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y} - {\bf y}^{\bf T} {\bf Xw} + {\bf y}^{\bf T} {\bf y} = \cr & = {\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - 2{\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y} + {\bf y}^{\bf T} {\bf y} \cr} $$

For optimizing $N\,{\bf F}$ wrt $\bf w$, considering that it is a scalar function of the vector $\bf w$, and thus a scalar function of the $d+1$ variables $w_1, \cdots, w_{d+1}$, and forgetting for the moment $N$, we can write our function as $$ \eqalign{ & g({\bf w}) = g(w_{\,1} ,w_{\,2} ,\, \cdots ,w_{\,d + 1} ) = \cr & = \left( {{\bf Xw} - {\bf y}} \right)^{\bf T} \left( {{\bf Xw} - {\bf y}} \right) = \cr & = {\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - 2{\bf w}^{\bf T} {\bf X}^{\bf T} {\bf y} + {\bf y}^{\bf T} {\bf y} \cr} $$

Local extrema will be given by the values of $\bf w$ for which all the partial derivatives of $g$ will be null.
And $g$ is a quadratic form.
Taking for simplicity the expanded form for $g$, so we shall have $$ 0 = {\partial \over {\partial \,w_{\,k} }}g({\bf w}) = \left( {{{\partial \,{\bf w}} \over {\partial \,w_{\,k} }}} \right)^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} + {\bf w}^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right)\left( {{{\partial \,{\bf w}} \over {\partial \,w_{\,k} }}} \right) - 2\left( {{{\partial \,{\bf w}} \over {\partial \,w_{\,k} }}} \right)^{\bf T} {\bf X}^{\bf T} {\bf y} $$ and since the first two terms are scalar $$ \eqalign{ & 0 = \left( {{{\partial \,{\bf w}} \over {\partial \,w_{\,k} }}} \right)^{\bf T} \left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - \left( {{{\partial \,{\bf w}} \over {\partial \,w_{\,k} }}} \right)^{\bf T} {\bf X}^{\bf T} {\bf y} = \cr & = \left( {{{\partial \,{\bf w}} \over {\partial \,w_{\,k} }}} \right)^{\bf T} \left( {\left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf X}^{\bf T} {\bf y}} \right)\quad \left| {\;1 \le k \le d + 1} \right. \cr} $$ Note that the term in brackets is a vertical vector of dimension $d+1$

Now we have $$ {\partial \over {\partial \,w_{\,1} }}{\bf w} = {\partial \over {\partial \,w_{\,1} }}\left( {\matrix{ {w_{\,1} } \cr {w_{\,2} } \cr \vdots \cr {w_{\,d + 1} } \cr } } \right) = \left( {\matrix{ 1 \cr 0 \cr \vdots \cr 0 \cr } } \right) $$ and similarly for the other components.

Therefore, indicating with $\bf 0$ the null vector $(d+1) \times 1$ and with $\bf I$ the identity matrix $(d+1) \times (d+1)$, we can write $$ {\bf 0} = {\bf I}\,\left( {\left( {{\bf X}^{\bf T} {\bf X}} \right){\bf w} - {\bf X}^{\bf T} {\bf y}} \right)\quad \Rightarrow \quad {\bf w} = \left( {{\bf X}^{\bf T} {\bf X}} \right)^{\, - \,{\bf 1}} {\bf X}^{\bf T} {\bf y} $$

0
On

Take

$\frac{1}{N}((w-(X'X)^{-1}X'y)'(X'X)(w-(X'X)^{-1}X'y)$

Expanding all terms, and using that $(w-(X'X)^{-1}X'y)'=(w'-y'X(X'X)^{-1})$, (Note that $(X'X)'=X'X$) this is equal to

$\frac{1}{N}(w'(X'X)w-w'(X'X)(X'X)^{-1}X'y-y'X(X'X)^{-1}(X'X)w+y'X(X'X)^{-1}(X'X)(X'X)^{-1}X'y)$

Then $\frac{1}{N}(w'(X'X)w-w'X'y-y'Xw+y'X(X'X)^{-1}X'y)$

Note that $w'X'y=(w'X'y)'=y'Xw$ because it is a scalar. Hence, $\frac{1}{N}(w'(X'X)w-2w'X'y+y'X(X'X)^{-1}X'y)$

Now, regarding the other part
$\frac{1}{N}(y'(I-X(X'X)^{-1}X')y)=\frac{1}{N}(y'y-y'X(X'X)^{-1}X'y)$

Finally, sum both terms.