How to reach the Moore-Penrose pseudoinverse solution that minimizes the error function


I'm trying to figure out the derivation of the Moore-Penrose pseudoinverse solution for linear regression. The starting expression is the standard sum-of-squares error function. I'm not quite sure how to expand it, since the terms are vectors (my linear algebra is a bit rusty):

$$E(w) = \frac12 \sum_{n=1}^N \left(w^T x^{(n)} - t^{(n)}\right)^2$$

where $x^{(n)}$ is the $n$-th data vector, $w$ is the weight vector, and $t^{(n)}$ is the $n$-th target value. So this is the error function taken over the $N$ sample datapoints; each datapoint is a vector of features.

I need to take the gradient of this with respect to $w$ to get:

$X^TXw - X^Tt$

I should clarify that $X$ here is the matrix whose rows are the $x^{(n)}$. So it's an $N \times M$ matrix, where $N$ is the number of datapoints and $M$ is the number of features in each datapoint.
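My attempt so far (I'm not sure this expansion is right): stacking the datapoints as rows of $X$ and the targets into the vector $t$, I think the sum can be written in matrix form as

$$E(w) = \frac12 \|Xw - t\|^2 = \frac12 (Xw - t)^T (Xw - t) = \frac12 \left( w^T X^T X w - 2\, w^T X^T t + t^T t \right),$$

and then using the identities $\nabla_w (w^T A w) = 2Aw$ (for symmetric $A$) and $\nabla_w (w^T b) = b$, the gradient would come out to $X^T X w - X^T t$. Is this the right way to go?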

The Moore-Penrose pseudoinverse can then be used to minimize this function by setting the gradient to zero:

$X^TXw - X^Tt = 0$

$w = (X^TX)^{-1}X^Tt$
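As a sanity check of this formula (not part of the derivation itself), here is a small NumPy sketch with made-up random data comparing the normal-equations solution against `np.linalg.pinv`; the two should agree whenever $X$ has full column rank:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 3                      # N datapoints, M features (arbitrary choices)
X = rng.normal(size=(N, M))       # data matrix, rows are the x^{(n)}
t = rng.normal(size=N)            # target vector

# Normal-equations solution: w = (X^T X)^{-1} X^T t
# (solve the linear system rather than forming the inverse explicitly)
w_normal = np.linalg.solve(X.T @ X, X.T @ t)

# Moore-Penrose pseudoinverse solution
w_pinv = np.linalg.pinv(X) @ t

print(np.allclose(w_normal, w_pinv))
```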

Can someone please explain how to do this?