Derivative of matrix multiplication's norm for linear regression

I am trying to find the derivative of the function $f(\theta)=0.5\,\|X\theta-y\|_2^2$, where $X$ is a large (1000x2) matrix, $\theta$ is a 2x1 vector and $y$ is a 1000x1 vector. So far I have realised that $f$ can be rewritten as $0.5\,(X\theta-y)^T(X\theta-y)$, and I tried applying both the chain rule and the product rule, arriving at the same result both times: $f'(\theta)=(X\theta-y)\cdot\dfrac{d}{d\theta}(X\theta-y)$

I've tried to do a lot of research on how to continue past this point and frankly from what I've seen I might have done everything wrong from the start. Could anyone give me any pointers?

There is 1 best solution below

This problem is equivalent to finding the least-squares solution of the linear system $$X\theta = y$$ and, provided $X$ has full column rank (so that $X^TX$ is invertible), it has a closed-form solution in terms of the pseudoinverse of $X$: $$\theta = X^+y \;=\; \left(X^TX\right)^{-1}X^T\,y$$
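
To connect this back to the derivative that was asked about: the gradient of $f(\theta)=0.5\,\|X\theta-y\|_2^2$ is $\nabla f(\theta) = X^T(X\theta-y)$, and setting it to zero gives the normal equations $X^TX\,\theta = X^Ty$, which is where the formula above comes from. The following is a minimal NumPy sketch that checks this numerically. The data are synthetic, and the names `f`, `grad_f`, `numerical_grad`, `theta_true` and `theta0` are made up for illustration; none of them come from the question.

```python
import numpy as np

# Synthetic data with the shapes from the question (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
theta_true = np.array([1.5, -0.5])
y = X @ theta_true + 0.1 * rng.normal(size=1000)

def f(theta):
    """Objective 0.5 * ||X theta - y||_2^2."""
    r = X @ theta - y
    return 0.5 * (r @ r)

def grad_f(theta):
    """Analytic gradient X^T (X theta - y)."""
    return X.T @ (X @ theta - y)

def numerical_grad(func, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (func(theta + step) - func(theta - step)) / (2 * eps)
    return g

# Compare the analytic gradient with finite differences at an arbitrary point.
theta0 = np.array([0.3, -1.2])
analytic = grad_f(theta0)
numeric = numerical_grad(f, theta0)

# Solve the normal equations X^T X theta = X^T y, and compare with the pseudoinverse.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
theta_pinv = np.linalg.pinv(X) @ y

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("normal equations: ", theta_normal)
print("pseudoinverse:    ", theta_pinv)
```

The two gradients should agree up to rounding error, and the two solutions should coincide, with the gradient vanishing at that point. In practice `np.linalg.lstsq(X, y, rcond=None)` is usually a better choice than forming $X^TX$ explicitly, since it avoids squaring the condition number of $X$.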