It is easy to show that the solution to a least-squares problem is
$$\vec{w} = (X^TX)^{-1}X^T \vec{y}$$
In my case, the entries of the matrix $X$ are filled from left to right, with an added bias column, meaning
$$X = \begin{bmatrix}x_{1,1}& \dots&x_{1,n}&1\\\vdots & \ddots & \vdots & \vdots\\x_{m,1}&\dots&x_{m,n}&1\end{bmatrix}$$
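To make the setup concrete, here is a quick numpy sketch (with made-up random data) that builds such a design matrix with a trailing bias column and checks the closed-form solution against numpy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
features = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# design matrix: features from left to right, plus a trailing bias column of ones
X = np.hstack([features, np.ones((m, 1))])

# normal-equations solution w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y

# compare with a numerically stabler solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_lstsq))  # True
```

(In practice one would prefer `lstsq` or a QR factorization over explicitly inverting $X^TX$, but the explicit formula is what is being differentiated here.)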
I would now like to take the gradient of the norm of $\vec{w}$ with respect to all $x_{i,j}$, going through $x_{1,1},\dots,x_{1,n},\dots,x_{m,n}$. So
$$\nabla_x \|\vec{w}\| = \nabla_x \|(X^TX)^{-1}X^T \vec{y}\|$$
I have difficulties calculating this derivative. Has this been done before, or does anybody have tips on how to calculate it?
Thanks in advance.
You did not specify the norm, so I will give an answer for an arbitrary scalar function $f:\mathbb{C}^{n}\rightarrow \mathbb{R}$:
$$ \frac{\partial f\left(\left(X^TX\right)^{-1}X^{T}y\right)}{\partial x_{ij}}= \nabla f\left(\left(X^TX\right)^{-1}X^{T}y\right)^T\frac{\partial }{\partial x_{ij}} \left[\left(X^TX\right)^{-1}X^{T}y\right].$$
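For the Euclidean norm in particular, $\nabla f(\vec{w}) = \vec{w}/\|\vec{w}\|$. A minimal numerical sanity check of that gradient (random test vector, forward differences):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(4)

# gradient of the Euclidean norm at w (valid for w != 0)
grad = w / np.linalg.norm(w)

# finite-difference check, one coordinate at a time
h = 1e-6
fd = np.array([(np.linalg.norm(w + h * e) - np.linalg.norm(w)) / h
               for e in np.eye(4)])
print(np.max(np.abs(grad - fd)))  # small
```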
By the product rule, the remaining derivative splits into two terms:
$$\left(\frac{\partial}{\partial x_{ij}} \left(X^TX\right)^{-1}\right)X^{T}y + \left(X^TX\right)^{-1}\left(\frac{\partial}{\partial x_{ij}} X^{T}\right)y.$$
The second one is trivial. The first one is somewhat harder. To find it, you can use that for an arbitrary invertible matrix $A$ the following is true:
$$ \frac{\partial}{\partial a_{ij}} \left(A^{-1} A\right) = 0, $$
hence $\frac{\partial}{\partial a_{ij}} A^{-1} = -A^{-1}\left(\frac{\partial A}{\partial a_{ij}}\right)A^{-1}$, and since $\frac{\partial A}{\partial a_{ij}}$ has a single $1$ in entry $(i,j)$,
$$ \left[\frac{\partial}{\partial a_{ij}} \left(A^{-1}\right)\right]_{kl} = - \left[A^{-1}\right]_{ki}\left[A^{-1}\right]_{jl}. $$
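Putting the two terms together, here is a numpy sketch (on random data, with a generic $m \times p$ matrix $X$) that assembles the analytic derivative $\partial \vec{w}/\partial x_{ij}$ and checks it against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 6, 3
X = rng.standard_normal((m, p))
y = rng.standard_normal(m)

i, j = 2, 1  # which entry x_{ij} to differentiate with respect to

B = np.linalg.inv(X.T @ X)

# dX/dx_{ij} is the matrix with a single 1 at position (i, j)
E = np.zeros((m, p))
E[i, j] = 1.0

dA = E.T @ X + X.T @ E           # derivative of A = X^T X
dB = -B @ dA @ B                 # d(A^{-1}) = -A^{-1} (dA) A^{-1}
dw = dB @ X.T @ y + B @ E.T @ y  # the two terms of d/dx_{ij} [(X^T X)^{-1} X^T y]

# finite-difference check
h = 1e-6
Xp = X.copy()
Xp[i, j] += h
wp = np.linalg.inv(Xp.T @ Xp) @ Xp.T @ y
w = B @ X.T @ y
dw_fd = (wp - w) / h
print(np.max(np.abs(dw - dw_fd)))  # small
```

Note that here $A = X^TX$ depends on $x_{ij}$ through both factors, so $\frac{\partial A}{\partial x_{ij}} = E^T X + X^T E$ rather than a single-entry matrix; the identity above is applied through $dB = -B\,(dA)\,B$.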