I am trying to find a linear regression for the problem:
$$\displaystyle\arg\min_w\|y-Xw\|^2 $$
By finding the optimum of the above equation, I get
$$\displaystyle X^TXw=X^Ty $$
In the case where $X^TX$ is invertible (i.e., the columns of $X$ are linearly independent), I can get the unique solution
$$\displaystyle w=(X^TX)^{-1}X^Ty $$
However, when the columns are linearly dependent, the normal equations have infinitely many solutions.
Now, say I want to find a solution with minimal $l_2$ norm. I can define the new problem as:
$$\displaystyle\begin{align}\arg\min_w\ &\|w\| \\ \text{s.t.}\ \ &X^TXw=X^Ty \end{align}$$
How can I now use SVD decomposition ($X=U\Sigma V^T$) to solve the above optimization problem?
Attempt with the method of Lagrange multipliers:
I tried optimizing the equivalent objective $0.5\|w\|^2$ and obtained the following Lagrangian:
$$ \mathcal{L}(w,\alpha)=0.5\|w\|^2+\alpha^\top(X^Ty-X^TXw) $$
Setting the gradient w.r.t. $w$ to zero, and appending the constraint, I get:
$$w = X^TX\alpha, \qquad X^Ty=X^TXw $$
But I couldn't proceed from here.
Let the SVD of $\mathrm X \in \mathbb R^{n \times p}$ be
$$\mathrm X = \mathrm U \Sigma \mathrm V^{\top} = \begin{bmatrix} \mathrm U_1 & \mathrm U_2\end{bmatrix} \begin{bmatrix} \Sigma_1 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm V_1^{\top}\\ \mathrm V_2^{\top}\end{bmatrix}$$
The eigendecomposition of $\mathrm X^{\top} \mathrm X$ is, thus,
$$\mathrm X^{\top} \mathrm X = \mathrm V \Sigma^{\top} \mathrm U^{\top} \mathrm U \Sigma \mathrm V^{\top} = \mathrm V \Sigma^{\top} \Sigma \mathrm V^{\top} = \begin{bmatrix} \mathrm V_1 & \mathrm V_2\end{bmatrix} \begin{bmatrix} \Sigma_1^2 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm V_1^{\top}\\ \mathrm V_2^{\top}\end{bmatrix}$$
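This identity is easy to verify numerically. A quick NumPy sketch (the rank-deficient toy matrix with a duplicated column is my own example, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix with a duplicated column, so rank(X) = 2 < p = 3.
A = rng.standard_normal((6, 2))
X = np.column_stack([A, A[:, 0]])

U, s, Vt = np.linalg.svd(X)                 # singular values s, right factor Vt = V^T

# X^T X should equal V @ diag(s**2) @ V^T, with one (near-)zero eigenvalue.
assert np.allclose(Vt.T @ np.diag(s**2) @ Vt, X.T @ X)
assert s[-1] < 1e-10 * s[0]                 # the rank deficiency shows up here
```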
Hence, the normal equations
$$\boxed{\mathrm X^{\top} \mathrm X \, \mathrm w = \mathrm X^{\top} \mathrm y}$$
can be written as follows
$$\mathrm V \begin{bmatrix} \Sigma_1^2 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \mathrm V^{\top} \mathrm w = \mathrm V \begin{bmatrix} \Sigma_1 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \mathrm U^{\top} \mathrm y$$
Let $\mathrm z := \mathrm V^{\top} \mathrm w$. Left-multiplying by $\mathrm V^{\top}$,
$$\begin{bmatrix} \Sigma_1^2 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm z_1\\ \mathrm z_2\end{bmatrix} = \begin{bmatrix} \Sigma_1 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm U_1^{\top} \mathrm y\\ \mathrm U_2^{\top} \mathrm y\end{bmatrix}$$
Let $r := \mbox{rank} (\mathrm X)$. Thus,
$$\begin{bmatrix} \mathrm I_r & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm z_1\\ \mathrm z_2\end{bmatrix} = \begin{bmatrix} \Sigma_1^{-1} & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm U_1^{\top} \mathrm y\\ \mathrm U_2^{\top} \mathrm y\end{bmatrix} = \begin{bmatrix} \Sigma_1^{-1}\mathrm U_1^{\top} \mathrm y\\ \mathrm 0_{p-r}\end{bmatrix}$$
which always has a solution. Note that $\mathrm z_2$ is free. Since $\mathrm w = \mathrm V \mathrm z = \mathrm V_1 \mathrm z_1 + \mathrm V_2 \mathrm z_2$, the solution set of the normal equations is the $(p-r)$-dimensional affine space parameterized as follows
$$\left\{ \mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y + \mathrm V_2 \eta \mid \eta \in \mathbb R^{p - r} \right\}$$
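One can check this parameterization numerically: every point of the affine set satisfies the normal equations. A sketch in NumPy (again with a toy rank-deficient matrix of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient toy problem: p = 3 columns, rank r = 2.
A = rng.standard_normal((6, 2))
X = np.column_stack([A, A[:, 0]])
y = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(X)
r = int(np.sum(s > 1e-10 * s[0]))           # numerical rank
U1, S1 = U[:, :r], s[:r]
V1, V2 = Vt[:r].T, Vt[r:].T                 # columns of V2 span the null space of X

w_particular = V1 @ (U1.T @ y / S1)         # V1 @ Sigma1^{-1} @ U1^T @ y

# Shifting by any V2 @ eta leaves the normal equations satisfied.
for _ in range(3):
    eta = rng.standard_normal(X.shape[1] - r)
    w = w_particular + V2 @ eta
    assert np.allclose(X.T @ X @ w, X.T @ y)
```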
Note that
$$\| \mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y + \mathrm V_2 \eta \|_2^2 = \Bigg\| \mathrm V \begin{bmatrix} \Sigma_1^{-1}\mathrm U_1^{\top} \mathrm y\\ \eta\end{bmatrix} \Bigg\|_2^2 = \Bigg\| \begin{bmatrix} \Sigma_1^{-1}\mathrm U_1^{\top} \mathrm y\\ \eta\end{bmatrix} \Bigg\|_2^2 = \| \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y \|_2^2 + \| \eta \|_2^2$$
which is minimized when $\eta = 0_{p-r}$. Thus, the least-norm solution is simply
$$\boxed{\mathrm w^* := \mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y}$$
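For completeness: $\mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top}$ is exactly the Moore–Penrose pseudoinverse of $\mathrm X$, so the least-norm solution can be cross-checked against `np.linalg.pinv`, and against `np.linalg.lstsq`, which also returns the minimum-norm solution for rank-deficient systems. A sketch (rank-deficient toy data of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.standard_normal((6, 2))
X = np.column_stack([A, A[:, 0]])           # duplicated column: rank 2, p = 3
y = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))           # numerical rank
w_star = Vt[:r].T @ (U[:, :r].T @ y / s[:r])   # V1 @ Sigma1^{-1} @ U1^T @ y

# w* agrees with the pseudoinverse solution, and with lstsq's
# minimum-norm solution for the rank-deficient system.
assert np.allclose(w_star, np.linalg.pinv(X) @ y)
assert np.allclose(w_star, np.linalg.lstsq(X, y, rcond=None)[0])
```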