I have a system of linear equations whose coefficient matrix is not full rank (i.e. one or more of its column vectors are linearly dependent) and which has more equations than unknowns. Even though the system is tall, the rank deficiency leaves it effectively under-determined.
Here is just one example of this:
$$ {\begin{bmatrix} 6 & 40 \\ 6 & 40 \\ 3 & 20 \\ \end{bmatrix}} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} \approx \begin{bmatrix} 0.5 \\ 0.2 \\ 0.6 \\ \end{bmatrix} $$
If I solve the above example in the least squares sense (i.e. solve $A\vec{x}\approx\vec{b}$), the solution produced will come from a set of solutions $S$ that is infinite (i.e. $|S| = \infty$). However, is it possible to minimize the residuals (i.e. get the best model that minimizes the error) over this infinite set of solutions $S$?
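To make the infinite solution set $S$ concrete, here is a small NumPy sketch of the example system. (The null-space direction $[40, -6]^T$ is a detail I worked out for this particular $A$: since $6\cdot 40 + 40\cdot(-6) = 0$, adding any multiple of it to a least squares solution leaves the residual unchanged.)

```python
import numpy as np

# The example system: rank(A) = 1 < 2 (second column = (20/3) * first).
A = np.array([[6.0, 40.0],
              [6.0, 40.0],
              [3.0, 20.0]])
b = np.array([0.5, 0.2, 0.6])

# One least squares solution (np.linalg.lstsq returns the minimum-norm one).
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# [40, -6] lies in the null space of A, so every x + t*[40, -6]
# attains exactly the same residual: an infinite solution set S.
for t in (0.0, 1.0, -2.5):
    x_t = x + t * np.array([40.0, -6.0])
    print(t, np.linalg.norm(A @ x_t - b))
```

Running this prints the same residual norm for every value of `t`, which is exactly why $|S| = \infty$.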
This question is sparked from a response I got on StackOverflow (SO) and from what I read on this forum. For example, on my SO post, user @AGN Gazer says the following:
"A solution with a minimal norm does not mean a solution that minimizes residuals. [...] Having a solution with a minimal norm means nothing to me[.] [...] I want the "best" solution - the one that minimizes the residuals but I cannot get it with an underdetermined system."
However, in the accepted answer on this Math Stack Exchange post, @Brian Borchers appears to say otherwise:
"[We're] often interested in the minimum norm least squares solution. That is, among the infinitely many least squares solutions, pick out the least squares solution with the smallest $∥x∥_{2}$ [(i.e. euclidean norm: $\| x \|_{2} =\sqrt{\sum_{i=1}^{n}x_{i}^{2}}$)]. The minimum norm least squares solution is always unique."
I must be misunderstanding something! In relation to my question above, what even is the distinction between minimizing the residuals and minimizing the norm?
In your example, there is no exact solution of $Ax = b$. Because $A$ is not full rank, the least squares solution is not unique.
The minimum 2-norm least squares solution is the least squares solution for which $\|x\|_2$ is smallest among all least squares solutions. That is, consider all least squares solutions achieving the same minimum sum of squared residuals, $(Ax-b)^T(Ax-b)$; minimizing the 2-norm of $x$ among those solutions then serves as a tie-breaker to choose from among the solutions achieving that minimum value. The minimum 2-norm solution can be found using the pseudoinverse (`pinv` in MATLAB) of $A$, as shown below. Note that if $A$ is full rank, then the least squares solution is unique, and is therefore also the minimum 2-norm least squares solution.
Here is an illustration in MATLAB on your example. As can be seen, in this example the QR and SVD (minimum 2-norm) solutions have the same residuals, but $\|x\|_2$ is smaller for the SVD solution than for the QR solution.
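For readers without MATLAB, the same comparison can be sketched in Python/NumPy. Note this is my analogue, not the original MATLAB listing: `np.linalg.pinv` is SVD-based like MATLAB's `pinv`, and in place of the QR (backslash) solution I construct a second least squares solution by adding a null-space vector of $A$, which changes $\|x\|_2$ but not the residual.

```python
import numpy as np

# The example system from the question: rank(A) = 1 < 2.
A = np.array([[6.0, 40.0],
              [6.0, 40.0],
              [3.0, 20.0]])
b = np.array([0.5, 0.2, 0.6])

# Minimum 2-norm least squares solution via the (SVD-based) pseudoinverse.
x_min = np.linalg.pinv(A) @ b

# [40, -6] spans null(A) here (6*40 + 40*(-6) = 0), so adding it gives
# another least squares solution with the same residual but a larger norm.
x_other = x_min + np.array([40.0, -6.0])

r_min = np.linalg.norm(A @ x_min - b)
r_other = np.linalg.norm(A @ x_other - b)

print("residual norms:", r_min, r_other)
print("solution norms:", np.linalg.norm(x_min), np.linalg.norm(x_other))
```

Both solutions achieve the same minimum residual norm; the pseudoinverse solution is the one with the smallest $\|x\|_2$ among them, which is exactly the tie-breaking described above.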