I'm trying to find the best-fit line for the points $(17,4), (-2,26), (11,7)$ using the singular value decomposition (SVD).
My approach is to set $A = \begin{bmatrix} 17 & 4\\ -2 & 26\\ 11 & 7 \end{bmatrix}$ and find the eigenvalues and eigenvectors of $A^TA$. By setting the determinant of $A^TA-\lambda I$ to zero, I got $\lambda= 244 \pm \sqrt{37549}$.
My issue is that this $\lambda$ doesn't feel right. Also, using $\lambda= 244 + \sqrt{37549}$, I got the eigenvector $\begin{bmatrix} 0\\0 \end{bmatrix}$, which can't be right. I think something is wrong here. Could you give me any advice?
Here, the data points don't lie around a line passing through the origin. Had that been the case, the least-squares fit would amount to solving $Ax=0$ (with $x$ the normal of the line), and the line's direction would have been given by the first right singular vector of $A$ (i.e., the dominant eigenvector of the matrix $A^TA$), which solves the optimization problem of maximizing the projection of the points: $\underset{\|v\|=1}{\max}\|Av\|$.
Still, the first right singular vector is the dominant eigenvector of $A^TA$: $[0.2557118, 0.9667531]$, which you can find numerically with an eigendecomposition of $A^TA$, or directly with an SVD of $A$.
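The original check was done in R (that code is not shown here); an equivalent NumPy sketch is:

```python
import numpy as np

A = np.array([[17.0, 4.0],
              [-2.0, 26.0],
              [11.0, 7.0]])

# The right singular vectors of A are the rows of Vt;
# the first row corresponds to the largest singular value.
U, s, Vt = np.linalg.svd(A)
v1 = Vt[0]
if v1[1] < 0:   # a singular vector's sign is arbitrary; fix it for comparison
    v1 = -v1

print(np.round(v1, 7))  # ≈ [0.2557118, 0.9667531]
```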
You get $A^TA = \begin{bmatrix}414 & 93\\ 93 & 741 \end{bmatrix}$, with characteristic polynomial $\lambda^2-1155\lambda+298125=0$; solving it, the largest eigenvalue is $\lambda=765.5991$. Solving the linear system $\begin{bmatrix}414 & 93\\ 93 & 741 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2\end{bmatrix}=765.5991\begin{bmatrix}x_1 \\ x_2\end{bmatrix}$ gives $93x_2=351.5991x_1$ and $93x_1=24.5991x_2$, so $\frac{x_1}{x_2}\approx 0.2645$ and the corresponding eigenvector is $\begin{bmatrix}0.2645\\ 1\end{bmatrix}$. Normalizing (i.e., dividing by $\sqrt{0.2645^2+1^2}=1.034389$), we get the dominant unit eigenvector $\begin{bmatrix}\frac{0.2645}{1.034389}\\ \frac{1}{1.034389}\end{bmatrix}=\begin{bmatrix}0.25571\\ 0.96675\end{bmatrix}$, which agrees with the numerical computation.
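The hand computation above can be checked with a numerical eigendecomposition (the original answer used R; this is an equivalent NumPy sketch):

```python
import numpy as np

AtA = np.array([[414.0, 93.0],
                [93.0, 741.0]])

# eigh is for symmetric matrices and returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(AtA)
lam = evals[-1]          # dominant eigenvalue
v = evecs[:, -1]         # its unit eigenvector
if v[1] < 0:             # eigenvector sign is arbitrary; fix it for comparison
    v = -v

print(round(lam, 3))     # ≈ 765.599
print(np.round(v, 5))    # ≈ [0.25571, 0.96675]
```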
Let's for instance consider a different set of points $(\frac{7}{10},\frac{7}{10})$, $(7,9)$, $(2,\frac{9}{2})$, so that the matrix is $A=\begin{bmatrix}7/10 & 7/10\\ 7&9 \\ 2&9/2 \end{bmatrix}$. Now, if we want the best-fit line through the origin (without an intercept), then the normal vector of that line will (approximately) lie in the nullspace of $A$, i.e., will be a least-squares solution of $Ax=0$, which the SVD of $A$ gives directly: it is the right singular vector corresponding to the smallest singular value.
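A NumPy sketch of this step (the original used R; only the slope of the resulting line is meaningful, since a singular vector's sign is arbitrary):

```python
import numpy as np

# one data point per row: (0.7, 0.7), (7, 9), (2, 4.5)
A = np.array([[0.7, 0.7],
              [7.0, 9.0],
              [2.0, 4.5]])

# The unit vector n minimizing ||A n|| is the right singular vector for the
# smallest singular value: the (approximate) normal of the line through the origin.
U, s, Vt = np.linalg.svd(A)
n = Vt[-1]

# the line is n[0]*x + n[1]*y = 0, i.e., y = -(n[0]/n[1]) * x
slope = -n[0] / n[1]
print(round(slope, 4))  # ≈ 1.3867
```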
The following figure shows the best-fit line obtained with SVD:
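The original figure is not reproduced here; a minimal matplotlib sketch that regenerates it (assuming matplotlib is available) would be:

```python
import numpy as np

A = np.array([[0.7, 0.7], [7.0, 9.0], [2.0, 4.5]])

# slope of the through-origin SVD fit, recomputed for self-containment
U, s, Vt = np.linalg.svd(A)
n = Vt[-1]
slope = -n[0] / n[1]   # ≈ 1.3867

try:
    import matplotlib
    matplotlib.use("Agg")            # non-interactive backend
    import matplotlib.pyplot as plt
    xs = np.linspace(0, 8, 100)
    plt.scatter(A[:, 0], A[:, 1], label="data points")
    plt.plot(xs, slope * xs, "r", label=f"SVD fit: y = {slope:.3f} x")
    plt.legend()
    plt.savefig("svd_fit.png")
except ImportError:
    pass  # plotting is optional; matplotlib may not be installed
```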
This closely agrees with the linear-regression least-squares fit without an intercept, whose slope is $\sum_i x_iy_i/\sum_i x_i^2 = 72.49/53.49 \approx 1.3552$ (ordinary least squares minimizes vertical distances, whereas the SVD fit minimizes perpendicular distances, so the two slopes are close but not identical).
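The no-intercept OLS slope (what `lm(y ~ x - 1)` computes in R, sketched here in NumPy) is a one-liner:

```python
import numpy as np

x = np.array([0.7, 7.0, 2.0])
y = np.array([0.7, 9.0, 4.5])

# no-intercept least squares: slope = sum(x*y) / sum(x^2)
slope_ols = (x @ y) / (x @ x)
print(round(slope_ols, 4))  # ≈ 1.3552
```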
Now, let's say we have the points $(17,4), (-2,26), (11,7)$ instead, and we allow an intercept. Then, in order to find the best-fit line, we need to minimize $||Ax-b||_2^2$, which boils down to solving $Ax=b$ in the least-squares sense, where $b \neq 0$. Here, we have $A=\begin{bmatrix}1 & 17 \\ 1 & -2 \\ 1 & 11 \end{bmatrix}$ and $b=\begin{bmatrix} 4 \\ 26 \\ 7 \end{bmatrix}$, and the least-squares solution is given by the normal equations (the pseudo-inverse): $\hat{\beta}=(A^TA)^{-1}A^Tb$. We can use the SVD here too: with the thin SVD $A=U\Sigma V^T$ (so that $\Sigma$ is square and invertible), we have $\hat{\beta}=V\Sigma^{-1}U^Tb$.
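A NumPy sketch of the SVD-based least-squares solution (the thin SVD is obtained with `full_matrices=False`; the original answer used R):

```python
import numpy as np

A = np.array([[1.0, 17.0],
              [1.0, -2.0],
              [1.0, 11.0]])
b = np.array([4.0, 26.0, 7.0])

# least squares via the thin SVD: beta = V @ diag(1/s) @ U^T @ b
U, s, Vt = np.linalg.svd(A, full_matrices=False)
beta = Vt.T @ np.diag(1.0 / s) @ U.T @ b
print(np.round(beta, 4))  # intercept ≈ 22.7915, slope ≈ -1.2067
```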
It matches the LS solution obtained from the normal equations: $\hat{\beta}=(A^TA)^{-1}A^Tb=\begin{bmatrix}12900/566 \\ -683/566\end{bmatrix}\approx\begin{bmatrix}22.7915 \\ -1.2067\end{bmatrix}$, i.e., the best-fit line is $y \approx 22.7915 - 1.2067\,x$.
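And the normal-equations check, again sketched in NumPy rather than the original R:

```python
import numpy as np

A = np.array([[1.0, 17.0], [1.0, -2.0], [1.0, 11.0]])
b = np.array([4.0, 26.0, 7.0])

# solve the normal equations (A^T A) beta = A^T b directly
beta = np.linalg.solve(A.T @ A, A.T @ b)
print(np.round(beta, 4))  # ≈ [22.7915, -1.2067]
```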