Proof of "The sum of squared distances from the points to the line is a minimum"


I'm reading "Introduction to Linear Algebra" by Gilbert Strang, the PCA-by-SVD section. The text says that the sum of squared distances from the points to the line is a minimum, and the author sets out to prove it starting from $$\sum_{k=1}^{n} ||a_{k}||^2=\sum_{k=1}^{n}|a_{k}^Tu_{1}|^2 + \sum_{k=1}^{n}|a_{k}^Tu_{2}|^2$$ I don't really understand why the sum of the squared lengths of the columns of $A$ equals the sum of the squares of their inner products with the singular vectors.
So my first question: could you please explain this part?
The next sentence is "The first sum on the right is $$u_{1}^TAA^Tu_{1}$$"
So how does $$\sum_{k=1}^{n}|a_{k}^Tu_{1}|^2$$ transform into $$u_{1}^TAA^Tu_{1}$$?
As I understand it, $$\sum_{k=1}^{n}|a_{k}^Tu_1|^2$$ is just the sum of the squares of the inner products of all the columns with the first singular vector, right?
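For what it's worth, both identities can be checked numerically: in 2-D, $u_1$ and $u_2$ form an orthonormal basis, so each $\|a_k\|^2$ splits into its two squared components along them, and summing $|a_k^Tu_1|^2$ over the columns is the same as $u_1^T(AA^T)u_1$. A quick sanity check in plain Python (the matrix and angle below are made up):

```python
import math

# An orthonormal pair u1, u2 in the plane, and the columns a_k of a 2 x 3 matrix A.
alpha = 0.7  # arbitrary rotation angle
u1 = (math.cos(alpha), math.sin(alpha))
u2 = (-math.sin(alpha), math.cos(alpha))
cols = [(1.0, 2.0), (-0.5, 3.0), (2.5, -1.0)]  # made-up columns a_k

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

# Identity 1: the squared length of each column splits along the orthonormal basis:
# |a|^2 = (a^T u1)^2 + (a^T u2)^2.
for a in cols:
    assert abs(dot(a, a) - (dot(a, u1) ** 2 + dot(a, u2) ** 2)) < 1e-12

# Identity 2: sum_k |a_k^T u1|^2 = u1^T (A A^T) u1,
# since sum_k a_k a_k^T is exactly the 2 x 2 matrix A A^T.
lhs = sum(dot(a, u1) ** 2 for a in cols)
aat = [[sum(a[i] * a[j] for a in cols) for j in range(2)] for i in range(2)]
rhs = sum(u1[i] * aat[i][j] * u1[j] for i in range(2) for j in range(2))
assert abs(lhs - rhs) < 1e-12
```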

This doesn't directly answer the question, but it is a method I came up with about 30 years ago for finding the line which minimizes the sum of the squares of the distances to a set of 2d points.

This was reformatted from LaTeX to MathJax, so there are some incomplete conversions.

I may have submitted this before - I often forget things like that.


The problem investigated here is that of fitting a straight line to a set of data points. The fit is intended to be independent of the coordinate system in the sense that, if the data points are rotated or shifted, the resulting line will be the same relative to the points.

The standard least squares fitted line does not satisfy this criterion, because the error in the fit for each point is taken to be the distance from the point to the line measured parallel to one of the coordinate axes. When the axes are rotated, that distance changes.

We choose to use the actual distance from the point to the line as the error in the fit. This yields an error which is invariant under translations and rotations.

No claim is made about the originality of these results. This is an independent derivation (made in 1985) of an often discovered technique.

\section{The polar form of the equation of a straight line}

There are a number of forms that the equation of a straight line can take (e.g., point-slope, intersection with axes, $y~=~mx+b$).

The form most useful for our purpose is the $polar$ form. In this form, a line $L$ is determined by its distance $r$ from the origin and the angle $\theta$ that the normal from the origin to the line makes with the $x$ axis, the angle being measured counterclockwise.

The equation of $L$ in terms of $r$ and $\theta$ is
$$\eqalignno{L:~~ x\cos \theta + y \sin \theta ~=~r.&(1)}$$

This can be verified by noting that $x~=~r/\cos\theta$ at $y~=~0$ and $y~=~r/\sin\theta$ at $x~=~0$.

Another derivation of equation (1) for $L$ is as follows: Let $(x,y)$ be a point on $L$, $d$ the distance of $(x,y)$ from the origin, and $\phi$ the angle that the line from the origin to $(x, y)$ makes with the $x$-axis. The angle between this line and the normal to $L$ is easily seen to be $\theta-\phi$. Thus, $r/d ~=~\cos(\theta-\phi)$.
Since $x~=~d\cos \phi$ and $y~=~d\sin \phi$, $$\eqalignno{ x \cos \theta + y \sin \theta &~=~ d \cos (\theta-\phi)\cr &~=~ r.\cr }$$ This derivation, though more complicated than the preceding one, has the advantage of also giving explicit formulae for $x$ and $y$ in terms of $r$, $\theta$, and $\phi$, $$x~=~{r\cos\phi \over \cos(\theta-\phi)} \quad{\rm and}\quad y~=~{r\sin\phi \over \cos(\theta-\phi)}.$$

The polar form for the equation of $L$ is so useful because the distance from any point $(u, v)$ to $L$ is $u\cos \theta + v \sin \theta - r$ (a signed distance, but squaring removes the sign). To show this, consider the line $L'$ through $(u, v)$ that is parallel to $L$. If $d$ is the distance from $L'$ to $L$, which is also the distance from $(u, v)$ to $L$, the equation of $L'$ is $x\cos\theta+y\sin\theta~=~d+r$, since $L$ and $L'$ are parallel.
Since $(u, v)$ is on $L'$, $u\cos\theta+v\sin\theta~=~d+r$, so that, as claimed, $d~=~u\cos\theta+v\sin\theta-r$.
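This distance formula is a one-liner; a small sketch (the function name and test points are my own, and the value returned is the signed distance):

```python
import math

def point_line_distance(u, v, r, theta):
    """Signed distance from (u, v) to the line x cos(theta) + y sin(theta) = r."""
    return u * math.cos(theta) + v * math.sin(theta) - r

# The vertical line x = 1 has theta = 0, r = 1; the point (3, 5) is 2 away from it.
assert abs(point_line_distance(3.0, 5.0, 1.0, 0.0) - 2.0) < 1e-12
# A point on the line itself is at distance 0.
assert abs(point_line_distance(1.0, 7.0, 1.0, 0.0)) < 1e-12
```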

\section{Evaluating the mean squared error}

We now define some notation and abbreviations. Let $c~=~\cos \theta$ and $s~=~\sin \theta$, so the equation of $L$ is $cx+sy~=~r$. The data points used to fit $L$ are $(x_i, y_i)$ for $i=1$ to $n$ (i.e., there are $n$ points).
For any expression $f$, we define $\overline{f} $ to be the average of $f$ over all the data points, so that $$\overline{f} ~=~(1/n)\sum_{i=1}^n f_i.$$

For example, $\overline{x} ~=~(1/n)\sum_{i=1}^n x_i$, and $\overline{xy} ~=~(1/n)\sum_{i=1}^n x_i y_i$.

Note: We can also define $\overline{f} $ to be a weighted mean $$\overline{f} ~=~{\sum_{i=1}^n f_i w_i \over \sum_{i=1}^n w_i}$$ where each $w_i > 0$, and the results which follow are not affected at all.

If $p$ and $q$ are any of $x$ and $y$, we define $$\langle p,q\rangle ~=~\overline{pq} -\overline{p}\ \overline{q},$$ the covariance between the variables $p$ and $q$. For example, $$\langle x,x\rangle ~=~\overline{x^2}-\overline{x}^2 \quad{\rm and}\quad \langle x,y\rangle ~=~\overline{xy}-\overline{x}\ \overline{y}.$$

If $D$ is the mean squared error of $L$, then
$$\eqalignno{ D &~=~ \overline{({\rm distance\;from\;point\;}i{\rm\;to\;}L)^2} \cr &~=~ \overline{d^2}\cr &~=~ \overline{(cx+sy-r)^2} \cr &~=~ \overline{c^2x^2+s^2y^2+r^2 + 2csxy-2crx-2sry} \cr }$$ so that $$\eqalignno{D~=~ c^2\overline{x^2} +s^2\overline{y^2} +r^2+ 2sc\overline{xy} -2cr\overline{x} -2sr\overline{y} . &(2)}$$
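Formula (2) can be checked numerically against the direct average of squared distances; the sample points and line parameters below are arbitrary:

```python
import math

pts = [(0.0, 1.0), (2.0, 2.5), (4.0, 3.0), (1.0, 0.5)]  # made-up data points
theta, r = 0.9, 1.3  # an arbitrary candidate line
c, s = math.cos(theta), math.sin(theta)
n = len(pts)

# Direct mean of squared (signed) distances to the line.
direct = sum((c * x + s * y - r) ** 2 for x, y in pts) / n

# Formula (2): D in terms of the mean values x-bar, y-bar, xy-bar, etc.
mx = sum(x for x, _ in pts) / n
my = sum(y for _, y in pts) / n
mxy = sum(x * y for x, y in pts) / n
mx2 = sum(x * x for x, _ in pts) / n
my2 = sum(y * y for _, y in pts) / n
D = (c * c * mx2 + s * s * my2 + r * r
     + 2 * s * c * mxy - 2 * c * r * mx - 2 * s * r * my)

assert abs(direct - D) < 1e-12
```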

\section{Minimizing the mean squared error}

If $L$ is to be the best fitting line in the least mean squared sense, we must have
$${\partial D \over \partial r}~=~0 {\rm \quad and \quad} {\partial D \over \partial \theta}~=~0.$$

Since ${\partial D \over \partial r} =2r-2c\overline{x}-2s\overline{y} $, this implies that $r =c\overline{x}+s\overline{y} $. Since the equation of the line is $cx+sy = r$, this implies that the line passes through $(\overline{x}, \overline{y}) $.

$$\eqalignno{ 0 &~=~{\partial D \over \partial \theta} \cr &~=~{\partial \over \partial \theta}\left(c^2\overline{x^2} +s^2\overline{y^2} +r^2+ 2sc\overline{xy} -2cr\overline{x} -2sr\overline{y}\right)\cr &~=~-2cs\overline{x^2} +2cs\overline{y^2} + 2(c^2-s^2)\overline{xy} +2sr\overline{x} -2cr\overline{y}\cr &~=~2cs(\overline{y^2} -\overline{x^2}) + 2(c^2-s^2)\overline{xy} +2r(s\overline{x} -c\overline{y})\cr &~=~2cs(\overline{y^2} -\overline{x^2}) + 2(c^2-s^2)\overline{xy} +2(c\overline{x}+s\overline{y})(s\overline{x} -c\overline{y})\cr &~=~2cs(\overline{y^2} -\overline{x^2}) + 2(c^2-s^2)\overline{xy} +2\left(cs\,\overline{x}^2+(s^2-c^2)\overline{x}\ \overline{y}-cs\,\overline{y}^2\right)\cr &~=~2cs\left(\overline{y^2}-\overline{y}^2 -\overline{x^2} +\overline{x}^2\right) +2(c^2-s^2)\left(\overline{xy}-\overline{x}\ \overline{y}\right)\cr &~=~2cs\left(\langle y, y\rangle-\langle x, x\rangle\right) +2(c^2-s^2)\langle x, y\rangle\cr &~=~\sin(2\theta)\left(\langle y, y\rangle-\langle x, x\rangle\right) +2\cos(2\theta)\langle x, y\rangle. \cr }$$

From this, $$\tan 2\theta ~=~ {2\langle x,y\rangle \over \langle x,x\rangle - \langle y,y\rangle},$$ so $\theta$ can be determined.

However, the values of $r$ and $\theta$ that minimize $D$ can also be found without using any calculus. This will now be done by writing $D$ as the sum of terms which, when independently minimized, give the desired values for $r$ and $\theta$. $$\eqalignno{ D&~=~ c^2\overline{x^2} +s^2\overline{y^2} +r^2+ 2sc\overline{xy} -2cr\overline{x} -2sr\overline{y}\cr &=~ c^2\overline{x^2} +s^2\overline{y^2} +2sc\overline{xy} +r^2 -2r(c\overline{x} +s\overline{y})\cr &=~ c^2\overline{x^2} +s^2\overline{y^2} +2sc\overline{xy} +(r-c\overline{x} -s\overline{y})^2 -(c\overline{x} +s\overline{y})^2\cr &=~c^2(\overline{x^2}-\overline{x}^2) + s^2(\overline{y^2}-\overline{y}^2) + 2sc(\overline{xy}-\overline{x}\overline{y}) +(r-c\overline{x} -s\overline{y})^2\cr &=~c^2\langle x,x\rangle + s^2\langle y,y\rangle +2sc\langle x,y\rangle +(r-c\overline{x} -s\overline{y})^2.&(3)\cr }$$ Letting $S=\sin 2\theta =2sc$ and $C=\cos 2\theta=c^2-s^2$, since $c^2=(1+C)/2$ and $s^2=(1-C)/2$,

$$\eqalignno{ D~&=~\frac{1+C}{2}\langle x,x\rangle + \frac{1-C}{2}\langle y,y\rangle +S\langle x,y\rangle +(r-c\overline{x} -s\overline{y})^2\cr ~&=~{\langle x,x\rangle +\langle y,y\rangle \over 2} + C\,{\langle x,x\rangle-\langle y,y \rangle \over 2} + S \langle x,y\rangle +(r-c\overline{x} -s\overline{y})^2\cr &=~D_1+C~D_2+S~D_3 +(r-c\overline{x} -s\overline{y})^2&(4)\cr } $$

where $$\eqalignno{ D_1&~=~{\langle x,x\rangle +\langle y,y\rangle \over 2},&(5a)\cr D_2&~=~{\langle x,x\rangle -\langle y,y\rangle \over 2},&(5b)\cr D_3&~=~\langle x,y\rangle .&(5c)\cr }$$ Defining $R$ and $\phi$ by $$\eqalignno{D_2~=~R\cos\phi\quad {\rm and}\quad D_3~=~R\sin\phi, &(6)}$$ where $R \ge 0$ and $0 \le \phi < 2\pi$, $$\eqalignno{ D~&=~D_1+C~D_2+S~D_3 +(r-c\overline{x} -s\overline{y})^2\cr &=~D_1+R\cos 2\theta\cos\phi+R\sin 2\theta\sin\phi +(r-c\overline{x} -s\overline{y})^2\cr &=~D_1+R\cos(2\theta-\phi) +(r-c\overline{x} -s\overline{y})^2.&(7)\cr }$$ This is the desired expression for $D$.

Since $\cos(2\theta-\phi) \ge -1$ and $(r-c\overline{x}-s\overline{y})^2 \ge 0$, $D \ge D_1-R$. By choosing $$\eqalignno{ \theta~&=~{\phi+\pi \over 2} \quad{\rm and}\quad r~=~\overline{x}\cos\theta + \overline{y}\sin\theta, &(8)}$$ $D$ will achieve its minimum value $$\eqalignno{ D~&=~D_1 - R. & (9)}$$

\section{Summary of line fitting algorithm}

The technique for fitting a line to a set of data points is:

1. Gather the needed mean values: $\overline{x}$, $\overline{y}$, $\overline{xy}$, $\overline{x^2}$, and $\overline{y^2}$.

2. Using $\langle p,q\rangle ~=~\overline{pq}-\overline{p}\ \overline{q}$, compute the covariances $\langle x,x\rangle $, $\langle y,y\rangle $, and $\langle x,y\rangle $ and set $$\eqalignno{ D_1&~=~{\langle x,x\rangle +\langle y,y\rangle \over 2},\cr D_2&~=~{\langle x,x\rangle -\langle y,y\rangle \over 2},\cr D_3&~=~\langle x,y\rangle .\cr }$$

3. Get the minimum mean squared error $D$ from $$R~=~\sqrt{D_2^2 + D_3^2}$$ and $$D~=~D_1 - R.$$

4. To find $r$ and $\theta$, the parameters of $L$ when in polar form, set $\phi$ to the value of $\tan^{-1}(D_3 / D_2)$ such that $D_2~=~R\cos\phi$ and $D_3~=~R\sin\phi$. (This can be done in Fortran using the ATAN2 function.) Then $$\theta~=~{\phi+\pi \over 2} \quad{\rm and}\quad r~=~\overline{x}\cos\theta + \overline{y}\sin\theta.$$
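The four steps above can be sketched in plain Python (a minimal implementation under the notation of this answer; the function name and the test points are my own):

```python
import math

def fit_line(points):
    """Fit x cos(theta) + y sin(theta) = r, minimizing the mean squared
    perpendicular distance. Returns (theta, r, D_min)."""
    n = len(points)
    # Step 1: the needed mean values.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Step 2: covariances and D1, D2, D3.
    cxx = sum(x * x for x, _ in points) / n - mx * mx
    cyy = sum(y * y for _, y in points) / n - my * my
    cxy = sum(x * y for x, y in points) / n - mx * my
    d1 = (cxx + cyy) / 2
    d2 = (cxx - cyy) / 2
    d3 = cxy
    # Step 3: R and the minimum mean squared error.
    big_r = math.hypot(d2, d3)
    # Step 4: phi via atan2 (the ATAN2 step), then theta and r.
    phi = math.atan2(d3, d2)
    theta = (phi + math.pi) / 2
    r = mx * math.cos(theta) + my * math.sin(theta)
    return theta, r, d1 - big_r

# Collinear points on y = x: the minimum mean squared error should be 0,
# and every point should lie on the fitted line.
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
theta, r, d_min = fit_line(pts)
assert abs(d_min) < 1e-12
for x, y in pts:
    assert abs(x * math.cos(theta) + y * math.sin(theta) - r) < 1e-9
```

Note that `atan2(d3, d2)` picks the quadrant so that $D_2=R\cos\phi$ and $D_3=R\sin\phi$ both hold, exactly as item 4 requires.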