**Proof of "The sum of squared distances from the points to the line is a minimum"**

I'm reading "Introduction to Linear Algebra" by Gilbert Strang, the "PCA by SVD" section.
The text says that the sum of squared distances from the points to the line is a minimum, and the author sets out to prove that:
$$\sum_{k=1}^{n} ||a_{k}||^2=\sum_{k=1}^{n}|a_{k}^Tu_{1}|^2 + \sum_{k=1}^{n}|a_{k}^Tu_{2}|^2$$
I don't really understand why the sum of the squared lengths of the columns of $A$ equals the sum of squares of the inner products of those columns with the singular vectors. So my first question: could you please explain this part to me?
The next sentence is: "The first sum on the right is $$u_{1}^TAA^Tu_{1}.$$"
So how did $$\sum_{k=1}^{n}|a_{k}^Tu_{1}|^2$$ transform into $$u_{1}^TAA^Tu_{1}$$?
As I understand it, $$\sum_{k=1}^{n}|a_{k}^Tu_1|^2$$ is just the sum of the squares of the inner products of all the columns with the first singular vector, right?
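To make the setup concrete, here is a quick NumPy sanity check of both identities (a sketch added for illustration: $A$ is a random $2\times n$ matrix whose columns are the points $a_k$, and $u_1, u_2$ are its left singular vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 8))      # columns a_k are the data points

U, s, Vt = np.linalg.svd(A)          # u_1, u_2 are the columns of U
u1, u2 = U[:, 0], U[:, 1]

# sum ||a_k||^2  ==  sum |a_k^T u1|^2  +  sum |a_k^T u2|^2,
# because {u1, u2} is an orthonormal basis of the plane.
lhs = np.sum(A ** 2)                 # same as summing ||a_k||^2 over columns
rhs = np.sum((A.T @ u1) ** 2) + np.sum((A.T @ u2) ** 2)
print(np.isclose(lhs, rhs))          # True

# The first sum on the right is u1^T A A^T u1, since
# sum_k |a_k^T u1|^2 = ||A^T u1||^2 = (A^T u1)^T (A^T u1) = u1^T (A A^T) u1.
print(np.isclose(np.sum((A.T @ u1) ** 2), u1 @ A @ A.T @ u1))   # True
```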
This doesn't directly answer the question, but it is a method I came up with about 30 years ago for finding the line which minimizes the sum of the squares of the distances to a set of 2D points.
This was reformatted from LaTeX to MathJax, so there are some incomplete conversions.
I may have submitted this before - I often forget things like that.
The problem investigated here is that of fitting a straight line to a set of data points. The fit is intended to be independent of the coordinate system in the sense that, if the data points are rotated or shifted, the resulting line will be the same relative to the points.
The standard least squares fitted line does not satisfy this criterion, because the error in the fit for each point is taken to be the distance from the point to the line measured parallel to one of the coordinate axes. When the axes are rotated, that distance changes.
We choose to use the actual distance from the point to the line as the error in the fit. This yields an error which is invariant under translations and rotations.
No claim is made about the originality of these results. This is an independent derivation (made in 1985) of an often discovered technique.
**The polar form of the equation of a straight line**
There are a number of forms that the equation of a straight line can take (e.g., point-slope, intersection with axes, $y~=~mx+b$).
The form most useful for our purpose is the polar form. In this form, a line $L$ is determined by its distance $r$ from the origin and the angle $\theta$ that the normal from the origin to the line makes with the $x$-axis, the angle being measured counterclockwise.
The equation of $L$ in terms of $r$ and $\theta$ is
$$L:\quad x\cos \theta + y \sin \theta ~=~ r. \tag{1}$$
This can be verified by noting that $x~=~r/\cos\theta$ at $y~=~0$ and $y~=~r/\sin\theta$ at $x~=~0$.
Another derivation of equation (1) for $L$ is as follows: Let $(x,y)$ be a point on $L$, $d$ the distance of $(x,y)$ from the origin, and $\phi$ the angle that the line from the origin to $(x, y)$ makes with the $x$-axis. The angle between this line and the normal to $L$ is easily seen to be $\theta-\phi$. Thus, $r/d ~=~\cos(\theta-\phi)$.
Since $x~=~d\cos \phi$ and $y~=~d\sin \phi$, $$\begin{aligned} x \cos \theta + y \sin \theta &~=~ d \cos (\theta-\phi)\\ &~=~ r.\end{aligned}$$ This derivation, though more complicated than the preceding one, has the advantage of also giving explicit formulae for $x$ and $y$ in terms of $r$, $\theta$, and $\phi$: $$x~=~{r\cos\phi \over \cos(\theta-\phi)} \quad{\rm and}\quad y~=~{r\sin\phi \over \cos(\theta-\phi)}.$$
The polar form of the equation of $L$ is so useful because the distance from any point $(u, v)$ to $L$ is $u\cos \theta + v \sin \theta - r$. To show this, consider the line $L'$ through $(u, v)$ that is parallel to $L$. If $d$ is the distance from $L'$ to $L$, which is also the distance from $(u, v)$ to $L$, the equation of $L'$ is $x\cos\theta+y\sin\theta~=~d+r$, since $L$ and $L'$ are parallel. Since $(u, v)$ is on $L'$, $u\cos\theta+v\sin\theta~=~d+r$, so that, as claimed, $d~=~u\cos\theta+v\sin\theta-r$.
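As a sanity check of this distance formula, here is a minimal NumPy sketch (an addition with made-up numbers, not part of the original write-up; note the formula actually gives a signed distance, negative when the point lies on the origin's side of $L$):

```python
import numpy as np

theta, r = 0.7, 2.0                  # line L: x*cos(theta) + y*sin(theta) = r
p = np.array([3.0, -1.0])            # the point (u, v)
n = np.array([np.cos(theta), np.sin(theta)])   # unit normal to L

d = p @ n - r                        # claimed signed distance from (u, v) to L

foot = p - d * n                     # drop the perpendicular from p onto L
print(np.isclose(foot @ n, r))                        # the foot lies on L -> True
print(np.isclose(np.linalg.norm(p - foot), abs(d)))   # |p - foot| = |d|   -> True
```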
**Evaluating the mean squared error**
We now define some notation and abbreviations. Let $c~=~\cos \theta$ and $s~=~\sin \theta$, so the equation of $L$ is $cx+sy~=~r$. The data points used to fit $L$ are $(x_i, y_i)$ for $i=1$ to $n$ (i.e., there are $n$ points).
For any expression $f$, we define $\overline{f}$ to be the average of $f$ over all the data points, so that $$\overline{f} ~=~(1/n)\sum_{i=1}^n f_i.$$
For example, $\overline{x} ~=~(1/n)\sum_{i=1}^n x_i$, and $\overline{xy} ~=~(1/n)\sum_{i=1}^n x_i y_i$.
Note: We can also define $\overline{f} $ to be a weighted mean $$\overline{f} ~=~{\sum_{i=1}^n f_i w_i \over \sum_{i=1}^n w_i}$$ where each $w_i > 0$, and the results which follow are not affected at all.
If $p$ and $q$ are any of $x$ and $y$, we define $$\langle p,q\rangle ~=~\overline{pq} -\overline{p}\ \overline{q},$$ the covariance between the variables $p$ and $q$. For example, $$\langle x,x\rangle ~=~\overline{x^2}-\overline{x}^2 \quad{\rm and}\quad \langle x,y\rangle ~=~\overline{xy}-\overline{x}\ \overline{y}.$$
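Incidentally, $\langle x,y\rangle$ as defined here is exactly the population covariance; a minimal NumPy check (an addition, assuming only the definitions above):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 8.0])

cov_xy = (x * y).mean() - x.mean() * y.mean()   # <x,y> as defined above
print(np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1]))   # True
```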
If $D$ is the mean squared error of $L$, then
$$\begin{aligned} D &~=~ \overline{(\text{distance from point } i \text{ to } L)^2} \\ &~=~ \overline{d^2}\\ &~=~ \overline{(cx+sy-r)^2} \\ &~=~ \overline{c^2x^2+s^2y^2+r^2 + 2csxy-2crx-2sry} \end{aligned}$$ so that $$D~=~ c^2\overline{x^2} +s^2\overline{y^2} +r^2+ 2sc\overline{xy} -2cr\overline{x} -2sr\overline{y}. \tag{2}$$
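Equation (2) is easy to verify numerically; a minimal sketch with random data (an addition, assuming nothing beyond the definitions above):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=20), rng.normal(size=20)
theta, r = 0.4, 1.3
c, s = np.cos(theta), np.sin(theta)

lhs = np.mean((c * x + s * y - r) ** 2)          # mean squared distance
rhs = (c * c * (x * x).mean() + s * s * (y * y).mean() + r * r
       + 2 * s * c * (x * y).mean()
       - 2 * c * r * x.mean() - 2 * s * r * y.mean())
print(np.isclose(lhs, rhs))                      # True: matches equation (2)
```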
**Minimizing the mean squared error**
If $L$ is to be the best fitting line in the least mean squared sense, we must have
$${\partial D \over \partial r}~=~0 {\rm \quad and \quad} {\partial D \over \partial \theta}~=~0.$$
Since ${\partial D \over \partial r} =2r-2c\overline{x}-2s\overline{y} $, this implies that $r =c\overline{x}+s\overline{y} $. Since the equation of the line is $cx+sy = r$, this implies that the line passes through $(\overline{x}, \overline{y}) $.
$$\begin{aligned} 0 &={\partial D \over \partial \theta} \\ &={\partial \over \partial \theta}\left(c^2\overline{x^2} +s^2\overline{y^2} +r^2+ 2sc\overline{xy} -2cr\overline{x} -2sr\overline{y}\right)\\ &=-2cs\overline{x^2} +2cs\overline{y^2} + 2(c^2-s^2)\overline{xy} +2sr\overline{x} -2cr\overline{y}\\ &=2cs(\overline{y^2} -\overline{x^2}) + 2(c^2-s^2)\overline{xy} +2r(s\overline{x} -c\overline{y})\\ &=2cs(\overline{y^2} -\overline{x^2}) + 2(c^2-s^2)\overline{xy} +2(c\overline{x}+s\overline{y})(s\overline{x} -c\overline{y})\\ &=2cs(\overline{y^2} -\overline{x^2}) + 2(c^2-s^2)\overline{xy} +2\bigl(cs\,\overline{x}^2+(s^2-c^2)\overline{x}\,\overline{y}-cs\,\overline{y}^2\bigr)\\ &=2cs(\overline{y^2}-\overline{y}^2 -\overline{x^2} +\overline{x}^2) +2(c^2-s^2)(\overline{xy}-\overline{x}\,\overline{y})\\ &=2cs(\langle y, y\rangle-\langle x, x\rangle) +2(c^2-s^2)\langle x, y\rangle\\ &=\sin(2\theta)(\langle y, y\rangle-\langle x, x\rangle) +2\cos(2\theta)\langle x, y\rangle. \end{aligned}$$
Setting this last expression to zero gives $\tan 2\theta ~=~ \dfrac{2\langle x, y\rangle}{\langle x, x\rangle-\langle y, y\rangle}$, from which $\theta$ can be determined.
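A finite-difference check (an addition, not part of the original derivation) that both partials really vanish at this stationary point; the branch $\theta=(\phi+\pi)/2$ used below is the minimizing one, as the non-calculus derivation that follows shows:

```python
import numpy as np

rng = np.random.default_rng(3)
x, y = rng.normal(size=30), rng.normal(size=30)

def D(theta, r):
    return np.mean((x * np.cos(theta) + y * np.sin(theta) - r) ** 2)

cxx = (x * x).mean() - x.mean() ** 2             # <x,x>
cyy = (y * y).mean() - y.mean() ** 2             # <y,y>
cxy = (x * y).mean() - x.mean() * y.mean()       # <x,y>

# tan(2*theta) = 2<x,y> / (<x,x> - <y,y>), solved with a quadrant-aware arctan
theta = 0.5 * (np.arctan2(2 * cxy, cxx - cyy) + np.pi)
r = x.mean() * np.cos(theta) + y.mean() * np.sin(theta)

h = 1e-6                                         # central finite differences
print(abs(D(theta + h, r) - D(theta - h, r)) / (2 * h) < 1e-6)   # dD/dtheta ~ 0
print(abs(D(theta, r + h) - D(theta, r - h)) / (2 * h) < 1e-6)   # dD/dr     ~ 0
```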
However, the values of $r$ and $\theta$ that minimize $D$ can also be found without using any calculus. This will now be done by writing $D$ as the sum of terms which, when independently minimized, give the desired values for $r$ and $\theta$. $$\begin{aligned} D&~=~ c^2\overline{x^2} +s^2\overline{y^2} +r^2+ 2sc\overline{xy} -2cr\overline{x} -2sr\overline{y}\\ &=~ c^2\overline{x^2} +s^2\overline{y^2} +2sc\overline{xy} +r^2 -2r(c\overline{x} +s\overline{y})\\ &=~ c^2\overline{x^2} +s^2\overline{y^2} +2sc\overline{xy} +(r-c\overline{x} -s\overline{y})^2 -(c\overline{x} +s\overline{y})^2\\ &=~c^2(\overline{x^2}-\overline{x}^2) + s^2(\overline{y^2}-\overline{y}^2) + 2sc(\overline{xy}-\overline{x}\,\overline{y}) +(r-c\overline{x} -s\overline{y})^2\\ &=~c^2\langle x,x\rangle + s^2\langle y,y\rangle +2sc\langle x,y\rangle +(r-c\overline{x} -s\overline{y})^2.\end{aligned}\tag{3}$$ Letting $S=\sin 2\theta =2sc$ and $C=\cos 2\theta=c^2-s^2$, since $c^2=(1+C)/2$ and $s^2=(1-C)/2$,
$$\begin{aligned} D~&=~\frac{1+C}{2}\langle x,x\rangle + \frac{1-C}{2}\langle y,y\rangle +S\langle x,y\rangle +(r-c\overline{x} -s\overline{y})^2\\ ~&=~{\langle x,x\rangle +\langle y,y\rangle \over 2} + C\,{\langle x,x\rangle-\langle y,y \rangle \over 2} + S\,\langle x,y\rangle +(r-c\overline{x} -s\overline{y})^2\\ &=~D_1+C~D_2+S~D_3 +(r-c\overline{x} -s\overline{y})^2\end{aligned}\tag{4}$$
where $$D_1~=~{\langle x,x\rangle +\langle y,y\rangle \over 2},\qquad D_2~=~{\langle x,x\rangle -\langle y,y\rangle \over 2},\qquad D_3~=~\langle x,y\rangle .\tag{5}$$ Defining $R$ and $\phi$ by $$D_2~=~R\cos\phi\quad {\rm and}\quad D_3~=~R\sin\phi, \tag{6}$$ where $R \ge 0$ and $0 \le \phi < 2\pi$, $$\begin{aligned} D~&=~D_1+C~D_2+S~D_3 +(r-c\overline{x} -s\overline{y})^2\\ &=~D_1+R\cos 2\theta\cos\phi+R\sin 2\theta\sin\phi +(r-c\overline{x} -s\overline{y})^2\\ &=~D_1+R\cos(2\theta-\phi) +(r-c\overline{x} -s\overline{y})^2.\end{aligned}\tag{7}$$ This is the desired expression for $D$.
Since $\cos(2\theta-\phi) \ge -1$ and $(r-c\overline{x}-s\overline{y})^2 \ge 0$, $D \ge D_1-R$. By choosing $$\theta~=~{\phi+\pi \over 2} \quad{\rm and}\quad r~=~\overline{x}\cos\theta + \overline{y}\sin\theta, \tag{8}$$ $D$ will achieve its minimum value $$D~=~D_1 - R. \tag{9}$$
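A brute-force confirmation of this minimum (an addition: scan a grid of $(\theta, r)$ pairs and check that nothing beats $D_1 - R$):

```python
import numpy as np

rng = np.random.default_rng(4)
x, y = rng.normal(size=25), rng.normal(size=25)

cxx = (x * x).mean() - x.mean() ** 2
cyy = (y * y).mean() - y.mean() ** 2
cxy = (x * y).mean() - x.mean() * y.mean()
D1 = (cxx + cyy) / 2
R = np.hypot((cxx - cyy) / 2, cxy)

# Scan a grid of (theta, r) pairs: no line should beat the bound D1 - R.
best = min(np.mean((x * np.cos(t) + y * np.sin(t) - r) ** 2)
           for t in np.linspace(0.0, np.pi, 301)
           for r in np.linspace(-3.0, 3.0, 301))
print(best >= D1 - R - 1e-12)                 # the bound holds
print(np.isclose(best, D1 - R, atol=1e-3))    # and is attained, up to grid error
```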
**Summary of line fitting algorithm**
The technique for fitting a line to a set of data points is:
1. Gather the needed mean values: $\overline{x}$, $\overline{y}$, $\overline{xy}$, $\overline{x^2}$, and $\overline{y^2}$.
2. Using $\langle p,q\rangle ~=~\overline{pq}-\overline{p}\ \overline{q}$, compute the covariances $\langle x,x\rangle $, $\langle y,y\rangle $, and $\langle x,y\rangle $ and set $$D_1~=~{\langle x,x\rangle +\langle y,y\rangle \over 2},\qquad D_2~=~{\langle x,x\rangle -\langle y,y\rangle \over 2},\qquad D_3~=~\langle x,y\rangle .$$
3. Get the mean squared error $D$ by $$R~=~\sqrt{D_2^2 + D_3^2}$$ and $$D~=~D_1 - R.$$
4. To find $r$ and $\theta$, the parameters of $L$ when in polar form, set $\phi$ to the value of $\tan^{-1}(D_3 / D_2)$ such that $D_2~=~R\cos\phi$ and $D_3~=~R\sin\phi$. (This can be done in Fortran using the ATAN2 function.) Then $$\theta~=~{\phi+\pi \over 2} \quad{\rm and}\quad r~=~\overline{x}\cos\theta + \overline{y}\sin\theta.$$
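Finally, here is the whole recipe as a self-contained NumPy sketch (a translation of steps 1 through 4 added for convenience, including the optional weights from the earlier note; np.arctan2 plays the role of Fortran's ATAN2):

```python
import numpy as np

def fit_line(x, y, w=None):
    """Steps 1-4 above: fit the line x*cos(theta) + y*sin(theta) = r that
    minimizes the (optionally weighted) mean squared perpendicular distance.
    Returns (theta, r, D), where D is the minimal mean squared error."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)

    def mean(f):                         # weighted mean, as in the note above
        return np.sum(w * f) / np.sum(w)

    # Step 1: means.  Step 2: covariances and D1, D2, D3.
    xb, yb = mean(x), mean(y)
    cxx = mean(x * x) - xb * xb          # <x,x>
    cyy = mean(y * y) - yb * yb          # <y,y>
    cxy = mean(x * y) - xb * yb          # <x,y>
    D1, D2, D3 = (cxx + cyy) / 2, (cxx - cyy) / 2, cxy

    # Step 3: R and the minimal mean squared error D = D1 - R.
    R = np.hypot(D2, D3)

    # Step 4: phi via ATAN2, then theta and r.
    phi = np.arctan2(D3, D2)
    theta = (phi + np.pi) / 2
    r = xb * np.cos(theta) + yb * np.sin(theta)
    return theta, r, D1 - R

# Noisy points near y = 2x + 1: the fit should recover that line.
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, 200)
y = 2 * x + 1 + rng.normal(0, 0.1, 200)
theta, r, D = fit_line(x, y)
d = x * np.cos(theta) + y * np.sin(theta) - r    # perpendicular residuals
print(np.isclose(np.mean(d ** 2), D))            # returned D is the true error
```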