Unclear PCA derivation


I found an unclear step in the derivation of PCA in A. Bandeira's lecture notes for the MIT Fall 2015 course 18.S096: Topics in Mathematics of Data Science.

In $\S 1.1.1$ the author derives PCA as the best $d$-dimensional affine fit as follows. We take data points $x_1,\ldots,x_n \in \mathbb{R}^p$ and search for their representation (as coordinates $\beta_k$) in a $d$-dimensional affine subspace defined by a shift $\mu$ and an orthonormal basis $V=[v_1,\ldots,v_d]$ via a least-squares fit problem: $$ \min\limits_{\mu,\beta_k,V:V^TV=I}\sum\limits_{k=1}^{n}\|x_k - (\mu + V\beta_k)\|^2_2. $$

First we optimize over $\mu$ using the first-order condition $$\nabla_\mu\sum\limits_{k=1}^{n}\|x_k - (\mu + V\beta_k)\|^2_2=0,$$ which holds only if $$\left(\sum\limits_{k=1}^{n}x_k\right) -\mu n - V\left(\sum\limits_{k=1}^{n}\beta_k\right)=0.$$
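As a sanity check on this condition, here is a minimal NumPy sketch that compares the gradient with respect to $\mu$, which is $-2$ times the left-hand side above, against a finite-difference estimate. The data, the basis `V`, and the coordinates `B` are random placeholders chosen only for illustration; none of them come from the lecture notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 20, 4, 2

X = rng.normal(size=(n, p))                    # rows are the data points x_k
V, _ = np.linalg.qr(rng.normal(size=(p, d)))   # p x d, orthonormal columns
B = rng.normal(size=(n, d))                    # rows are arbitrary coordinates beta_k
mu = rng.normal(size=p)                        # arbitrary shift


def objective(m):
    """Sum over k of ||x_k - (m + V beta_k)||^2."""
    residual = X - (m + B @ V.T)
    return np.sum(residual ** 2)


# Analytic gradient: -2 * (sum_k x_k - n*mu - V sum_k beta_k).
analytic_grad = -2.0 * (X.sum(axis=0) - n * mu - V @ B.sum(axis=0))

# Central finite differences along each coordinate direction of mu.
eps = 1e-6
numeric_grad = np.array([
    (objective(mu + eps * e) - objective(mu - eps * e)) / (2 * eps)
    for e in np.eye(p)
])

# The shift that solves the first-order condition for these fixed V and beta_k.
mu_star = (X.sum(axis=0) - V @ B.sum(axis=0)) / n
grad_at_mu_star = -2.0 * (X.sum(axis=0) - n * mu_star - V @ B.sum(axis=0))

print(np.allclose(analytic_grad, numeric_grad, atol=1e-5))
print(np.allclose(grad_at_mu_star, 0.0))
```

Note that the check holds for any fixed $\beta_k$, so by itself the condition does not force $\sum_k \beta_k = 0$.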

Question:

At this point the author says that $\sum\limits_{k=1}^{n}\beta_k=0$ and goes on with the proof (which is fine), however I could find no rigorous reason why this should be true. Simple examples suggest such a fact (say, take two 2d points and fit a line to them), but I would appreciate an explanation.


There are 2 best solutions below


I cannot comment (I do not have enough points), so I am posting this as an answer. I would also be very interested to see how your teacher concludes that, especially since it seems to be evident to him. It is not clear to me, because the problem only says that we look for the best $d$-dimensional affine subspace, in the sense that the projections of $x_1,\ldots,x_n$ onto it best approximate the original points $x_1,\ldots,x_n$. From that we can directly deduce that $(\beta_{i})_k = v_i^T (x_k - \mu)$, which is simply the projection of $x_k-\mu$ onto $v_i$. The approximation is then $\widehat{x_k}=\mu+\sum_{i} (\beta_{i})_{k}v_i$.
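The projection formula can be explored numerically. The sketch below is only an illustration under assumed random data and an arbitrary orthonormal `V`: it computes the coordinates $V^T(x_k-\mu)$ for two choices of $\mu$, the centroid and the centroid shifted along the subspace. Both choices give the same fitted points, but only the centroid makes the coordinates sum to zero, which suggests the condition $\sum_k\beta_k=0$ is a choice of parametrisation rather than a consequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 50, 5, 2

X = rng.normal(size=(n, p)) + 3.0              # rows are x_k, deliberately off-centre
V, _ = np.linalg.qr(rng.normal(size=(p, d)))   # arbitrary orthonormal basis, p x d


def fit(mu):
    """Return coordinates (row k is V^T (x_k - mu)) and the fitted points."""
    B = (X - mu) @ V
    X_hat = mu + B @ V.T
    return B, X_hat


# Choice 1: mu is the centroid of the data.
mu_bar = X.mean(axis=0)
B_bar, X_hat_bar = fit(mu_bar)

# Choice 2: move mu within the subspace by an arbitrary vector V c.
c = rng.normal(size=d)
B_shift, X_hat_shift = fit(mu_bar + V @ c)

beta_sum_centroid = B_bar.sum(axis=0)    # numerically zero
beta_sum_shifted = B_shift.sum(axis=0)   # equals -n * c, not zero
same_fit = np.allclose(X_hat_bar, X_hat_shift)

print(beta_sum_centroid, beta_sum_shifted, same_fit)
```

Here `fit` and the variable names are hypothetical helpers introduced for this example, not notation from the lecture notes.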


I think I found a way to demonstrate it. Restate the error minimisation as a least-squares regression: fold the translation into the matrix $V$ by appending a normalised column of 1s, and call the new matrix $A$. The last column of $AA^{T}$ has to be perpendicular to the affine subspace, and that is enough to show that the affine subspace has to pass through the centroid, $\mu_{n}$.
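One possible reading of this idea can be sketched in NumPy. The assumptions here are mine: the coordinates $\beta_k$ are held fixed (random placeholders), a plain column of ones (not normalised) is appended to them, and the directions and shift are fitted by ordinary least squares without enforcing $V^TV=I$. Under those assumptions the residuals are orthogonal to the ones column, so they sum to zero, and the centroid of the data lies on the fitted affine subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d = 40, 5, 2

X = rng.normal(size=(n, p)) + 1.5   # rows are the data points x_k
B = rng.normal(size=(n, d))         # rows are fixed coordinates beta_k (placeholders)

# Design matrix: coordinates plus a column of ones carrying the translation.
A = np.hstack([B, np.ones((n, 1))])

# Solve X ~ A @ C in the least-squares sense; C stacks the directions and the shift.
C, *_ = np.linalg.lstsq(A, X, rcond=None)
W = C[:d].T    # p x d fitted directions (not constrained to be orthonormal)
mu = C[d]      # fitted shift, length p

# Residuals are orthogonal to every column of A, including the ones column.
R = X - A @ C
col_sums = R.sum(axis=0)

# Hence the centroid equals the point of the subspace with the mean coordinates.
centroid = X.mean(axis=0)
point_on_subspace = mu + W @ B.mean(axis=0)

print(np.allclose(col_sums, 0.0), np.allclose(centroid, point_on_subspace))
```

This does not reproduce the $AA^{T}$ argument literally; it only illustrates why a ones column in a least-squares fit ties the fitted subspace to the centroid.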