I found an unclear part in the derivation of PCA in the lecture notes of A. Bandeira for the MIT Fall 2015 course 18.S096: Topics in Mathematics of Data Science.
In $\S 1.1.1$ the author derives PCA as the best $d$-dimensional affine fit as follows. We take data points $x_1,\ldots,x_n \in \mathbb{R}^p$ and search for their representation (as coordinates $\beta_k$) in a $d$-dimensional affine subspace, defined by a shift $\mu$ and an orthonormal basis $V=[v_1,\ldots,v_d]$, via the least-squares fit problem $$ \min\limits_{\mu,\beta_k,V:V^TV=I}\sum\limits_{k=1}^{n}\|x_k - (\mu + V\beta_k)\|^2_2. $$
First we optimize over $\mu$ using the first-order condition $$\nabla_\mu\sum\limits_{k=1}^{n}\|x_k - (\mu + V\beta_k)\|^2_2=0,$$ which holds if and only if $$\left(\sum\limits_{k=1}^{n}x_k\right) -\mu n - V\left(\sum\limits_{k=1}^{n}\beta_k\right)=0.$$
Question:
At this point the author asserts that $\sum\limits_{k=1}^{n}\beta_k=0$ and goes on with the proof (which is fine), but I could find no rigorous reason why this should be true. Simple examples suggest the fact (say, take two 2d points and fit a line to them), but I would appreciate an explanation.
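For what it's worth, here is the two-point example above checked numerically with numpy (my own sketch, not from the notes): taking $\mu$ to be the sample mean and $v$ the top principal direction, the coefficients $\beta_k = v^T(x_k-\mu)$ do sum to zero.

```python
import numpy as np

# Two points in R^2; fit a 1-dimensional affine subspace (a line).
x = np.array([[0.0, 0.0],
              [2.0, 4.0]])  # rows are the data points x_k

mu = x.mean(axis=0)                      # take mu = sample mean
centered = x - mu
# Best direction v: top right singular vector of the centered data
# (the first PCA direction).
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
v = Vt[0]                                # unit vector, d = 1

beta = centered @ v                      # beta_k = v^T (x_k - mu)
print(beta.sum())                        # ~0: the coefficients cancel
```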
I cannot comment (I don't have enough reputation), so I am posting this as an answer. I am also very interested to see how your lecturer concludes this, especially since it seems to be evident to him. It is not fully clear, but note what the notes say: we try to find the best $d$-dimensional affine subspace such that the projections of $x_1, \ldots, x_n$ onto it best approximate the original points $x_1, \ldots, x_n$. So for fixed $\mu$ and $V$, the optimal coefficients are the projection coordinates, $(\beta_k)_i = v_i^T (x_k - \mu)$, i.e. $\beta_k = V^T(x_k - \mu)$: this is the projection of $x_k-\mu$ onto $v_i$. The approximation is then $\widehat{x}_k=\mu+\sum_{i} (\beta_{k})_{i} v_i = \mu + VV^T(x_k-\mu)$. Summing over $k$ gives $\sum_k \beta_k = V^T\left(\sum_k x_k - n\mu\right)$. Moreover, the parametrization has a shift freedom, $\mu \mapsto \mu + Vc$, $\beta_k \mapsto \beta_k - c$, which leaves every $\mu + V\beta_k$ unchanged; so one may assume $\mu = \frac{1}{n}\sum_k x_k$ without loss of generality, and with that choice $\sum_k \beta_k = 0$ indeed holds.
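To back this up numerically, here is a small sketch (my own, with made-up random data) checking two things: that the projection formula $\beta_k = V^T(x_k-\mu)$ agrees with an explicit least-squares solve for each $\beta_k$, and that choosing $\mu$ as the sample mean makes the coefficients sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 50, 5, 2
X = rng.normal(size=(n, p))              # rows are data points x_k

mu = X.mean(axis=0)                      # gauge choice: mu = sample mean
# Orthonormal V from the top-d right singular vectors of the centered data
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:d].T                             # p x d, satisfies V^T V = I

# Claimed optimal coefficients: projections of x_k - mu onto the columns of V
beta = (X - mu) @ V                      # n x d

# Cross-check each beta_k against an explicit least-squares solve
# min_b ||(x_k - mu) - V b||_2
beta_ls = np.linalg.lstsq(V, (X - mu).T, rcond=None)[0].T
print(np.allclose(beta, beta_ls))        # True
print(np.abs(beta.sum(axis=0)).max())    # ~0 since mu is the mean
```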