Proof of Optimality for Approximation of Probability Spaces by PCA


I have come across a theorem stating that the $d$-dimensional affine subspace found by PCA is the best approximation of a probability distribution by such a plane, in the sense that it minimises the expectation of the squared orthogonal distance.

I could not find a proof of this fact, either in Jolliffe's book on PCA or in standard statistics textbooks. However, I did find some proof sketches for the discrete setting, where finitely many points are to be approximated by a $d$-dimensional plane. I didn't see how to transfer these results to general probability spaces, though.

I would highly appreciate any pointers to the literature, or general comments on the version for probability spaces.

This is my version of the actual theorem:

Let $\Omega \subset R^n$ and let $(\Omega, A, \mu)$ be a probability space with finite second moments, i.e. $\int_{\Omega} \|x\|^2 \, d\mu(x) < \infty$. Define the mean and covariance matrix

$$m = \int_{\Omega} x \, d\mu(x) = E[Id] \in R^n, \qquad cov = E[(Id-m)(Id-m)^T] \in R^{n,n}.$$
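As a concrete sanity check of these definitions, here is a minimal NumPy sketch for the special case where $\mu$ is a discrete measure $\mu = \sum_i w_i \delta_{x_i}$, i.e. the finitely-many-points setting mentioned in the question. The function name `mean_and_cov` and the example points and weights are hypothetical.

```python
import numpy as np

def mean_and_cov(points, weights):
    """Mean m = E[Id] and covariance E[(Id - m)(Id - m)^T] of the discrete
    probability measure mu = sum_i weights[i] * delta_{points[i]}."""
    points = np.asarray(points, dtype=float)    # shape (N, n): one point per row
    weights = np.asarray(weights, dtype=float)  # shape (N,): nonnegative, summing to 1
    m = weights @ points                        # integral of x d mu(x)
    centered = points - m
    # sum_i w_i (x_i - m)(x_i - m)^T
    cov = (weights[:, None] * centered).T @ centered
    return m, cov

# Hypothetical example: four points in R^2 with unequal probabilities.
X = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 2.5], [1.0, 3.0]])
w = np.array([0.1, 0.4, 0.3, 0.2])
m, cov = mean_and_cov(X, w)
print(m)
print(cov)
```

With equal weights $w_i = 1/N$ this reduces to the usual (biased, $1/N$-normalised) sample mean and covariance.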

Now consider the orthogonal eigendecomposition $$cov = \Phi \Sigma \Phi^T,$$ where $\Phi \in R^{n,n}$ is an orthogonal matrix and $\Sigma = \text{diag}(\lambda_1,\dots,\lambda_n)$ is diagonal with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ (such a decomposition exists because $cov$ is symmetric positive semidefinite). Let $v_i \in R^n$ be the columns of $\Phi$. For $d \in N$ with $d < n$, let $W$ be the $d$-dimensional affine subspace $$W := \text{span}(v_1,\dots,v_d) + m.$$
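To make the construction of $W$ concrete, a minimal NumPy sketch follows; the function name `sorted_eigendecomposition` and the example matrix and mean are hypothetical. `np.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order, so they are reversed here to match the convention $\lambda_1 \ge \dots \ge \lambda_n$.

```python
import numpy as np

def sorted_eigendecomposition(cov):
    """Return (Phi, lambdas) with cov = Phi @ diag(lambdas) @ Phi.T,
    Phi orthogonal and lambdas in decreasing order."""
    lambdas, Phi = np.linalg.eigh(cov)  # ascending eigenvalues, orthonormal eigenvectors
    return Phi[:, ::-1], lambdas[::-1]  # reverse to get lambda_1 >= ... >= lambda_n

# Hypothetical example: a covariance matrix and mean in R^3, with d = 2.
cov = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 0.5],
                [0.0, 0.5, 1.0]])
m = np.array([1.0, -2.0, 0.5])
d = 2

Phi, lambdas = sorted_eigendecomposition(cov)
V = Phi[:, :d]  # columns v_1, ..., v_d
# Every point of W = span(v_1, ..., v_d) + m has the form m + V @ t with t in R^d.
t = np.array([0.3, -1.2])
point_in_W = m + V @ t
print(lambdas)
```

Note that the eigenvectors are only determined up to sign (and up to rotation within repeated eigenvalues), so $W$ itself is well defined only when $\lambda_d > \lambda_{d+1}$.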

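Then the following holds: $$W \in \underset{\Pi}{\text{argmin}} \int_{\Omega} \| x-P_{\Pi}(x) \|^2 \, d\mu(x) = \underset{\Pi}{\text{argmin}} \, E[\| x-P_{\Pi}(x) \|^2],$$ where the minimum is taken over all $d$-dimensional affine subspaces $\Pi \subset R^n$ and $P_{\Pi}$ denotes the orthogonal projection onto $\Pi$. If $\lambda_d > \lambda_{d+1}$, then $W$ is the unique minimiser.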
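The claim can at least be checked numerically (this is not a proof) in the discrete case. The sketch below assumes NumPy and a randomly generated empirical measure with equal weights $1/N$; the helper name `expected_sq_distance` is hypothetical. An affine subspace is parametrised as $\Pi = a + \text{span}(Q)$ with $Q \in R^{n,d}$ having orthonormal columns, so that $P_{\Pi}(x) = a + QQ^T(x-a)$. With $V = [v_1, \dots, v_d]$, the objective at $W$ equals $\text{tr}((I - VV^T)\, cov) = \lambda_{d+1} + \dots + \lambda_n$, and the code compares this against randomly chosen competitors.

```python
import numpy as np

def expected_sq_distance(points, weights, a, Q):
    """E||x - P_Pi(x)||^2 for the discrete measure sum_i w_i delta_{x_i},
    where Pi = a + span(columns of Q) and Q has orthonormal columns."""
    centered = points - a
    residual = centered - (centered @ Q) @ Q.T  # rows are (I - Q Q^T)(x_i - a)
    return weights @ np.sum(residual**2, axis=1)

rng = np.random.default_rng(0)
n, d, N = 5, 2, 1000
# Hypothetical data: anisotropic Gaussian sample, each point with weight 1/N.
points = rng.normal(size=(N, n)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2]) + 1.0
weights = np.full(N, 1.0 / N)

m = weights @ points
cov = (weights[:, None] * (points - m)).T @ (points - m)
lambdas, Phi = np.linalg.eigh(cov)
lambdas, Phi = lambdas[::-1], Phi[:, ::-1]  # decreasing eigenvalue order
V = Phi[:, :d]                              # basis of the PCA directions

pca_value = expected_sq_distance(points, weights, m, V)

# Random competitors: random orthonormal Q (via QR) and random offsets a.
competitor_values = []
for _ in range(200):
    Q = np.linalg.qr(rng.normal(size=(n, d)))[0]
    a = m + rng.normal(scale=0.5, size=n)
    competitor_values.append(expected_sq_distance(points, weights, a, Q))

print(pca_value, lambdas[d:].sum(), min(competitor_values))
```

Random search of course only illustrates the statement; it says nothing about why no competitor can do better.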

Thanks in advance!