Sampling error in SVD

The prototype of this problem is how to estimate sampling errors in PCA; it seems harmless to restate it in terms of SVD.

Consider $x_1,\dots,x_n,x_{n+1}\in \mathbb{R}^m$ sampled independently from the multivariate Gaussian distribution $N(0,\Sigma)$. For our purpose we can assume $\Sigma$ is diagonal with diagonal entries $\sigma_1^2,\dots,\sigma_m^2$, where $\sigma_1>\dots>\sigma_m>0$. Compute the SVD of the $m\times n$ matrix $(x_1,\dots,x_n)=UWV^{T}$, with the singular values in decreasing order, and let $v_1,\dots,v_k$ be the first $k$ columns of $U$ (the analogue of the principal components in PCA).
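A minimal NumPy sketch of this setup, for concreteness. The helper name `top_k_left_singular_vectors` and the parameters $m=5$, $k=2$, $\sigma=(5,4,3,2,1)$, $n=2000$ are hypothetical choices for illustration only. NumPy's `np.linalg.svd` returns singular values in descending order, so the first $k$ columns of $U$ are the $v_j$ above.

```python
import numpy as np

def top_k_left_singular_vectors(X, k):
    """Return v_1, ..., v_k as the columns of an m-by-k array, where X is the
    m-by-n matrix whose columns are the samples x_1, ..., x_n."""
    # X = U @ np.diag(W) @ Vt, with W sorted in descending order by NumPy.
    U, W, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

# Hypothetical example parameters (not from the question).
rng = np.random.default_rng(0)
sigma = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
m, n, k = sigma.size, 2000, 2
# Each column of X is a draw from N(0, diag(sigma**2)).
X = sigma[:, None] * rng.standard_normal((m, n))
V_k = top_k_left_singular_vectors(X, k)
# For large n the columns should be close to e_1, e_2 up to sign.
print(np.round(np.abs(V_k), 3))
```

Taking absolute values before printing hides the sign ambiguity of singular vectors, which is why the "no sampling error" limit is only defined up to sign.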

We then project $x_{n+1}$ orthogonally onto the linear span of $v_1,\dots,v_k$ and call the result $x^{\prime}_{n+1}$. I am interested in the expected squared residual $\mathbb{E}\|x_{n+1}-x^{\prime}_{n+1}\|_2^2$. Without sampling error we would have $v_1=(1,0,\dots,0)^{T}$, $v_2=(0,1,0,\dots,0)^{T}$, and so on (up to sign), so the expectation would be exactly $\sigma_{k+1}^2+\dots+\sigma_{m}^2$. You may assume $m \gg k$ and $\sigma_1 \gg \dots \gg \sigma_m$ if necessary. I am mostly interested in the asymptotic behaviour as $n\rightarrow \infty$; I suspect the answer has the form $\sigma_{k+1}^2+\dots+\sigma_{m}^2+f(\sigma_1^2,\dots,\sigma_m^2)/n+O(1/n^2)$. Any hints for finding $f(\sigma_1^2,\dots,\sigma_m^2)$?
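One way to check any conjectured $f$ numerically is a Monte Carlo sketch. Because $x_{n+1}$ is independent of $x_1,\dots,x_n$, conditioning on $v_1,\dots,v_k$ with $P=\sum_{j\le k} v_j v_j^T$ gives

$$\mathbb{E}\left[\|x_{n+1}-x'_{n+1}\|_2^2 \,\middle|\, v_1,\dots,v_k\right]=\operatorname{tr}\big((I-P)\Sigma\big)=\sum_{i=1}^m\sigma_i^2-\sum_{j=1}^k v_j^T\Sigma v_j,$$

so only the training matrix has to be resampled. The function names and the example $\sigma=(5,4,3,2,1)$, $k=2$ below are hypothetical; the estimate is only as accurate as the number of trials allows.

```python
import numpy as np

def conditional_residual(U_k, sigma2):
    """E[||x - P x||^2 | U_k] for x ~ N(0, diag(sigma2)) independent of U_k,
    where P = U_k @ U_k.T projects onto the span of the columns of U_k."""
    # tr(Sigma) - sum_j v_j^T Sigma v_j; Sigma is diagonal, so
    # v_j^T Sigma v_j = sum_i sigma2[i] * U_k[i, j]**2.
    return sigma2.sum() - np.sum(sigma2[:, None] * U_k**2)

def expected_residual(sigma, n, k, trials=2000, seed=0):
    """Monte Carlo estimate of E||x_{n+1} - x'_{n+1}||^2, averaging the
    conditional residual over independent draws of (x_1, ..., x_n)."""
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    sigma2 = sigma**2
    total = 0.0
    for _ in range(trials):
        X = sigma[:, None] * rng.standard_normal((sigma.size, n))
        U = np.linalg.svd(X, full_matrices=False)[0]
        total += conditional_residual(U[:, :k], sigma2)
    return total / trials

# Hypothetical example: compare with sigma_{k+1}^2 + ... + sigma_m^2.
sigma = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
k = 2
baseline = np.sum(sigma[k:] ** 2)
for n in (50, 100, 200, 400):
    excess = expected_residual(sigma, n, k) - baseline
    # If the 1/n form holds, n * excess should level off as n grows.
    print(n, excess, n * excess)
```

If the conjectured expansion is right, the printed `n * excess` should stabilise near $f(\sigma_1^2,\dots,\sigma_m^2)$ for large $n$, up to Monte Carlo noise.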