PCA algorithm's dependence on the number of principal components.


Consider the following idea: our initial data is $(X,Y) \in \mathbb{R}^{n \times (d+k)}$. Let $\bar{X} = \dfrac{1}{n}\sum_i x_i$ (and similarly $\bar{Y}$). We want to predict $y$ from $x$ using the following algorithm:

  1. We select the first $r$ principal components $u_1, \ldots, u_r$ of the matrix $Z = [X\,Y]$. Let $U$ be the $(d+k) \times r$ matrix with columns $u_i$ (so that the formulas below type-check), and let $U_X$ and $U_Y$ be the parts corresponding to $X$ and $Y$.

  2. We reduce the dimension using the following map $(x,y) \to \lambda = (\lambda_1, \ldots, \lambda_r)$: $$ \lambda = U^\top \left(z^\top - (\bar{X}, \bar{Y})^\top\right) $$

And we also have an inverse map:

$$ \begin{aligned} \tilde{x}(\lambda) &= \bar{X} + U_X\lambda,\\ \tilde{y}(\lambda) &= \bar{Y} + U_Y\lambda. \end{aligned} $$
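To make the setup concrete, here is a small sketch of steps 1–2 and the inverse map in NumPy. The synthetic data, the sizes $n, d, k, r$, and the use of the SVD of the centered data to get the principal components are all my assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, r = 100, 3, 2, 2
X = rng.normal(size=(n, d))
Y = X @ rng.normal(size=(d, k)) + 0.1 * rng.normal(size=(n, k))

Z = np.hstack([X, Y])                  # Z = [X Y], shape (n, d+k)
z_bar = Z.mean(axis=0)                 # (X_bar, Y_bar) stacked
Zc = Z - z_bar                         # center the data

# Principal components = right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
U = Vt[:r].T                           # (d+k) x r, columns u_1 ... u_r
U_X, U_Y = U[:d], U[d:]                # split into the X- and Y-parts

# Forward map: lambda = U^T (z - z_bar); inverse map: z_bar + U @ lam.
lam = U.T @ (Z[0] - z_bar)
x_tilde = z_bar[:d] + U_X @ lam
y_tilde = z_bar[d:] + U_Y @ lam
```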

So in order to get a prediction we do two steps:

$$ \begin{aligned} \lambda^* &= \arg \min_{\lambda \in \mathbb{R}^r} \|\tilde{x}(\lambda) - x\|_2^2,\\ y^* &= \bar{Y} + U_Y \lambda^*. \end{aligned} $$

I need to compute the values of $\lambda^*$ and $y^*$ from $x, X, Y, U$. It seems pretty easy:

$$ \begin{aligned} \lambda^*(x) &= \arg\min_{\lambda \in \mathbb{R}^r} \|\bar{X} + U_X \lambda - x \|^2, \\ \frac{\partial}{\partial \lambda}\|\bar{X} + U_X \lambda - x \|^2 &= 2U_X^\top (\bar{X} + U_X \lambda - x ) = 0 \;\Rightarrow\; \lambda^* = (U_X^\top U_X)^{-1}U_X^\top(x - \bar{X}) \;\Rightarrow\; \\ y^*(x) &= \bar{Y} + U_Y (U^\top_X U_X)^{-1} U^\top_X (x - \bar{X}). \end{aligned} $$
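The closed form can be sanity-checked numerically against a generic least-squares solver. The sketch below assumes $r \le d$ (so that $U_X^\top U_X$ is invertible) and uses synthetic data of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k, r = 50, 4, 2, 2
Z = rng.normal(size=(n, d + k))
z_bar = Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Z - z_bar, full_matrices=False)
U = Vt[:r].T                           # (d+k) x r
U_X, U_Y = U[:d], U[d:]
x_bar, y_bar = z_bar[:d], z_bar[d:]

x = rng.normal(size=d)                 # a new query point

# Closed form: lambda* = (U_X^T U_X)^{-1} U_X^T (x - x_bar).
lam_star = np.linalg.solve(U_X.T @ U_X, U_X.T @ (x - x_bar))
y_star = y_bar + U_Y @ lam_star

# Cross-check against a generic least-squares solve of U_X lam ~ x - x_bar.
lam_lstsq, *_ = np.linalg.lstsq(U_X, x - x_bar, rcond=None)
```

Using `np.linalg.solve` on the normal equations avoids forming the explicit inverse, which is the usual numerically preferable choice.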

But the problem says that the answer should depend on whether $r \le d$ or $r > d$, and I don't see why. Maybe the issue is the following: each $u_i = (u_{X,i}, u_{Y,i})$, and if $r > d$ then the $X$-parts $u_{X,i} \in \mathbb{R}^d$ are linearly dependent (so perhaps $U_X^\top U_X$ is no longer invertible?). However, I still don't see how that affects the steps above. Any hints?
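One way to probe this suspicion numerically (again on synthetic data of my own choosing, so this is an experiment, not an answer): since $U_X$ is $d \times r$, the Gram matrix $U_X^\top U_X$ is $r \times r$ but can have rank at most $d$, which the check below confirms for $r > d$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 3, 4
Z = rng.normal(size=(n, d + k))
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)

ranks = {}
for r in (2, 3, 5):                    # both r <= d and r > d, with d = 3
    U_X = Vt[:r].T[:d]                 # d x r block of the component matrix
    ranks[r] = np.linalg.matrix_rank(U_X.T @ U_X)
```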