Upper bound on norm of linear projection on column space of $X$

438 Views Asked by At

Suppose we have the full rank matrix $X\in\mathbb{R}^{n\times d}$, with normalized columns: $j\in[d]: |X_j|=1$. Besides $\Pi \in \mathbb{R}^{n\times n}$ denotes the linear orthogonal projection from $\mathbb{R}^n$ onto $range(X)$, which can be computed as $\Pi:= X (X^\top X)^{-1}X^\top$ (the inverse is well defined because $X$ is full rank). The goal is to find an upper bound for $\|\Pi v\|^2$ for a point $v\in\mathbb{R}^n$ based on properties of matrix $X$ and the angles between vector $v$ and $range(X)$.

Here is one idea that I had to bound this. If $\gamma$ denots the smallest non-zero eigenvalue of $X^\top X$ then the spectral norm of $(X^\top X)^{-1}$ is bound by $\frac{1}{\gamma}$ (we know $\gamma > 0$ because matrix $X$ is full rank). Now we can write $$\|\Pi v\|^2 = v^\top \Pi^\top \Pi v = v^\top \Pi v = v^\top X (X^\top X)^{-1}X^\top v \le \frac{1}{\gamma}\|X^\top v\|^2 = \frac{1}{\gamma} \sum_{i=1}^d\langle X_i, v\rangle^2 $$ $$\|\Pi v\|^2\le \frac{1}{\gamma}\sum_{i=1}^d\langle X_i, v\rangle^2$$ One way to interpret this is to think of $\langle X_i, v\rangle$ as projection of $v$ onto $X_j$s. Now a natural way to generalize this is to bound $\|\Pi v\|$ based on projections onto topples of columns rather on single columns.

More formally suppose $\Pi_S$ denotes projection onto $range(X_S)$ for an arbitrary $S\subseteq[d]$, and $X_S$ the submatrix of $X$ indexed by $S$. Now for $k\ll d$ suppose we have computed the values $\forall S\subseteq[d], |S|\le k: \|\Pi_S v\|$. We know that $\|\Pi_{\{i\}} v\|^2 = \langle X_i, v\rangle ^2$ and the result that was shown before is the answer for the special case $k=1$. This bound also holds for $k>1$, since we have computed one-dimensional projections as well. But this bound is probably not tight. Roughly speaking, if $k$ columns are close to each other the might make $\gamma$ very small and hence the bound will be very lose. But if $v$ doesn't have a big projection onto these columns the overall bound must also be small. Any ideas on which theorem or approach might be useful here?