Projections onto a subspace: the least squares method


I've been trying to understand projections onto a subspace. I mostly get it now, but there's one thing I can't wrap my head around. One formula I keep seeing is $x = (A^TA)^{-1}A^Tb$, which I understand is basically least squares. I know that the least squares method is used to minimise the distance between $b$ and $Ax$. However, what distance am I actually minimising here? What is $x$ in that formula, and how does it relate to the projection? If I'm correct, $A$ is basically a basis of the subspace I'm projecting onto, right?

To conclude, what do I actually get when I use least squares to project onto a subspace, and when would I want to do so?

Suppose $y\in \mathbb{R}^n$. Let $\mathbf{C}(X)$ denote the column space of $X$ and $\mathbf{C}(X)^\perp$ its orthogonal complement. Consider the least squares optimization problem $$\min_{\hat{y}\in \mathbf{C}(X)}\lVert y-\hat{y}\rVert_2^2,$$ whose minimizer is the projection $P_{\mathbf{C}(X)}(y)$.

The "error" is the discrepancy between $y$ and $P_{\mathbf{C}(X)}(y)$, which is the distance we are trying to minimize.

Then (in OLS) the projection matrix $$P_{\mathbf{C}(X)} = X(X^TX)^{-1}X^T$$ achieves the smallest error. (This assumes $X$ has full column rank, so that $X^TX$ is invertible.)
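As a quick numerical sketch of the formula above (the matrix $X$ here is a made-up example), we can build the projection matrix in NumPy and check the two properties that characterize an orthogonal projection, symmetry and idempotence:

```python
import numpy as np

# Hypothetical example: X has two linearly independent columns in R^3,
# so X^T X is invertible.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# Projection matrix onto the column space of X: P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.inv(X.T @ X) @ X.T

# An orthogonal projection matrix is symmetric (P = P^T)
# and idempotent (projecting twice changes nothing: P P = P).
print(np.allclose(P, P.T))    # True
print(np.allclose(P @ P, P))  # True
```

Idempotence is exactly the statement that once $y$ is in $\mathbf{C}(X)$, projecting it again leaves it fixed.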

The vector $y$ can then be decomposed into two parts,

$$y = \hat{y} +\hat{\epsilon} = P_{\mathbf{C}(X)}(y)+P_{\mathbf{C}(X)^\perp}(y)$$
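This orthogonal decomposition can be verified numerically. A minimal sketch, using a made-up $X$ and $y$: the projected part and the residual recombine to $y$ and are perpendicular to each other.

```python
import numpy as np

# Hypothetical data: project y in R^3 onto the column space of X.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y        # component of y lying in C(X)
eps_hat = y - y_hat  # residual component, lying in C(X)^perp

# The two pieces sum back to y and are orthogonal to each other.
print(np.allclose(y_hat + eps_hat, y))  # True
print(np.isclose(y_hat @ eps_hat, 0.0)) # True
```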


This can also be written as

$$\min_{\beta}\lVert y-X\beta\rVert_2^2$$

where the optimal solution is $\hat{\beta} = (X^TX)^{-1}X^Ty$. This is why you see this formula so frequently in connection with minimizing distance. In particular, the relation between the projection and $\hat{\beta}$ is

$$P_{\mathbf{C}(X)}(y) = X(X^TX)^{-1}X^T y = X \hat{\beta}$$
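Putting the pieces together, a short sketch (again with made-up $X$ and $y$) shows that the normal-equations formula for $\hat{\beta}$ matches NumPy's least squares solver, and that the fitted values $X\hat{\beta}$ are exactly the projection of $y$ onto $\mathbf{C}(X)$:

```python
import numpy as np

# Hypothetical design matrix and response.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# Normal-equations solution: beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# It agrees with np.linalg.lstsq, the numerically preferred solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True

# The fitted values X beta_hat equal the projection P y of y onto C(X).
P = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(X @ beta_hat, P @ y))   # True
```

In practice one solves the least squares problem directly (e.g. via `np.linalg.lstsq` or a QR factorization) rather than forming $(X^TX)^{-1}$ explicitly, which can be numerically ill-conditioned.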