I was reading the following page about least squares and came across a claim I don't fully understand.
https://inst.eecs.berkeley.edu/~ee127a/book/login/l_ols_kernels.html
$\displaystyle\min_w \,:\, \|X^Tw - y\|_2^2 + \lambda \|w\|_2^2$. The claim is that the solution of this regularized least-squares problem lies in $\operatorname{span}(X)$. I am trying to understand the proof of this.
I found an article which states that any vector can be written as a sum of two orthogonal components, so we can represent $w$ as $w = Xv + r$, where $Xv$ lies in $\operatorname{span}(X)$ and $r$ is orthogonal to the range of $X$.
But I don't understand how this proves that the optimal $w$ lies in $\operatorname{span}(X)$.
Can anyone help me prove this?
Answer by @reuns:
Write $ w = v + u $ where $ v \in \operatorname{span} \left( X \right) $ and $ u \perp \operatorname{span} \left( X \right) $, so that $ \left\| v + u \right\|^{2} = \left\| v \right\|^{2} + \left\| u \right\|^{2} $.
Since $ u $ is orthogonal to every column of $ X $, we have $ {X}^{T} u = 0 $, hence $ {X}^{T} w = {X}^{T} \left( v + u \right) = {X}^{T} v $. This shows it is always best to choose $ u = 0 $, since doing so reduces $ \left\| v + u \right\|^{2} $ without changing $ \left\| {X}^{T} \left( u + v \right) - y \right\|^{2} $.
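To make the last step explicit, here is a sketch of the comparison written out in full. It assumes $ \lambda > 0 $ and reads $ \operatorname{span} \left( X \right) $ as the column space of $ X $; the symbol $ J $ for the objective is introduced here only for illustration and does not appear in the linked page.

$$
\begin{aligned}
J(w) &= \left\| X^{T} w - y \right\|_2^2 + \lambda \left\| w \right\|_2^2 \\
     &= \left\| X^{T} (v + u) - y \right\|_2^2 + \lambda \left\| v + u \right\|_2^2 \\
     &= \left\| X^{T} v - y \right\|_2^2 + \lambda \left( \left\| v \right\|_2^2 + \left\| u \right\|_2^2 \right)
        && \text{since } X^{T} u = 0 \text{ and } v \perp u \\
     &= J(v) + \lambda \left\| u \right\|_2^2 \\
     &\ge J(v),
\end{aligned}
$$

with equality if and only if $ u = 0 $. So if $ w $ is a minimizer, its component $ u $ orthogonal to $ \operatorname{span} \left( X \right) $ must vanish, because otherwise $ v $ would achieve a strictly smaller objective value. Hence $ w = v \in \operatorname{span} \left( X \right) $. If $ \lambda = 0 $ the inequality is no longer strict, so the argument only shows that some minimizer lies in $ \operatorname{span} \left( X \right) $, not that every minimizer does.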