Why does the lasso select only $n$ variables if $p \gg n$?

$L_1$ (lasso) regularization in regression problems is defined as $$\min_{\beta} \|X\beta - y\|_2^2 + \lambda \|\beta\|_1.$$ Multiple resources (e.g. those on the elastic net) point out that for $X\in \mathbb{C}^{n\times p}$ with $p \gg n$, the lasso selects at most $n$ variables and then saturates, but they never give a reference or an example. It is also said that if two columns of $X$ are linearly dependent, the lasso selects only one of them and sets all the others to zero. This is probably an obvious fact, but I can't wrap my head around it.

E.g. if I look at $X = \begin{bmatrix} 0.5 & 0.5\end{bmatrix}$ and $y = 1$, why can't the lasso give both coefficients equal weight?
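
To make the toy example concrete, here is a minimal sketch that evaluates the objective for a few ways of splitting the weight between the two coefficients. The value $\lambda = 0.5$ and the helper name `lasso_objective` are my own assumptions for illustration; none of the resources fix them. The total weight $s = \beta_1 + \beta_2 = 2 - 2\lambda$ comes from minimizing $(0.5s - 1)^2 + \lambda s$ over $s \ge 0$, which should hold for $0 \le \lambda < 1$.

```python
# Toy problem from the question: X = [0.5, 0.5], y = 1.
# LAM is an assumed penalty weight; the question does not fix one.
LAM = 0.5


def lasso_objective(b1, b2, lam=LAM):
    """Squared residual plus L1 penalty for X = [0.5, 0.5], y = 1."""
    residual = 0.5 * b1 + 0.5 * b2 - 1.0
    return residual ** 2 + lam * (abs(b1) + abs(b2))


# For 0 <= lam < 1, minimizing over s = b1 + b2 >= 0 gives s = 2 - 2 * lam.
s_opt = 2.0 - 2.0 * LAM

# Three nonnegative splits of the same total weight s_opt.
candidates = {
    "all weight on first": (s_opt, 0.0),
    "equal weights": (s_opt / 2, s_opt / 2),
    "uneven split": (0.75 * s_opt, 0.25 * s_opt),
}

values = {name: lasso_objective(*beta) for name, beta in candidates.items()}
for name, value in values.items():
    print(f"{name}: {value:.4f}")
```

All three candidates give the same objective value, so on this example the objective by itself does not seem to prefer the one-variable solution over equal weights.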