In Lasso regression, for a sparse estimate of coefficients $\beta$, we have:
$$ \hat{\beta}(\lambda) = \arg \min_b \Bigl\{\frac{1}{2} \|y-Xb\|^2_{2} + \lambda\|b\|_1\Bigr\} $$
One graph I saw plots the coefficient estimates $\hat{\beta}$ against the $\lambda$ parameter:
My question is: why do the curves slope downwards to zero? In other words, why do our coefficient estimates tend to zero as we increase $\lambda$? I tried a thought experiment in which I let the $\lambda\|b\|_1$ term get large, but I fail to see the connection. Specifically:
1) Is it the case that Lasso paths always shrink to zero as $\lambda$ gets large?
2) Do coefficient values always start positive? (If $\lambda = 0$, we have OLS.)
3) What is the intuition here?
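
For concreteness, here is a minimal sketch of how I would try to reproduce this kind of path numerically. It assumes scikit-learn's `lasso_path` and purely synthetic data; the sample size, the seed, and `true_beta` are made-up values for illustration and are not taken from the graph. Note that scikit-learn scales the squared-error term by $1/(2n)$, so its `alpha` corresponds to $\lambda / n$ in the objective above.

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Synthetic data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
true_beta = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ true_beta + 0.5 * rng.standard_normal(n)

# lasso_path does not fit an intercept, so center the data first.
X = X - X.mean(axis=0)
y = y - y.mean()

# In scikit-learn's scaling, any alpha above max|X^T y| / n
# gives an all-zero solution.
alpha_max = np.max(np.abs(X.T @ y)) / n
alphas = np.geomspace(1.01 * alpha_max, 1e-3 * alpha_max, 50)

# coefs has shape (p, len(alphas)); columns follow decreasing alpha.
alphas_out, coefs, _ = lasso_path(X, y, alphas=alphas)

# L1 norm of the estimate at each alpha along the path.
l1_norms = np.abs(coefs).sum(axis=0)

for a, c in zip(alphas_out[::10], coefs.T[::10]):
    print(f"alpha={a:.4f}  coefs={np.round(c, 3)}")
```

Plotting each row of `coefs` against `alphas_out` (for example with matplotlib) should give a picture of the same kind as the graph I am asking about, including one coefficient that starts out negative in this particular setup.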
Thanks!
