I have been trying to understand the details of Ridge and Lasso regression and I am having some trouble. As far as I know, the formula we want to minimize is the following:
That formula was for Lasso regression, and the material I was reading goes on to say:
and here I have one question: in the equation above I see $\lambda$, but I cannot see $t$. Or is it that:
Also, in the following figure:
As far as I know, the ellipses are the level curves of the function:
where the center of the ellipses is the OLS solution, i.e., the more complex model, and as I move further away from the center of the ellipse the RSS increases. In some material I read that, for example in Ridge regression, the sphere shrinks until the point where it touches the border of the outer level curve. Is it like that? Could anybody explain to me why one should search for that intersection? Is it the point where the model is less complex, while still avoiding high bias or an underfitting situation?
Any detailed clarification would be greatly appreciated.
Thank you.





There is a one-to-one correspondence between $t$ and $\lambda$ that shows them to be inversely related.
The first formula you showed is the constrained-optimization form of the lasso, while the second is the equivalent penalized, or Lagrangian, form. Notice that the hard constraint in the first has been absorbed into the objective function in the second. But because the two hyperparameters live on different scales, $t \neq \lambda$ for the same $\beta$ solution.
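Since the images with the original equations are not shown here, these are the two standard textbook forms being compared (my reconstruction, assuming the usual lasso setup with $n$ observations and $p$ predictors):

```latex
% Constrained form: a hard budget t on the l1 norm of the coefficients
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
\quad \text{subject to} \quad \|\beta\|_1 = \sum_{j=1}^{p}|\beta_j| \le t

% Lagrangian (penalized) form: the constraint absorbed into the objective
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \|\beta\|_1
```

For every $t$ that actually binds the solution there is a $\lambda$ giving the same $\hat{\beta}$, which is the correspondence discussed below.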
It is not that the two are equal; rather, decreasing $t$ in the first (constrained) formula has the same effect as increasing $\lambda$ in the second (Lagrangian) formula, and that effect is the shrinkage/regularization of the regression coefficients $\beta$. The one-to-one correspondence between $t$ and $\lambda$ has no general analytical solution because it depends on the data itself, but the two will always move in opposite directions by design.
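You can trace this correspondence empirically. The sketch below (my own illustration, using scikit-learn's `Lasso`, where `alpha` plays the role of $\lambda$) fits the lasso at increasing $\lambda$ and records the resulting $\|\hat{\beta}\|_1$, which is the effective $t$; the recorded values shrink as $\lambda$ grows:

```python
# Empirically trace the t(lambda) correspondence: as alpha (lambda) grows,
# the l1 norm of the fitted coefficients (the effective budget t) shrinks.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ beta_true + rng.normal(scale=0.5, size=100)

effective_t = []
for alpha in [0.01, 0.1, 0.5, 1.0]:
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    effective_t.append(np.abs(coefs).sum())  # t = ||beta_hat(lambda)||_1

# effective_t decreases as alpha increases: lambda up -> t down
print(effective_t)
```

Note that the exact mapping depends on `X` and `y`, which is why there is no data-free formula linking $t$ and $\lambda$.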
As for the RSS elliptical contours touching the ridge sphere in the bivariate case: that tangency point is the constrained solution itself, the coefficient vector with the smallest RSS that still satisfies the budget. Because the penalty $|\beta_j|^q$ is differentiable at $\beta_j = 0$ for $q > 1$ ($q = 1$ being lasso), the ridge constraint region (left-hand plot) has a smooth boundary with no corners, so the tangency generically occurs away from the axes and ridge cannot set individual coefficients exactly to $0$. The upside, though, is that the smooth penalty allows an analytical closed-form solution to be derived for ridge, so that iterative optimization is not required.
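That closed form (for the penalized objective $\|y - X\beta\|^2 + \lambda\|\beta\|^2$, intercept omitted for simplicity) is $\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$. A minimal NumPy sketch, verifying the formula by checking that the gradient of the objective vanishes there:

```python
# Ridge closed form: beta_hat = (X^T X + lambda I)^{-1} X^T y.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)
lam = 2.0

p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Verify beta_ridge is a stationary point of ||y - X b||^2 + lam * ||b||^2:
grad = -2 * X.T @ (y - X @ beta_ridge) + 2 * lam * beta_ridge
print(grad)  # vanishes (up to floating-point error) at the closed-form solution
```

The matrix $X^\top X + \lambda I$ is invertible for any $\lambda > 0$, which is why the solution always exists, even when OLS itself is ill-posed.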
On the other hand, lasso (right-hand plot), because its constraint region in the bivariate case is a diamond with corners on the axes, can have its tangency occur at a corner, setting some coefficients exactly to $0$. The downside of this is that lasso, unlike ridge, has no closed-form solution in general, and must be solved with numerical optimization.
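This geometric difference is easy to see numerically. In the sketch below (my own illustration with scikit-learn, on simulated data where only 3 of 10 predictors matter), lasso produces exact zeros while ridge only shrinks coefficients toward zero:

```python
# Lasso's diamond corners produce exact zeros; ridge's smooth ball does not.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
beta_true = np.zeros(10)
beta_true[:3] = [4.0, -3.0, 2.0]   # only the first 3 predictors matter
y = X @ beta_true + rng.normal(scale=1.0, size=200)

lasso_coef = Lasso(alpha=0.5).fit(X, y).coef_
ridge_coef = Ridge(alpha=0.5).fit(X, y).coef_

print(np.sum(lasso_coef == 0.0))  # several exact zeros
print(np.sum(ridge_coef == 0.0))  # typically none: ridge shrinks but never hits zero
```

This is why lasso is often described as performing variable selection, while ridge is not.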
These tangency points are of interest because they are where the regularized solution is achieved: the coefficient vector with the lowest RSS that still satisfies the constraint. This can be interpreted as lower model complexity, because lower (or zero) weights ($\beta$) are applied to individual variables in the regression; the bias introduced (via $t$ or $\lambda$) is the price paid for reducing variance, and thereby overfitting.