Question about properties of ridge regression using properties of minimums.


I was learning about regularization techniques in machine learning when I came across ridge regression, which I found a bit confusing. I was wondering how the ridge regression estimate $\hat{\beta}^{ridge} = \operatorname{argmin}_\beta \left(\sum_{i=1}^N\left(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\right)^2+\lambda\sum_{j=1}^p\beta_j^2\right)$ is equivalent to $\operatorname{argmin}_\beta \sum_{i=1}^N\left(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\right)^2$ subject to $\sum_{j=1}^p\beta_j^2\leq t$. I was told this follows from a property of minimums, where $\min{a,b} \iff \min{a}$ subject to $b\leq\epsilon$. But I can't quite wrap my head around why this property of minimums works. If anyone can offer more insight, I would greatly appreciate it. Thank you!
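For reference, here is a rough sketch of one direction of the equivalence, written out as a worked inequality. It assumes the quoted property is meant as "minimize $a + \lambda b$" versus "minimize $a$ subject to $b \leq \epsilon$", and it uses the shorthand $\mathrm{RSS}(\beta)$ and $P(\beta)$, which are introduced here only for illustration:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^N\Big(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\Big)^2, \qquad P(\beta) = \sum_{j=1}^p \beta_j^2 .$$

Suppose $\hat{\beta}$ minimizes $\mathrm{RSS}(\beta) + \lambda P(\beta)$ for some $\lambda > 0$, and set $t = P(\hat{\beta})$. Then for any $\beta$ with $P(\beta) \leq t$,

$$\mathrm{RSS}(\beta) + \lambda P(\beta) \;\geq\; \mathrm{RSS}(\hat{\beta}) + \lambda t \quad\Longrightarrow\quad \mathrm{RSS}(\beta) \;\geq\; \mathrm{RSS}(\hat{\beta}) + \lambda\big(t - P(\beta)\big) \;\geq\; \mathrm{RSS}(\hat{\beta}).$$

So $\hat{\beta}$ also minimizes $\mathrm{RSS}$ over the set $P(\beta) \leq t$ for this particular $t$. The converse (that each $t$ corresponds to some $\lambda \geq 0$) is usually argued through convexity and Lagrange multipliers, and the matching between $\lambda$ and $t$ depends on the data rather than being a fixed formula.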