In many machine learning problems, the optimization problem we aim to solve has the form:
$\min_w \mathcal{L}(w) + \lambda\mathcal{R}(w)$,
where $\mathcal{L}(w)$ (e.g., squared loss) is a loss function, $\mathcal{R}(w)$ is a regularization function (e.g., $\mathcal{R}(w) = ||w||_2^2$), and $\lambda\ge 0$ is a regularization parameter that controls the tradeoff between $\mathcal{L}(w)$ and $\mathcal{R}(w)$.
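For concreteness, here is a minimal sketch of one such problem, ridge regression, where $\mathcal{L}(w) = \|Xw - y\|_2^2$ and $\mathcal{R}(w) = \|w\|_2^2$ (the data and function names below are hypothetical, just for illustration):

```python
import numpy as np

# Synthetic data for a ridge-regression instance (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)

def ridge_solution(lam):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2,
    i.e. the solution of (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w = ridge_solution(1.0)
```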
Suppose $\lambda_1 \le \lambda_2$, and let $w_1$ and $w_2$ be the corresponding solutions. The intuition is that $\mathcal{L}(w_1) \le \mathcal{L}(w_2)$, since a larger $\lambda$ imposes a stronger constraint on $\mathcal{R}(w)$ (one can also view the problem from the dual perspective). Is there a way to formally prove this intuition?
Thanks!
Yes, just shuffling terms around is enough! The fact that $w_1$ and $w_2$ minimize their respective objectives means that
$L(w_1)+\lambda_1 R(w_1)\leq L(w_2)+\lambda_1 R(w_2)$
and
$L(w_2)+\lambda_2 R(w_2)\leq L(w_1)+\lambda_2 R(w_1)$.
Rearranging these, we get
$\lambda_2 \big[R(w_2) - R(w_1)\big]\leq L(w_1) - L(w_2) \leq \lambda_1 \big[R(w_2)- R(w_1)\big]$
and subtracting,
$(\lambda_2-\lambda_1)\big[R(w_2) - R(w_1)\big]\leq 0$.
Assuming the strict inequality $\lambda_1 < \lambda_2$ (if $\lambda_1 = \lambda_2$ the two problems coincide, so there is nothing to prove), we have $\lambda_2 - \lambda_1 > 0$ and may divide it out to get
$\big[R(w_2) - R(w_1)\big]\leq 0$
and since $\lambda_1 \ge 0$ we may multiply by it (which preserves the inequality) to get
$\lambda_1 \big[R(w_2)- R(w_1)\big]\leq 0$.
Putting it all together,
$L(w_1) - L(w_2) \leq \lambda_1 \big[R(w_2)- R(w_1)\big]\leq 0$,

i.e., $L(w_1) \le L(w_2)$, as desired. Note that along the way we also showed $R(w_2) \le R(w_1)$: the regularizer value of the solution is nonincreasing in $\lambda$.
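As a numerical sanity check of this result, here is a sketch using ridge regression ($L(w) = \|Xw - y\|_2^2$, $R(w) = \|w\|_2^2$), solved in closed form over an increasing grid of $\lambda$ values; the data and names are hypothetical:

```python
import numpy as np

# Synthetic ridge-regression instance (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=100)

def solve(lam):
    # Minimizer of ||Xw - y||^2 + lam * ||w||^2 via the normal equations.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lams = [0.01, 0.1, 1.0, 10.0]
losses = [np.sum((X @ solve(lam) - y) ** 2) for lam in lams]  # L(w_lam)
regs = [np.sum(solve(lam) ** 2) for lam in lams]              # R(w_lam)

# As proved above: L is nondecreasing and R is nonincreasing in lambda.
assert all(a <= b + 1e-9 for a, b in zip(losses, losses[1:]))
assert all(a >= b - 1e-9 for a, b in zip(regs, regs[1:]))
```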