Must a stronger regularization lead to a larger loss?


In many machine learning problems, the objective we minimize has the form:

$\min_w \mathcal{L}(w) + \lambda\mathcal{R}(w)$,

where $\mathcal{L}(w)$ (e.g., squared loss) is a loss function, $\mathcal{R}(w)$ is a regularization function (e.g., $\mathcal{R}(w) = ||w||_2^2$), and $\lambda\ge 0$ is a regularization parameter that controls the tradeoff between $\mathcal{L}(w)$ and $\mathcal{R}(w)$.
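For concreteness, here is a small numerical sketch (the data and the $\lambda$ values are made up for illustration) using ridge regression, where $\mathcal{L}(w) = \|Xw - y\|_2^2$ and $\mathcal{R}(w) = \|w\|_2^2$, so the exact minimizer is $w = (X^\top X + \lambda I)^{-1} X^\top y$:

```python
import numpy as np

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

def ridge_solution(lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def data_loss(w):
    # The unregularized loss L(w) = ||Xw - y||^2.
    return np.sum((X @ w - y) ** 2)

# The data loss of the regularized solution is nondecreasing in lambda.
losses = [data_loss(ridge_solution(lam)) for lam in (0.0, 0.1, 1.0, 10.0)]
assert all(a <= b for a, b in zip(losses, losses[1:]))
```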

Suppose $\lambda_1 \le \lambda_2$, with corresponding solutions $w_1$ and $w_2$. The intuition is that $\mathcal{L}(w_1) \le \mathcal{L}(w_2)$, since a larger $\lambda$ imposes a stronger constraint (one can also view this from the dual perspective). Is there a way to prove this intuition formally?

Thanks!


Best answer:

Yes, just shuffling terms around is enough! The fact that $w_1$ and $w_2$ minimize their respective objectives means that

$L(w_1)+\lambda_1 R(w_1)\leq L(w_2)+\lambda_1 R(w_2)$

and

$L(w_2)+\lambda_2 R(w_2)\leq L(w_1)+\lambda_2 R(w_1)$.

Rearranging these, we get

$\lambda_2 \big[R(w_2) - R(w_1)\big]\leq L(w_1) - L(w_2) \leq \lambda_1 \big[R(w_2)- R(w_1)\big]$

and subtracting,

$(\lambda_2-\lambda_1)\big[R(w_2) - R(w_1)\big]\leq 0$.

Assuming the strict inequality $\lambda_2 > \lambda_1$, we have $(\lambda_2-\lambda_1) > 0$ and may divide it out to get

$\big[R(w_2) - R(w_1)\big]\leq 0$

and since $\lambda_1\ge 0$ we may multiply by it (multiplying by a nonnegative number preserves the inequality) to get

$\lambda_1 \big[R(w_2)- R(w_1)\big]\leq 0$.

Putting it all together,

$L(w_1) - L(w_2) \leq \lambda_1 \big[R(w_2)- R(w_1)\big]\leq 0$.
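The chain of inequalities can be checked numerically. A minimal sketch, again using ridge regression with hypothetical data, verifies $L(w_1) - L(w_2) \le \lambda_1 \big[R(w_2) - R(w_1)\big] \le 0$ for one pair $\lambda_1 < \lambda_2$:

```python
import numpy as np

# Illustrative data (made up); ridge gives exact minimizers in closed form.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)

def solve(lam):
    # Exact minimizer of ||Xw - y||^2 + lam * ||w||^2.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def L(w):  # data loss
    return np.sum((X @ w - y) ** 2)

def R(w):  # squared L2 regularizer
    return np.sum(w ** 2)

lam1, lam2 = 0.5, 2.0
w1, w2 = solve(lam1), solve(lam2)

# L(w1) - L(w2) <= lam1 * (R(w2) - R(w1)) <= 0, up to floating-point tolerance.
tol = 1e-9
assert L(w1) - L(w2) <= lam1 * (R(w2) - R(w1)) + tol
assert lam1 * (R(w2) - R(w1)) <= tol
```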