What is the role of Tikhonov regularization in optimization?

Suppose I have the following objective function

$$ L = \frac{1}{|X|}\sum_{x \in X} \| \hat{y} - y \|^2_2 + \lambda \|w\|_2^2 $$

where $X$ are my data, $\hat{y}$ the prediction, $y$ the target, $\lambda$ the regularization coefficient, and $w$ the model's parameters.
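
For concreteness, here is a minimal sketch of this objective for a linear model $\hat{y} = Xw$ (the linear model and the toy data are just an illustration, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # data matrix, one row per sample
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=50)

def objective(w, lam):
    # mean squared error + L2 (Tikhonov / ridge) penalty
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

w = rng.normal(size=5)
print(objective(w, lam=0.0), objective(w, lam=0.1))
```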

In Machine Learning we use regularization to obtain models that can hopefully generalize to unseen data. But, what is the role of regularization in optimization?

The closest that "pure optimization" gets to regularization is probably the field of Robust Optimization, which aims to improve performance on unseen data by arming an adversary with the ability to perturb the problem data ex post. That is, you make a decision, then nature selects the worst-case realization from some uncertainty set and changes the problem slightly; but because you know, when deciding, that nature will do this, you end up with a decision that performs better on unseen data.

Indeed, in linear regression (writing $X$ for the data matrix), the robust least-squares problem

\begin{align*} \min_{w} \ \max_{\Delta: \Vert \Delta \Vert_F \leq \lambda} \ \Vert (X+\Delta)w-y\Vert_2 \end{align*} is equivalent to the Tikhonov-regularized problem $\min_w \Vert Xw-y\Vert_2 + \lambda\Vert w\Vert_2$: by duality (of norms), the inner maximum evaluates in closed form to $\Vert Xw-y\Vert_2 + \lambda\Vert w\Vert_2$. Up to re-parameterizing $\lambda$, this traces out the same solutions as your original squared-penalty problem.
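
A quick numerical sanity check of that inner maximum (the rank-one worst-case $\Delta$ is the standard construction; the random data below are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))     # data matrix
y = rng.normal(size=30)
w = rng.normal(size=4)           # an arbitrary candidate solution
lam = 0.5

r = X @ w - y                    # residual at w

# Worst-case perturbation: rank-one, aligned with the residual and with w,
# scaled so that ||Delta||_F = lam.
Delta = lam * np.outer(r, w) / (np.linalg.norm(r) * np.linalg.norm(w))

lhs = np.linalg.norm((X + Delta) @ w - y)          # adversarially perturbed loss
rhs = np.linalg.norm(r) + lam * np.linalg.norm(w)  # regularized loss
print(np.linalg.norm(Delta), lam)                  # Frobenius norm hits the budget
print(lhs, rhs)                                    # the two values coincide
```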

See this paper: https://www.sciencedirect.com/science/article/abs/pii/S0377221717302734 for more information on this.

One idea is that adding a regularization term $\lambda \|w\|_2^2$ to the objective function of a convex optimization problem ensures that the new objective function $f$ is strongly convex (with modulus at least $2\lambda$). This allows the use of optimization algorithms that require strong convexity, which typically come with stronger guarantees such as linear convergence rates.
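
Here is a minimal numerical sketch of that effect for the least-squares case (the toy, rank-deficient data are made up for illustration): the penalty shifts every eigenvalue of the Hessian up by $2\lambda$, so the smallest eigenvalue is bounded away from zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 5
X = rng.normal(size=(n, d))
X[:, 4] = X[:, 3]                # duplicate a column: plain least squares is
                                 # convex but NOT strongly convex
lam = 0.1

H_plain = 2 * X.T @ X / n                    # Hessian of the mean squared error
H_ridge = H_plain + 2 * lam * np.eye(d)      # Hessian after adding lam * ||w||^2

print(np.linalg.eigvalsh(H_plain).min())     # ~0: no strong convexity
print(np.linalg.eigvalsh(H_ridge).min())     # >= 2*lam: strongly convex
```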

For example, since $f$ is strongly convex, its convex conjugate $f^*$, which appears in the dual problem, is guaranteed to be differentiable with a Lipschitz-continuous gradient, and this is helpful when using a gradient-based algorithm to solve the dual problem.
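
As a small worked computation illustrating this (standard convex analysis, added here for concreteness): taking just the penalty term $f(w) = \lambda \|w\|_2^2$, which is $2\lambda$-strongly convex,

$$ f^*(z) \;=\; \sup_{w}\big\{ \langle z, w\rangle - \lambda \Vert w \Vert_2^2 \big\} \;=\; \frac{\Vert z \Vert_2^2}{4\lambda}, \qquad \nabla f^*(z) = \frac{z}{2\lambda}, $$

so $f^*$ is differentiable everywhere with a $\tfrac{1}{2\lambda}$-Lipschitz gradient. More generally, $\mu$-strong convexity of $f$ makes $\nabla f^*$ $\tfrac{1}{\mu}$-Lipschitz.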