Regularization versus inequality constraint


In convex optimization, for what values of a regularization parameter $\alpha$ is there an equivalent inequality constraint? In particular, in the convex optimization problems below

$$\min f(x) + \alpha g(x) \tag{Problem 1}$$

$$\min f(x) \quad \text{subject to} \quad g(x) \le c \tag{Problem 2}$$

for what values of $\alpha$ does there exist a $c$ such that the two problems are equivalent?

BEST ANSWER

Let us consider the two problems to be equivalent in the sense that the minimum is attained at the same point, i.e. both problems have the same minimizer.

It is hard to determine the parameter $c$ of Problem 2 without knowing the optimal solution $x^\star$ of Problem 1. On the other hand, if you know $x^\star$, then both problems are equivalent if $c=g(x^\star)$.

To prove the latter statement, consider Problem 2. We have to show that $x^\star$ is optimal for it.

  • Since $g(x^\star)=c$, the point $x^\star$ is feasible for Problem 2; we have to show that no feasible $x$ exists with $f(x)<f(x^\star)$.
  • Suppose such an $x$ existed, so $g(x)\le c=g(x^\star)$ and $f(x)<f(x^\star)$. Then for any $\alpha\ge 0$ we would have $f(x)+\alpha g(x)\le f(x)+\alpha c<f(x^\star)+\alpha g(x^\star)$.
  • This would contradict the optimality of $x^\star$ for Problem 1, so no such $x$ exists.
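As a quick sanity check of the argument above, here is a small Python sketch on a toy instance of my own choosing ($f(x)=(x-3)^2$, $g(x)=x^2$, $\alpha=1$, with a brute-force grid standing in for a real solver): solving Problem 1, setting $c=g(x^\star)$, and then solving Problem 2 yields the same minimizer.

```python
# Toy instance (my own choice, not from the question):
# f(x) = (x - 3)^2, g(x) = x^2, alpha = 1.

def f(x):
    return (x - 3.0) ** 2

def g(x):
    return x ** 2

alpha = 1.0
grid = [i / 1000.0 for i in range(-5000, 5001)]  # x in [-5, 5], step 0.001

# Problem 1: unconstrained regularized minimization.
x_star = min(grid, key=lambda x: f(x) + alpha * g(x))

# Problem 2: constrained minimization with c = g(x_star).
c = g(x_star)
x_con = min((x for x in grid if g(x) <= c), key=f)

print(x_star, x_con)  # 1.5 1.5 -- the minimizers coincide
```

Here the regularized objective $2x^2-6x+9$ is minimized at $x^\star=1.5$, so $c=2.25$, and the constrained problem picks the same point.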

Remark: If you consider Problem 1 and formulate the Lagrangian of Problem 2, you can interpret $\alpha\geq 0$ as the Lagrange multiplier.


Ert's remarks are good, but the passing mention at the end of his post is really the crux of the matter for most practical models.

Consider the Lagrangian for Problem 2; and for reasons that will become obvious, let's call the Lagrange multiplier for the inequality constraint $\alpha$: $$L(x,\alpha) = f(x) - \alpha ( c - g(x) ) = f(x) + \alpha g(x) - \alpha c$$ As you can see, Problem 1 is, to within an additive constant, just the minimization of the Lagrangian evaluated at a particular value of the Lagrange multiplier. So we can use the Lagrange dual machinery to understand this problem.

For instance, consider the Lagrange dual problem (writing the dual function as $q(\alpha)$ to avoid a clash with the constraint function $g$): $$\begin{array}{lll} \text{Problem 3:} & \text{maximize} & \textstyle q(\alpha) \triangleq \inf_x f(x) + \alpha g(x) - \alpha c \\ & \text{subject to} & \alpha \geq 0 \end{array}$$ Let's call the optimal value of Problem 2 above $p^*$, and the optimal value of Problem 3 $d^*$. It is always the case that $p^*\geq d^*$ (weak duality). Under certain conditions, we actually get $p^*=d^*$ (strong duality). A very common sufficient condition is Slater's condition: if the problem is strictly feasible, then strong duality is assured. In this case, Slater's condition requires that there exist at least one $x$ for which $g(x)<c$. If that is true, then not only is $p^*=d^*$, but the dual optimum is attained: that is, there is at least one $\alpha\geq 0$ that achieves the optimal value $d^*$.

So how does this map to your original question? If $g(x)<c$ for at least one $x$, then we can guarantee the existence of a value of $\alpha$ for which the solution of Problem 1 satisfies $g(x)=c$. And that value of $\alpha$ is precisely the optimal point returned by solving Problem 3. Whether it is possible or practical to solve that problem is another matter.
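To make the dual route concrete, here is a rough Python sketch on a toy instance of my own ($f(x)=(x-3)^2$, $g(x)=x^2$, $c=1$; the brute-force grids stand in for a real solver). Maximizing the dual function over $\alpha\ge 0$ recovers the regularization weight that makes Problem 1 reproduce the constraint level $c$:

```python
# Toy instance (my own choice, not from the answer): f(x) = (x-3)^2,
# g(x) = x^2, c = 1. Maximize the dual function by grid search.

def f(x):
    return (x - 3.0) ** 2

def g(x):
    return x ** 2

c = 1.0
xs = [i / 1000.0 for i in range(-5000, 5001)]  # x in [-5, 5]

def q(alpha):
    # Lagrange dual function: inf_x f(x) + alpha * g(x) - alpha * c
    return min(f(x) + alpha * g(x) - alpha * c for x in xs)

alphas = [i / 20.0 for i in range(0, 101)]  # alpha in [0, 5], step 0.05
alpha_star = max(alphas, key=q)             # dual-optimal multiplier

# Plug alpha_star back into Problem 1 and check the constraint level.
x1 = min(xs, key=lambda x: f(x) + alpha_star * g(x))
print(alpha_star, g(x1))  # 2.0 1.0 -- regularized solution hits g(x) = c
```

For this instance the stationarity condition $2(x-3)+2\alpha x=0$ gives $x=3/(1+\alpha)$, so $g(x)=c=1$ exactly at $\alpha=2$, which is what the grid search finds.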

EDIT: Actually, I just realized there may still be cases where we cannot guarantee $g(x)=c$ for any value of $\alpha$. For instance, suppose that $c$ is so large that the constraint $g(x)\leq c$ is inactive at the solution: that is, $g(\bar{x})<c$ where $\bar{x}=\arg\min_x f(x)$. Then $\alpha=0$ is the dual optimal point, and $g(x^*)\neq c$. I think you'll agree this is a minor issue, but it's a real one. So this analysis only holds for values of $c$ at which the constraint is active; that is, you need some condition like this: $\inf_x f(x) < \inf_{x:g(x)< c} f(x)$.
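The inactive-constraint caveat is easy to see on the same kind of toy instance (again my own choice of $f$ and $g$, not from the answer): pick $c$ larger than $g$ at the unconstrained minimizer, and $\alpha=0$ is already dual optimal while $g(x^*)$ stays strictly below $c$.

```python
# Toy instance (my own choice): f(x) = (x-3)^2, g(x) = x^2,
# with c chosen so large that the constraint never binds.

def f(x):
    return (x - 3.0) ** 2

def g(x):
    return x ** 2

xs = [i / 1000.0 for i in range(-5000, 5001)]

x_bar = min(xs, key=f)   # unconstrained minimizer: x_bar = 3.0
c = 100.0                # c > g(x_bar) = 9, so the constraint is inactive

# With alpha = 0, Problem 1 is just min f, and its solution already
# satisfies g(x) < c strictly -- no alpha >= 0 can force g(x) = c here.
print(x_bar, g(x_bar) < c)  # 3.0 True
```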

Slater's condition is sufficient but not necessary. For instance, if $g$ is polyhedral, we can guarantee that $p^*=d^*$ even if Slater's condition does not hold. But we may not be able to find a specific value of $\alpha$; it may be that the dual objective approaches its optimal value only as $\alpha\rightarrow\infty$, so no finite multiplier attains it. But frankly, in practice, most of the time a problem this simple is going to exhibit strong duality.

What about the other direction? That is, if you know $\alpha$, is there a $c$? I think Ert addressed that sufficiently well.