Please explain the intuition behind the dual problem in optimization.

71.3k Views Asked by At

I've studied convex optimization pretty carefully, but don't feel that I have yet "grokked" the dual problem. Here are some questions I would like to understand more deeply/clearly/simply:

  1. How would somebody think of the dual problem? What thought process would lead someone to consider the dual problem and to recognize that it's valuable/interesting?
  2. In the case of a convex optimization problem, is there any obvious reason to expect that strong duality should (usually) hold?
  3. It often happens that the dual of the dual problem is the primal problem. However, this seems like a complete surprise to me. Is there any intuitive reason to expect that this should happen?
  4. Does the use of the word "dual" or "duality" in optimization have anything to do with the dual space in linear algebra? Or are they just different concepts that go by the same name. What about the use of the word "dual" in projective geometry — is there a connection there?
  5. You can define the dual problem and prove theorems about strong duality without ever mentioning the Fenchel conjugate. For example, Boyd and Vandenberghe prove a strong duality theorem without mentioning the Fenchel conjugate in their proof. And yet, people often talk as if the Fenchel conjugate is somehow the "essence" of duality, and make it sound as if the whole theory of duality is based on the Fenchel conjugate. Why is the Fenchel conjugate considered to have such fundamental importance?

Note: I will now describe my current level of understanding of the intuition behind the dual problem. Please tell me if you think I might be missing any basic insights.

I have read the excellent notes about convex optimization by Guilherme Freitas, and in particular the part about "penalty intuition". When we are trying to solve

\begin{align*} \text{minimize} &\quad f(x) \\ \text{such that} & \quad h(x) \leq 0 \end{align*}

one might try to eliminate the constraints by introducing a penalty when constraints are violated. This gives us the new unconstrained problem

\begin{equation} \text{minimize} \quad f(x) + \langle \lambda ,h(x) \rangle \end{equation}

where $\lambda \geq 0$. It's not hard to see that for a given $\lambda \geq 0$, the optimal value of this unconstrained problem is less than or equal to the optimal value for the constrained problem. This gives us a new problem — find $\lambda$ so that the optimal value for the unconstrained problem is as large as possible. That is one way to imagine how somebody might have thought of the dual problem. Is this the best intuition for where the dual problem comes from?

Another viewpoint: the KKT conditions can be derived using what Freitas calls the "geometric intuition". Then, if we knew the value of the multipliers $\lambda$, it would be (often) much easier to find $x$. So, a new problem is to find $\lambda$. And if we can somehow recognize that $\lambda$ is a maximizer for the dual problem, then this suggests that we might try solving the dual problem.

Please explain or give references to any intuition that you think I might find interesting, even if it's not directly related to what I asked.

3

There are 3 best solutions below

0
On
6
On

I'll take a crack at a couple of these questions (some of them are hard and would require more thought).

1) Here's a nice economic interpretation of duality (being a bit "fast and loose" with the details). Let's say $x$ represents the design of a widget, and $f(x)$ is the cost you will incur producing it. $x$ must satisfy "design constraints" given by $h(x)\leq 0$ (suppose for simplicity that we have only 1 constraint function). Out of curiosity, you decide to try farming out production: another company agrees to produce your thingy $x$ and "charge" you $f(x)$ for it. Their goal is ultimately to maximize profit, but they can't charge you more than you would spend doing it yourself. So given a fixed $\lambda$, they need to find the design $x$ that minimizes $f(x)+\lambda h(x)$. If they don't, you'll be able to find a feasible design $y$ that has a lower cost $f(y)<f(x)$, and thus you'll make the widget yourself. Now that this company can do at least as good as you, they'll set about maximizing their profit by varying $\lambda$ (maybe $\lambda$ corresponds to a different process or something).

This interpretation is similar to that in Boyd and Vandenberghe, 5.4.4. Also I find 'game theoretic' interpretations helpful - the primal problem is your strategy while the dual corresponds to an opponent's strategy.

2) I don't have an intuitive answer for this other than to say that strong duality is equivalent to existence of a saddle point of the Lagrangian.

3) For convex problems this makes sense because applying the convex conjugate (Fenchel transform) twice returns the 'convexification' of the original objective function, which is the same as the original function in most 'nice' situations.

4) Yes, but you definitely have to be careful here. The dual of a vector space is formally defined as the space of all continuous linear functionals on that space, and this concept lives 100% independently of optimization theory. However, you're correct to notice that the dual of a vector space does arise in the statement of the dual of an optimization problem. Let me explain: define $X=\mathbb{R}^n$ and $Y=\mathbb{R}^m$ and consider the problem:

$$ \text{minimize}\quad f(x)\\ \text{such that} \quad h(x)\leq 0\\ x\in X,\;f:X\rightarrow \mathbb{R},\\ h:X\rightarrow Y. $$ Then, this problem has the following dual:

$$ \max_\lambda\quad \inf_{x\in X} \{f(x)+\langle \lambda,h(x)\rangle\}\\ \text{such that}\quad \lambda_i\geq 0,~~\forall i: 0\leq i\leq m $$ Now, "formally", $\lambda$ is an element of the dual space of $Y$, since we're considering an inner product of the form $\langle\lambda, y\rangle$ where $y\in Y$, and hence can think of this as a continuous linear functional on $Y$. But here $Y$ is finite dimensional, so (by the Riesz representation theorem) $Y^*$ is actually isomorphic to $Y$, so the distinction isn't really necessary and you haven't gained anything by thinking of $\lambda$ as being an element of the dual space. It's possible that in infinite dimensional problems where $Y^*$ is not isomorphic to $Y$ you get something out of this, but I can't think of a good example.

To my knowledge, there is zero connection between duality in projective geometry and duality in optimization. Duality in projective geometry is more a statement about the bijection between points and rays that defines projective space. "Duality" in math really just means having 2 ways to think about a problem - it's as overused as words like "fundamental" and "canonical". Another classical example of duality comes from fluid dynamics and PDE - "Eulerian" coordinates vs. 'Lagrangian' coordinates.

5) For convex problems, I would agree that the convex conjugate is key, though in general I'm not sure it helps a lot with the 'general' idea of duality since (a) not all interesting problems are convex and (b) the convex conjugate is legendarily hard to gain intuition for. Historically this 'fundamental importance' of the convex conjugate might be traced to physics, where it turns out that a lot of interesting physical systems have a convex 'Lagrangian' describing their total energy. The equations of motion can then be phrased essentially as a convex minimization problem with respect to this Lagrangian. The convex conjugate function, it then turns out, is what is called the 'Hamiltonian', which leads to an entirely different (one might say 'dual') formulation of the equations of physics. Thus we have yet again two ways to approach the same problem!

Hope this was somewhat useful to you.

1
On

Here are some counter-examples to help you understand KKT conditions and strong duality. The answer is from my other post: https://math.stackexchange.com/a/4154563/273731

${\bf counter-example 1}$ If one drops the convexity condition on objective function, then strong duality could fails even with relative interior condition.

The counter-example is the same as the following one.

${\bf counter-example 2}$ For non-convex problem where strong duality does not hold, primal-dual optimal pairs may not satisfy KKT condition.

Consider the optimization problem \begin{align} \operatorname{minimize} & \quad e^{-x_1x_2} \\ \text{subject to} & \quad x_1\le 0. \end{align} The domain for the problem is $D = \{ (x_1,x_2) \ge 0 \}$. The problem is not convex by calculating the Hessian matrix. Clearly, any $x_1 = 0, x_2 \in\mathbb R_+$ is a primal optimal solution with primal optimal value $1$ .

The Lagrangian is $$ L(x_1,x_2,\lambda) = e^{-x_1x_2} + \lambda x_1. $$ The dual function is \begin{align} G(\lambda) &= \inf L(x_1,x_2,\lambda) = \begin{cases} 0& \lambda\ge 0\\ -\infty& \lambda < 0 \end{cases} \end{align}

Thus, $\lambda \geq 0$ is dual optimal solution with dual optimal value $0$, so dual gap is $1$, strong duality fails. As for the KKT conditions, remember the domain is $D = \{ (x_1,x_2) \ge 0 \}$

\begin{align*} &\lambda-x_2e^{-x_1x_2}=0\\ &x_1\le 0\\ &\lambda\ge 0\\ &\lambda x_1=0\\ \end{align*}

Pick any primal-dual pair satisfying $x_1 = 0, x_2\ge 0, \lambda\ge0, \lambda\ne x_2$, the KKT conditions fail.

${\bf counter-example 3}$ For a non-convex problem, even strong duality holds, solutions for KKT conditions may not be primal-dual optimal solution.

Consider the optimization problem on $\mathbb R$ \begin{align} \operatorname{minimize} & \quad x^3 \\ \text{subject to} & -x^3-1\le 0. \end{align} The objective function is not convex by calculating the Hessian matrix. Clearly, $x=-1$ is the unique primal optimal solution with primal optimal value $-1$.

The Lagrangian is $$ L(x,\lambda) = x^3 - \lambda (x^3+1). $$

The dual function is \begin{align} G(\lambda) &= \inf L(x,\lambda) = \begin{cases} -1& \lambda=1\\ -\infty& otherwise \end{cases} \end{align}

Thus, $\lambda = 1 $ is dual optimal solution with dual optimal value $-1$, so dual gap is $0$, strong duality holds. While the KKT conditions are

\begin{align*} &3x^2(1-\lambda)=0\\ &-x^3-1\le 0\\ &\lambda\ge 0\\ &\lambda (-x^3-1)=0\\ \end{align*}

Solutions for KKT conditions are $x=-1, \lambda=1$ or $x=0,\lambda=0$. Notice that $x=0,\lambda=0$ satisfies KKT conditions but has nothing to do with primal-dual optimal solutions.

The discussion indicates for non-convex problem, KKT conditions may be neither necessary nor sufficient conditions for primal-dual optimal solutions.

${\bf counter-example4}$ For a convex problem, even strong duality holds, there could be no solution for the KKT condition, thus no solution for Lagrangian multipliers.

Consider the optimization problem on domain $\mathbb R$ \begin{align} \operatorname{minimize} & \quad x \\ \text{subject to} & \quad x^2\le 0. \end{align}

Obviously, the problem is convex with unique primal optimal solution $x=0$ and optimal value $0$; Feasible set is $\{0\}$, therefore Slater's condition fails.

The Lagrangian is $$ L(x,\lambda) = x + \lambda x^2. $$

The dual function is \begin{align} G(\lambda) &= \inf L(x,\lambda) = \begin{cases} -\infty& \lambda\le 0\\ -\frac{1}{4\lambda} &\lambda >0 \end{cases} \end{align}

Thus, dual optimal value is $0$, so dual gap is $0$, strong duality holds. However, there are no solution for dual optimal solution because the optimal value is attained as $\lambda\rightarrow \infty$. As for the KKT conditions

\begin{align*} &1+2\lambda x=0\\ &x^2\le 0\\ &\lambda\ge 0\\ &\lambda x^2=0\\ \end{align*} No solution for KKT conditions.

This is the convex problem where the dual problem has no feasible solution and KKT conditions have no solution but the primal problem is simple to solve.

${\bf counter-example 5}$ For a differentiable convex problem, there could be no solution for the KKT conditions, even if the primal-dual pair exists. In that case, the strong duality fails.

Consider the optimization problem on domain $D:=\{(x,y):y>0\}$ \begin{align} \operatorname{minimize} & \quad e^{-x} \\ \text{subject to} & \quad \frac{x^2}{y}\le 0. \end{align}

The problem can be proved to be convex with primal optimal solution $x=0, y>0$ and optimal value $1$; Feasible set is $\{(0,y): y > 0\}$, therefore Slater's condition fails.

The Lagrangian is $$ L(x,y,\lambda) = e^{-x} + \lambda \frac{x^2}{y}. $$

After some careful calculation, the dual function is \begin{align} G(\lambda) &= \inf L(x,y,\lambda) = \begin{cases} 0& \lambda\ge 0\\ -\infty &\lambda <0 \end{cases} \end{align}

Thus, dual optimal value is $0$, so dual gap is $1$, strong duality fails. We can pick primal-dual pair to be $x=0, y=1, \lambda=2$. As for the KKT conditions

\begin{align*} &-e^{-x}+\frac{2\lambda x}{y}=0\\ &\frac{x^2}{y}\le0\\ &\lambda\ge 0\\ &\lambda \frac{x^2}{y}=0\\ \end{align*} with $y>0$, thus no solution for KKT conditions.

This counter-example warns us that we have to be careful about the strong duality condition even for differentiable convex problems.