I am reading about Support Vector Machines and there are some steps I don't understand regarding convex optimization. I won't get into the specific constraints of SVMs. Our minimization problem is the following:
$$ \text{minimize} \quad f(x) \\ \text{s.t.} \quad g(x) \leq 0$$
The function $f(x)$ is convex and the constraint function $g(x)$ is also convex, so we are dealing with a convex optimization problem. Note that $x \in \mathbb{R}^n$.
This is how I would solve the problem:
- Define the Lagrangian:
$$L(x, \lambda) = f(x) + \lambda g(x)$$
- Set the gradient with respect to $x$ to 0: $$\nabla_x L = 0$$
- Subject to the constraints: $$\lambda \geq 0 \\ \lambda g(x) = 0 \\ g(x) \leq 0 $$
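The steps above can be illustrated on a tiny one-dimensional toy problem (my own example, not the SVM problem): minimize $f(x) = (x-2)^2$ subject to $g(x) = x - 1 \leq 0$. The conditions reduce to a small case analysis on the complementary slackness condition $\lambda g(x) = 0$:

```python
# Toy instance of the KKT steps above:
# minimize f(x) = (x - 2)^2  subject to  g(x) = x - 1 <= 0.
# Lagrangian: L(x, lam) = (x - 2)^2 + lam * (x - 1).

def solve_kkt():
    # Stationarity: dL/dx = 2(x - 2) + lam = 0  =>  x = 2 - lam / 2.
    # Complementary slackness (lam * g(x) = 0) gives two cases.

    # Case 1: lam = 0 (constraint inactive) => x = 2,
    # but g(2) = 1 > 0, so this candidate is infeasible.
    x = 2 - 0 / 2
    if x - 1 <= 0:
        return x, 0.0

    # Case 2: constraint active, g(x) = 0 => x = 1,
    # and stationarity gives lam = 2(2 - x) = 2 >= 0: all conditions hold.
    x = 1.0
    lam = 2 * (2 - x)
    assert lam >= 0 and abs(2 * (x - 2) + lam) < 1e-12
    return x, lam

x_star, lam_star = solve_kkt()
print(x_star, lam_star)  # 1.0 2.0
```

Here the unconstrained minimizer $x = 2$ violates the constraint, so the solution sits on the boundary with a strictly positive multiplier.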
**Problem 1** Since $f(x)$ is convex, it is also convex on any subset of its domain; in our problem this subset is the feasible region defined by $g(x) \leq 0$. Does this mean that there is a unique solution? In other words, if I find a point $x^*$ that satisfies conditions 2 and 3 above, am I done?
My reasoning is the following: since $f(x)$ is convex in the region defined by the constraint, there is a unique minimum (assuming that our $f(x)$ is strictly convex). Therefore there would be only one point satisfying these conditions, located either on the boundary $g(x) = 0$ or in the interior $g(x) < 0$.
**Problem 2** One can show that our original problem is equivalent to the following min-max (Primal) problem, which is related to a max-min (Dual) problem: $$\underbrace{\underset{\lambda \geq 0}{\max}\, \underset{x}{\min}}_{\text{Dual}} L \;\leq\; \underbrace{\underset{x}{\min}\, \underset{\lambda \geq 0}{\max}}_{\text{Primal}} L$$
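(For reference, the standard argument behind the equivalence: for a fixed $x$,
$$\underset{\lambda \geq 0}{\max} \left[ f(x) + \lambda g(x) \right] = \begin{cases} f(x), & g(x) \leq 0 \\ +\infty, & g(x) > 0, \end{cases}$$
so the inner maximization enforces feasibility, and minimizing over $x$ recovers the original constrained problem. Swapping the order of $\min$ and $\max$ can only decrease the value, which gives the inequality above.)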
Although I can follow the mathematics of the derivation, I can't understand why we want to study the Dual. Is there a computational advantage, for example?
**Problem 3** The first step in solving the Dual problem is to minimize the Lagrangian over $x$. We do this by setting the gradient with respect to $x$ to 0. I can't understand why this is sufficient. Is it because the Lagrangian is itself convex in $x$ (it is a sum of the convex functions $f(x)$ and $\lambda g(x)$, with $\lambda \geq 0$), so the only points satisfying this condition are minima?
$f(x)$ will be convex on a subset of its domain if the subset is convex. The solution does not need to be unique, though: the set of minimizers of $f(x) = |x-1| + |x+1|$ is the entire interval $[-1,1]$.
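A quick numerical check of that example (using a simple grid, nothing specific to SVMs):

```python
# f(x) = |x - 1| + |x + 1| attains its minimum value 2
# at every point of [-1, 1], so the minimizer is not unique.

f = lambda x: abs(x - 1) + abs(x + 1)

# Sample a grid over [-3, 3] in steps of 0.01.
grid = [k / 100 for k in range(-300, 301)]
fmin = min(f(x) for x in grid)
minimizers = [x for x in grid if f(x) - fmin < 1e-9]

print(round(fmin, 6))                 # 2.0
print(minimizers[0], minimizers[-1])  # -1.0 1.0
```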
There are a bunch of reasons to study the dual, but I'll just give a couple. One is that it often has very satisfying interpretations: the Lagrange multipliers can end up being interpreted as shadow prices (economics), contact forces (classical mechanics), or implied probabilities (economics/finance).
Another reason to care about the dual is that it's very useful in numerical algorithms. The solution of the dual problem is a lower bound on the optimal value of the primal problem. Many algorithms actually work by solving the primal and dual problems together (augmented Lagrangian methods, the alternating direction method of multipliers).
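The lower-bound property is easy to see on a toy problem (my example, not from the question): minimize $(x-2)^2$ subject to $x - 1 \leq 0$, whose optimum is $f(x^*) = 1$ at $x^* = 1$. Minimizing $L(x,\lambda) = (x-2)^2 + \lambda(x-1)$ over $x$ at $x = 2 - \lambda/2$ gives the dual function $q(\lambda) = \lambda - \lambda^2/4$:

```python
# Weak duality on a toy problem: every value of the dual function
# q(lam) = lam - lam^2 / 4 is a lower bound on the primal optimum 1,
# and maximizing q closes the gap here (strong duality).

q = lambda lam: lam - lam**2 / 4
primal_opt = 1.0  # f(1) = (1 - 2)^2

lams = [k / 10 for k in range(0, 51)]  # lam in [0, 5]
assert all(q(lam) <= primal_opt + 1e-12 for lam in lams)  # weak duality

best = max(lams, key=q)
print(best, q(best))  # 2.0 1.0 -- dual optimum matches the primal optimum
```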
This is part of the Karush-Kuhn-Tucker (KKT) conditions. When we create the Lagrangian $L(x,\lambda)=f(x)+\lambda g(x)$, then for any feasible $x_{feas}$ and any $\lambda \geq 0$, it's guaranteed that $L(x_{feas},\lambda)\le f(x_{feas})$. But we'd like to find a lower bound that doesn't depend on $x$ at all. So, for each $\lambda$, we find the value of $x$ that minimises the Lagrangian. This gives us a new function $q(\lambda)$ (the dual function) which satisfies $q(\lambda) \le L(x,\lambda) \le f(x)$ for every feasible $x$.
The partial minimisation over $x$ is why we set the gradient to 0. Just pretend $\lambda = 42$. How would you minimise $L(x, 42)$? Now do this for every value of $\lambda$.
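Taking that literally on a toy Lagrangian (my example): with $L(x,\lambda) = (x-2)^2 + \lambda(x-1)$, setting $\partial L/\partial x = 0$ for a fixed $\lambda$ gives $x = 2 - \lambda/2$, and plugging back in yields $q(\lambda) = \lambda - \lambda^2/4$:

```python
# The "pretend lam = 42" step: for a fixed lam, L(., lam) is convex in x,
# so the stationary point of dL/dx = 2(x - 2) + lam = 0 is the minimizer.

def min_over_x(lam):
    x = 2 - lam / 2                      # stationary point of the convex L(., lam)
    return (x - 2) ** 2 + lam * (x - 1)  # = q(lam) = lam - lam^2 / 4

# For lam = 42 the inner minimum is q(42) = 42 - 42**2 / 4 = -399.
print(min_over_x(42))  # -399.0
```

Repeating this for every $\lambda$ traces out the dual function, which is then maximised over $\lambda \geq 0$.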