I'd be immensely grateful if someone could spell out in black and white:
- Which conditions are necessary and sufficient for Lagrange optimisation?
- Do necessary conditions become sufficient conditions?
- How can an individual condition be labelled "sufficient" if it relies on other necessary conditions being satisfied first? Surely every condition is necessary until all of them hold, at which point they are collectively sufficient?
- I'm unclear which of these conditions are necessary/sufficient to show a global maximum exists in general, versus necessary/sufficient for legitimising the use of the Lagrangian specifically to show a global maximum exists.
- It's the necessity/sufficiency element that is bugging me most, not so much the actual meaning of each condition.
I know it's important to show effort here, so this is my current understanding:
Underlying logic: $P \implies Q$
- Necessary: $Q$ is necessary for $P$, i.e. $P$ cannot happen without $Q$, but $Q$ happening doesn't mean $P$ has happened. E.g. Death is necessary for murder.
- Sufficient: $P$ is sufficient for $Q$, i.e. $P$ happening guarantees $Q$ happening. E.g. Murder is sufficient for death.
First Order Necessary Conditions (My best guess)
- The feasible set of possible solution vectors is convex.
- The objective function is concave (assume we are maximising)
- The constraint functions are convex.
- The constraint and objective functions are differentiable.
- There exists an $x^*$ that makes the gradient of our Lagrangian = 0
- (This final point seems to get lumped into different categories depending on where I read. For example, unlike unconstrained optimisation, in constrained optimisation a gradient of 0 is not sufficient for a global maximum (timestamped source). But at the same time, I've seen it lumped together with sufficient conditions. I assume because it becomes sufficient once the other sufficient conditions are satisfied, but that feels like circular reasoning.)
First order Sufficient Conditions (KKT)
- Point 5 from above (Lagrange gradient is $0$) is often included here as well. Confusing for the reasons mentioned above.
- The Lagrange multiplier $\lambda \ge 0$.
- The optimal vector $x^*$ is feasible (i.e. satisfies the constraints).
- Complementary Slackness is satisfied.
Second Order Necessary Conditions for Lagrange optimisation
I'm assuming these conditions are here to tell us whether or not the maximum we have found is indeed a global maximum.
- All the previous necessary and sufficient conditions.
- The function is twice differentiable. (To get our partials for our hessian).
- The second partial derivatives are continuous. (I'm not sure why we need the second partial derivatives to be continuous.)
Second Order Sufficient Conditions
- For a global maximiser, the function is concave, i.e. the Hessian is negative semi-definite.
- For a strict global maximiser, the function is strictly concave, i.e. the Hessian is negative definite.
I'm assuming the difference between a global maximum and a strict global maximum is whether the function has a plateau, or multiple peaks of the same height, i.e. a multitude of vectors could give the same global maximum value?
Questions Recap
- Clarifying which are first order necessary/sufficient conditions?
- What is necessary/sufficient in general, vs for allowing use of the Lagrangian?
- Explaining why, individually, none of the conditions seems to be sufficient. Why don't we label every condition as necessary until they are all satisfied, at which point they become collectively sufficient?
- Any other conditions/points you would like to clarify.
I really can't find this anywhere so if someone can spell it out, they will be doing Econ/Math grads everywhere a huge favour.
Throughout this explanation, we assume our optimization problem to be of minimization type.
According to Boyd's *Convex Optimization*, the Lagrangian dual problem always provides a lower bound on the primal optimal value. More precisely, let a general (not necessarily convex) optimization problem be defined in the standard form
$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & f_i(x)\le 0,\quad i=1,\dots,n \\
& h_j(x)= 0,\quad j=1,\dots,k.
\end{array}
$$
The Lagrangian of this problem is given by
$$
L(x,\lambda,\nu)=f(x)+\sum_{i=1}^n\lambda_i f_i(x)+\sum_{j=1}^k\nu_j h_j(x).
$$
The dual objective function is defined as $g(\lambda,\nu)=\inf_x L(x,\lambda,\nu)$, which is a concave function of $\lambda$ and $\nu$ (even though the primal problem may not be convex). Hence, we can form the dual problem
$$
\begin{array}{ll}
\max_{\lambda,\nu} & g(\lambda,\nu) \\
\text{subject to} & \lambda_i\ge 0,\quad i=1,\dots,n.
\end{array}
$$
Denote the primal and dual optimal values by $p^*$ and $d^*$. Then we always have
$$
d^*\le p^*.
$$
This property is called weak duality.
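Weak duality is easy to check numerically. Here is a minimal sketch on a toy problem of my own choosing (not from Boyd): minimize $x^2$ subject to $1-x\le 0$, where the dual objective can be worked out in closed form.

```python
# Toy problem: min x^2  s.t.  1 - x <= 0  (i.e. x >= 1); this is convex.
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x).
# Minimizing over x (set dL/dx = 2x - lam = 0) gives x = lam/2, so the
# dual objective is g(lam) = lam - lam^2/4, concave in lam.

def g(lam):
    """Dual objective g(lam) = inf_x [x^2 + lam*(1 - x)]."""
    return lam - lam**2 / 4

p_star = 1.0  # primal optimum: f(1) = 1, attained at the boundary x = 1

# Weak duality: g(lam) <= p_star for every lam >= 0.
for lam in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]:
    assert g(lam) <= p_star + 1e-12

# The problem is convex, so strong duality also holds: d* = p*, at lam = 2.
d_star = g(2.0)
print(d_star, p_star)  # 1.0 1.0
```

The gap $p^*-g(\lambda) = (\lambda/2-1)^2 \ge 0$ here, confirming the bound for every feasible dual point, not just the optimal one.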
Note that the KKT conditions for a general primal optimization problem are necessary: an optimal primal-dual minimizer tuple $(x^*,\lambda^*,\nu^*)$ always fulfills these conditions.
If the primal problem is convex, the KKT conditions are also sufficient, in the sense that if a primal-dual tuple $(\hat x,\hat \lambda,\hat \nu)$ satisfies the KKT conditions, then $\hat x$ and $(\hat \lambda,\hat \nu)$ are primal and dual optimal, respectively, and $d^*=p^*$.
The important point in all of these explanations is that all of the KKT conditions together are necessary: at a minimizer $(x^*,\lambda^*,\nu^*)$, all five KKT conditions hold.
On the other hand, for a convex problem, the KKT conditions are sufficient only altogether. Fulfilling the KKT conditions partially (for example, some of the five conditions hold while the others do not) does not guarantee the optimality of a tuple $(x,\lambda,\nu)$.
The necessity of the KKT conditions tells you which points *cannot* be minimizers. In other words, solving the KKT conditions in any optimization problem gives you a superset of candidate points, only *some* of which are truly minimizers. Note the *some of which* here. At this point, no further information is available, except through the second-order necessary and sufficient conditions, to which we will return later.
The sufficiency of the KKT conditions at a point validates that point as a minimizer. For a convex optimization problem, this means that solving the KKT conditions gives you the set of all minimizers, nothing more, nothing less.
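To make the convex case concrete, here is a small sketch on a hypothetical one-dimensional problem of my own (minimize $(x-2)^2$ subject to $x-1\le 0$), where complementary slackness splits the KKT system into two hand-solvable cases:

```python
# KKT system for: min (x - 2)^2  s.t.  x - 1 <= 0
#   stationarity:            2*(x - 2) + lam = 0
#   primal feasibility:      x - 1 <= 0
#   dual feasibility:        lam >= 0
#   complementary slackness: lam*(x - 1) = 0
# Complementary slackness gives two cases: lam = 0, or x = 1.

candidates = []

# Case 1 (inactive constraint): lam = 0 -> stationarity gives x = 2,
# but x = 2 violates x <= 1, so this case is filtered out below.
x1, lam1 = 2.0, 0.0
if x1 <= 1 and lam1 >= 0:
    candidates.append((x1, lam1))

# Case 2 (active constraint): x = 1 -> stationarity gives lam = 2 >= 0.
x2, lam2 = 1.0, 2.0
if x2 <= 1 and lam2 >= 0:
    candidates.append((x2, lam2))

print(candidates)  # [(1.0, 2.0)]
```

Because the problem is convex, the single surviving KKT point $(x,\lambda)=(1,2)$ is guaranteed to be *the* minimizer, exactly as the sufficiency argument above says.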
The KKT conditions are also referred to as the First-Order Necessary Conditions (FONC), since they must hold at any minimizer of an optimization problem and only require the first-order derivatives of the objective function and the constraints to exist.
Picking up where we left off: in a non-convex optimization problem, where the KKT conditions are necessary but not sufficient, we obtain a set of candidate points. The Second-Order Necessary Conditions (SONC) rule out some of these candidates as minimizers. The Second-Order Sufficient Conditions (SOSC), on the other hand, certify that some of these candidates are undoubtedly minimizers.
Conclusion
In a convex optimization problem, you can always solve the KKT conditions (FONC) to obtain a set of candidate minimizers and be sure that all of them are minimizers.
In a non-convex optimization problem, you can always solve the KKT conditions (FONC) to obtain a set of candidate minimizers. You know that the true minimizers are among these candidates; what you do not know is which ones.
After obtaining such a candidate set, you can refine it further by applying the SONC. You still do not know which of your candidates are minimizers, but you are now dealing with a (probably) smaller set of candidates.
At this point, you can apply the SOSC to single out some minimizers from your shrunken set of candidates, but (probably) there will still be a subset of candidates whose status remains undetermined.
Example
Consider the following optimization problem:
$$
\begin{array}{ll}
\min & 3x_1^4-8x_1^3+6x_1^2+x_2^2 \\
\text{subject to} & x_1^2+x_2^2\le 2.
\end{array}
$$
The Lagrangian is
$$
L(x,\lambda)=3x_1^4-8x_1^3+6x_1^2+x_2^2+\lambda(x_1^2+x_2^2-2).
$$
Applying the KKT conditions yields
$$
\nabla_x L(x,\lambda)=0\implies
\begin{cases}
12x_1(x_1-1)^2+2x_1\lambda=0\\
2x_2(1+\lambda)=0
\end{cases},
$$
from which, by applying the other KKT conditions, we arrive at two candidates: $(x_1,x_2,\lambda)=(0,0,0)$ and $(x_1,x_2,\lambda)=(1,0,0)$.
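As a sanity check, the two candidates above can be verified mechanically. The sketch below evaluates every KKT residual at both points (nothing here is new; it just re-states the conditions from the example as code):

```python
def kkt_residuals(x1, x2, lam):
    """KKT residuals for min 3x1^4 - 8x1^3 + 6x1^2 + x2^2 s.t. x1^2 + x2^2 <= 2."""
    stat1 = 12*x1*(x1 - 1)**2 + 2*x1*lam   # dL/dx1, must be 0
    stat2 = 2*x2*(1 + lam)                 # dL/dx2, must be 0
    feas  = x1**2 + x2**2 - 2              # must be <= 0
    slack = lam*(x1**2 + x2**2 - 2)        # must be == 0
    return stat1, stat2, feas, slack

# Both candidates from the example satisfy all five KKT conditions.
for (x1, x2, lam) in [(0, 0, 0), (1, 0, 0)]:
    s1, s2, feas, slack = kkt_residuals(x1, x2, lam)
    assert s1 == 0 and s2 == 0      # stationarity
    assert feas <= 0 and lam >= 0   # primal and dual feasibility
    assert slack == 0               # complementary slackness
```

Since the constraint is inactive at both points, $\lambda=0$ and complementary slackness holds trivially.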
The Hessian of the objective function (which coincides with the Hessian of the Lagrangian here, since $\lambda=0$ at both candidates) is
$$
H(x_1,x_2)=\begin{bmatrix}
12(3x_1-1)(x_1-1)&0\\
0&2
\end{bmatrix},
$$
which is positive semi-definite at both $(x_1,x_2)=(0,0)$ and $(x_1,x_2)=(1,0)$. So both of them pass the SONC.
Since $H(0,0)\succ0$, the point $(0,0)$ is a minimizer by the SOSC. The status of $(1,0)$, unfortunately, remains undetermined at this stage. However, by checking higher-order derivatives, we see that $(1,0)$ is a saddle point and hence cannot be a minimizer.
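The second-order checks above can also be carried out numerically. A minimal sketch (the $x_2=0$ slice argument is one concrete way to see the saddle; the example itself only appeals to higher-order derivatives):

```python
# Hessian of the objective from the example (diagonal, so definiteness
# is read off the diagonal entries directly).
def hessian(x1):
    return [[12*(3*x1 - 1)*(x1 - 1), 0],
            [0,                      2]]

# At (0, 0): diag(12, 2) -> positive definite, so SOSC certifies a minimizer.
H0 = hessian(0)
assert H0[0][0] > 0 and H0[1][1] > 0

# At (1, 0): diag(0, 2) -> only positive semi-definite; SOSC is inconclusive.
H1 = hessian(1)
assert H1[0][0] == 0 and H1[1][1] > 0

# A 1-D slice along x1 (with x2 = 0) exposes the saddle: near x1 = 1,
# f(1 + e) ~ 1 + 4e^3, which decreases for e < 0, so (1, 0) is no local min.
f = lambda x1: 3*x1**4 - 8*x1**3 + 6*x1**2
assert f(1 - 1e-3) < f(1) < f(1 + 1e-3)
```

The cubic leading term $4e^3$ in the slice is exactly the higher-order-derivative information the SOSC cannot see, which is why $(1,0)$ slips through the second-order tests.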