Necessary and Sufficient Conditions for Lagrange Optimisation!


I'd be immensely grateful if someone could spell out in black and white:

  1. Which conditions are necessary and sufficient, for Lagrange optimisation?
  2. Do necessary conditions become sufficient conditions?
  3. How can an individual condition be labelled "sufficient" if it relies on other necessary conditions being satisfied first? Surely all conditions are necessary until they are all satisfied, at which point, collectively, they are sufficient?
  4. I'm unclear which of these conditions are meant to be necessary/sufficient to show a global maximum exists in general, vs. necessary/sufficient for legitimising the use of the Lagrangian specifically to show a global maximum exists.
  • It's the Necessary/Sufficiency element that is most bugging me, not generally the actual meaning of the condition.

I know it's important to show effort here, so this is my current understanding:

Underlying logic: $P \implies Q$

  • Necessary: $Q$ is necessary for $P$, i.e. $P$ cannot happen without $Q$, but $Q$ happening doesn't mean $P$ has happened. E.g. Death is necessary for murder.
  • Sufficient: $P$ is Sufficient for $Q$ i.e.$P$ happening, guarantees $Q$ happening. E.g. Murder is sufficient for Death.

First Order Necessary Conditions (My best guess)

  1. The feasible set of possible solution vectors is convex.
  2. The objective function is concave (assume we are maximising)
  3. The constraint functions are convex.
  4. The constraint and objective functions are differentiable.
  5. There exists an $x^*$ that makes the gradient of our Lagrangian = 0
  • (This final point seems to get lumped into different categories depending on where I read. For example, unlike unconstrained optimisation, in constrained optimisation a gradient of 0 is not sufficient for a global maximum (timestamped source). But at the same time, I've seen it lumped together with sufficient conditions. I assume because it becomes sufficient once other sufficient conditions are satisfied, but that feels like circular reasoning.)

First order Sufficient Conditions (KKT)

  1. Point 5 from above (Lagrange gradient is $0$) is often included here as well. Confusing for the reasons mentioned above.
  2. The Lagrange multiplier $λ \ge 0$
  3. Optimal vector $x^*$ is feasible (i.e. satisfies the constraints)
  4. Complementary Slackness is satisfied.

Second Order Necessary Conditions for Lagrange optimisation

I'm assuming these conditions are here to tell us whether or not the maximum we have found is indeed a global maximum.

  1. All the previous necessary and sufficient conditions.
  2. The function is twice differentiable. (To get our partials for our hessian).
  3. The second partial derivatives are continuous. (I'm not sure why we need the second partial derivatives to be continuous.)

Second Order Sufficient Conditions

  1. For a global maximiser, our function is concave, i.e. the Hessian is negative semi-definite.
  2. For a strict global maximiser, our function is strictly concave, i.e. the Hessian is negative definite.

I'm assuming global maximum vs. strict global maximum comes down to whether the function has a plateau, or multiple peaks of the same height, i.e. a multitude of vectors could give the same global maximum?
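To make the semi-definite vs. definite distinction concrete, here is a small sketch (my own helper, using numpy's `eigvalsh`; not from any of the cited sources) that classifies a Hessian:

```python
# Sketch: classify a Hessian for the second-order sufficient conditions.
# Negative definite -> strict (local) max; only negative semi-definite
# -> a possible plateau. Helper names are my own, not from the text.
import numpy as np

def is_negative_definite(H, tol=1e-12):
    # All eigenvalues strictly negative (eigvalsh: for symmetric matrices).
    return bool(np.all(np.linalg.eigvalsh(H) < -tol))

# Hessian of f(x, y) = -(x**2 + y**2): strictly concave, strict max at 0.
H_strict = np.array([[-2.0, 0.0], [0.0, -2.0]])
assert is_negative_definite(H_strict)

# Hessian of f(x, y) = -x**2: concave with a plateau along y.
H_plateau = np.array([[-2.0, 0.0], [0.0, 0.0]])
assert not is_negative_definite(H_plateau)   # only negative SEMI-definite
```

The plateau case is exactly the "multitude of vectors give the same maximum" situation: the Hessian has a zero eigenvalue along the flat direction.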

Questions Recap

  1. Clarifying which are the first-order necessary/sufficient conditions?
  2. What is necessary/sufficient in general, vs. for allowing use of the Lagrangian?
  3. Individually, none of the conditions seems to be sufficient. So why don't we label every condition as necessary until they are all satisfied, at which point they become collectively sufficient?
  4. Any other conditions/points you would like to clarify.

I really can't find this anywhere so if someone can spell it out, they will be doing Econ/Math grads everywhere a huge favour.


BEST ANSWER

Throughout this explanation, we assume our optimization problem to be of minimization type.

According to Boyd's Convex Optimization, the Lagrangian dual problem always provides a lower bound on the primal optimal value. More precisely, let a general (not necessarily convex) optimization problem be defined in the following standard form $$ {\min_x f(x) \\ \text{subject to} \\ f_i(x)\le 0\quad,\quad i=1,\cdots,n \\ h_j(x)= 0\quad,\quad j=1,\cdots,k }. $$ The Lagrangian of this problem is given by $$ L(x,\lambda,\nu)=f(x)+\sum_{i=1}^n\lambda_if_i(x)+\sum_{j=1}^k\nu_jh_j(x). $$ The dual objective function is defined as $g(\lambda,\nu)=\inf_x L(x,\lambda,\nu)$, which is a concave function of $\lambda$ and $\nu$ (even though the primal optimization problem may not be convex). Hence, we can state the dual problem as $$ {\max_{\lambda,\nu} g(\lambda,\nu) \\\text{subject to} \\\lambda_i\ge 0\quad,\quad i=1,\cdots,n }. $$ Let us denote the primal and dual optimal values by $p^*$ and $d^*$. Then we always have $$ d^*\le p^*. $$ This property is called weak duality.
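To make weak duality concrete, here is a minimal plain-Python sketch on a toy convex problem of my own, $\min x^2$ subject to $1-x\le 0$, where the dual objective works out to $g(\lambda)=\lambda-\lambda^2/4$:

```python
# Hedged numeric sketch of weak duality on the toy problem
#   min x^2   subject to   1 - x <= 0   (i.e. x >= 1).
# Lagrangian: L(x, lam) = x**2 + lam*(1 - x); the inf over x is at x = lam/2.

def g(lam):
    """Dual objective g(lam) = inf_x L(x, lam) = lam - lam**2/4."""
    return lam - lam**2 / 4.0

p_star = 1.0                              # primal optimum: x* = 1, f(x*) = 1

lams = [k / 100.0 for k in range(1001)]   # grid over lam in [0, 10]
assert all(g(l) <= p_star for l in lams)  # weak duality: g(lam) <= p*

d_star = max(g(l) for l in lams)          # best lower bound, at lam = 2
print(d_star)  # -> 1.0 (strong duality: the problem is convex)
```

Here the gap $p^*-d^*$ closes to zero, as sufficiency for convex problems (discussed next) predicts.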

Note that the KKT conditions for a general primal optimization problem are necessary. This means that an optimal primal-dual tuple $(x^*,\lambda^*,\nu^*)$ always fulfills these conditions.

If the primal problem is convex, the KKT conditions are also sufficient, in the sense that if a primal-dual $(\hat x,\hat \lambda,\hat \nu)$ tuple satisfies the KKT conditions, then $\hat x$ and $(\hat \lambda,\hat \nu)$ will be the primal and dual optimal, respectively and $d^*=p^*$.

The key point in all of these explanations is that all of the KKT conditions are necessary: for a minimizer $(x^*,\lambda^*,\nu^*)$, all five KKT conditions can be deduced.

On the other hand, for a convex problem, the KKT conditions are sufficient altogether. Fulfilling the KKT conditions only partially (for example, when some of the five conditions hold and the others do not) does not guarantee the optimality of a tuple $(x,\lambda,\nu)$.

The necessity of the KKT conditions tells us which points cannot be minimizers. In other words, solving the KKT conditions in any optimization problem gives you a superset of candidate points, only some of which are truly minimizers. Note the some of which here. At this point, no further information is available, except through the second-order necessary and sufficient conditions, to which we will return later.

The sufficiency of the KKT conditions at a point validates that point as a minimizer. For a convex optimization problem, it means that solving the KKT conditions gives you the set of all minimizers: nothing more, nothing less.

The KKT conditions are also referred to as First-Order Necessary Conditions (FONC), since they must hold for any minimizer to an optimization problem and only require up to the 1st-order derivative of the objective function and the constraints to exist.

Picking up where we left off: in a non-convex optimization problem, where the KKT conditions are necessary but not sufficient, we obtain a set of candidate points. The Second-Order Necessary Conditions (SONC) enforce that some of these candidates cannot be minimizers. On the other hand, the Second-Order Sufficient Conditions (SOSC) enforce that some of these candidates are undoubtedly minimizers.

Conclusion

In a convex optimization problem, you can always solve the KKT conditions (FONC) to obtain a set of minimizer candidates and be sure that all of them are your minimizers.

In a non-convex optimization problem, you can always solve the KKT conditions (FONC) to obtain a set of minimizer candidates. You know that the true minimizers are among these candidates. What you do not know is which ones.

After obtaining such a candidate set, you can further refine it by applying the SONC. You still do not know which of your candidates are minimizers, but now you are dealing with a (probably) smaller set of candidates.

At this point, you can apply the SOSC to single out some minimizers from your shrunken set of candidates, but (probably) there will still be a subset of candidates whose fate remains unresolved.

Example

Suppose the following optimization problem: $$ {\min 3x_1^4-8x_1^3+6x_1^2+x_2^2 \\\text{subject to} \\ x_1^2+x_2^2\le 2 }. $$ The Lagrangian is $$ L(x,\lambda)=3x_1^4-8x_1^3+6x_1^2+x_2^2+\lambda(x_1^2+x_2^2-2). $$ Applying the KKT conditions yields $$ \nabla_x L(x,\lambda)=0\implies \begin{cases} 12x_1(x_1-1)^2+2x_1\lambda=0\\ 2x_2(1+\lambda)=0 \end{cases}, $$ from which, by applying the other KKT conditions we get to two candidates $(x_1,x_2,\lambda)=(0,0,0)$ and $(x_1,x_2,\lambda)=(1,0,0)$.

The Hessian of the objective function is $$ H(x_1,x_2)=\begin{bmatrix} 12(3x_1-1)(x_1-1)&0\\ 0&2 \end{bmatrix}, $$ which is positive semi-definite at both $(x_1,x_2)=(0,0)$ and $(x_1,x_2)=(1,0)$. So both of them pass the SONC.

Since $H(0,0)\succ0$, then $(0,0)$ is a minimizer by SOSC. Unfortunately, the condition of $(1,0)$ will remain mysterious so far. However, by checking the higher-order derivatives, we see that $(1,0)$ is a saddle point and hence, cannot be a minimizer.
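The two candidates above can be sanity-checked numerically; this is a plain-Python sketch (sampling, not a proof):

```python
# Numeric check of the example's two KKT candidates (0,0) and (1,0).

def f(x1, x2):
    return 3*x1**4 - 8*x1**3 + 6*x1**2 + x2**2

def feasible(x1, x2):
    return x1**2 + x2**2 <= 2

def grad_f(x1, x2):
    # For lambda = 0 the Lagrangian gradient reduces to the gradient of f.
    return (12*x1*(x1 - 1)**2, 2*x2)

# Both candidates are stationary points of the Lagrangian (lambda = 0).
assert grad_f(0, 0) == (0, 0) and grad_f(1, 0) == (0, 0)

eps = 1e-3
# (0, 0) behaves like a minimizer: f rises at sampled feasible neighbors.
assert all(f(*p) > f(0, 0) for p in [(eps, 0), (-eps, 0), (0, eps), (0, -eps)])

# (1, 0) is not a minimizer: stepping to x1 = 1 - eps stays feasible
# and strictly decreases f, consistent with the saddle-point conclusion.
assert feasible(1 - eps, 0) and f(1 - eps, 0) < f(1, 0)
```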

ANSWER

This is my understanding so far, combining the great answer from @MostafaAyaz and more sources from the internet!

${\max f(x) \\ \text{subject to} \\ g_i(x)\le 0\quad,\quad i=1,\cdots,n \\ h_j(x)= 0\quad,\quad j=1,\cdots,k }.$

Initial Assumptions: (Necessary?)

  1. Assume our objective function $f(x)$ and our constraints $g(x)$ and $h(x)$ are differentiable.
  2. Assume that $f(x)$ is also defined over an open subset $U$ of $\mathbb{R}^n$.
  3. Assume that this subset $U$ is convex.

Why these initial assumptions? To use the Lagrangian method to solve the COP, I believe these assumptions must be made.

  1. So that we can find stationary points, or more likely, tangency conditions between the objective function and the constraints.
  2. Because, looking at the definition of the $Df$ map, you need a neighbourhood around a point to know how the function changes in all directions near it.
  3. The feasible set should be convex. I'm struggling to put into words why; since derivatives are linear approximations, it seems important that we can understand points in terms of nearby points connected by a line, hence convexity is required?

KKT (Lagrangian) Necessary conditions

These conditions allow us to find candidate solutions $x^*$. The conditions are always necessary, but not always sufficient, for a candidate solution to be the optimal solution to the COP.

  1. There exist vectors $\lambda$ and $\mu$ such that the gradient of the Lagrangian, evaluated at $x^*$, equals $0$, i.e. $\begin{equation*} \begin{array}{c} \nabla f(\boldsymbol{x}) + \sum^{p}_{i=1} \mu_i \nabla g_i (\boldsymbol{x}) + \sum^{q}_{j=1} \lambda_j \nabla h_j (\boldsymbol{x}) = \boldsymbol{0}\end{array} \end{equation*}$
  2. λ,μ $\succeq 0$
  3. $x^*$ is feasible i.e. it satisfies our constraints.
  4. Complementary slackness holds.
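As a worked illustration of how these four conditions pin down candidates, here is a plain-Python sketch of the complementary-slackness case split on a toy problem of my own: maximize $f(x)=-(x-2)^2$ subject to $x-1\le 0$. I assume the convention $\mathcal{L}(x,\mu)=f(x)-\mu g(x)$ with $\mu\ge 0$ (sign conventions vary between texts, as discussed in the question):

```python
# KKT case analysis, toy maximization problem (my own example):
#   max f(x) = -(x - 2)**2   subject to   g(x) = x - 1 <= 0.
# Assumed convention: L(x, mu) = f(x) - mu*g(x), mu >= 0, so
# stationarity reads f'(x) - mu*g'(x) = 0 with g'(x) = 1.

def fprime(x):
    return -2*(x - 2)

candidates = []

# Case 1 (constraint inactive): mu = 0  ->  f'(x) = 0  ->  x = 2.
x = 2.0
if x - 1 <= 0:                  # x = 2 violates g(x) <= 0: rejected
    candidates.append((x, 0.0))

# Case 2 (constraint active): g(x) = 0  ->  x = 1, mu = f'(1)/g'(1).
x = 1.0
mu = fprime(x) / 1.0            # mu = 2
if mu >= 0:                     # dual feasibility holds: accepted
    candidates.append((x, mu))

print(candidates)  # -> [(1.0, 2.0)]: the unique KKT point is x* = 1
```

Each condition plays its role: stationarity generates the cases, feasibility rejects $x=2$, and the sign condition on $\mu$ would reject an active-constraint candidate on the wrong side of the optimum.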

When do KKT conditions become sufficient

  1. If your constraint set is compact, then the KKT conditions will give us a solution to our COP.
  2. If our constraint set is not compact, then we require two more necessary conditions:
  • The objective function is concave.
  • The constraint function is convex.

The idea here, I believe, is that if the constraint set is, say, unbounded below but closed above, then a concave function can still attain a maximum on the boundary of the constraint set (see here).

If either or both of these conditions are satisfied, then our KKT conditions will be sufficient for a solution to the constrained optimization problem.

Question: I came across some confusion here. For a maximization problem, some authors say the constraint needs to be concave, others say convex. This is made even more confusing by the fact that linear constraints are both convex and concave.

I believe the confusion comes from how the constraint, and hence the Lagrangian, is defined. But the key principle is that the Lagrangian must be concave, so we must have a sum of concave functions:

Version 1 (Course Pack): $\mathcal{L}(k ,λ ,x) = f(x) + λ [k - g(x)]$

  • Here we are told the constraint $g(x)$ must be convex; I believe this is because the Lagrangian includes $-g(x)$, which is then concave.

Version 2 (Youtube Lecture): $\mathcal{L}(\boldsymbol{x}) = f(x) + λ g(x)$

  • Here we are told $g(x)$ is concave, I assume because the Lagrangian is formulated so that the constraint function is added. (I believe in both examples $g(x)$ is being used broadly to capture the inequality and equality constraints, i.e. it's meant to be vector notation.)
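A possible way to reconcile the two versions (my own reformulation, so treat it as a sketch rather than either source's claim): write the Version 1 constraint as $g(x) \le k$ and define $\tilde g(x) := k - g(x)$, so the constraint becomes $\tilde g(x) \ge 0$. Then

$$ \mathcal{L} = f(x) + \lambda\,[k - g(x)] = f(x) + \lambda\,\tilde g(x). $$

If $g$ is convex, then $\tilde g = k - g$ is concave, so Version 1's requirement that $g$ be convex and Version 2's requirement that the added constraint function be concave appear to be the same condition viewed through different sign conventions.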

I will revisit this later for the Second Order conditions, if people could comment on what is or isn't right that'd be great!

ANSWER

Comments on @CormJack's explanations

Let us first discuss the initial assumptions made for the COP, enabling us to make use of the FONC, SONC and SOSC (I have covered these conditions in my original answer in sufficient detail).

1- (Necessary) The differentiability of $f(x)$, $g_i(x)$ and $h_j(x)$ is required for KKT (FONC). Furthermore, if SONC and SOSC are to be examined, the 2nd-order derivative needs to exist.

2- (Not necessary) It is the feasible region that should be closed. Outside the feasible region, any behavior of the objective function and the constraint functions is tolerable. Their derivatives then need to exist only within the feasible region, not necessarily elsewhere.

3- The existence of partial derivatives of a function at the neighborhood of a point, requires the openness of the feasible region, not convexity. For example, $f(x)=x_1^2+x_2^2$ is differentiable at any point of $D=\{(x_1,x_2)\in\Bbb R^2: x_1^2+x_2^2>1\}$, whilst $D$ is not convex.

There is a catch about closed sets and their boundary points. Although the derivative of a function must still exist at any boundary point, the FONC, SONC and SOSC hold differently for a boundary and an interior point. For example, under FONC for an unconstrained optimization problem, we must have $\nabla f(x_i)=0$ for an interior point $x_i$ and $v^T\nabla f(x_b)\ge 0$ for any feasible direction $v$ at the boundary point $x_b$ (a feasible direction $v$ at point $p\in D$ is any vector such that there exists an $\epsilon>0$ where $p+\epsilon v\in D$). Such differences exist for SONC and SOSC too. For example, the condition $\nabla^2 f(x)\succeq 0$ for an interior point $x$ is altered to $v^T\cdot \nabla^2 f(x)\cdot v\ge 0$ for a boundary point $x$ and any feasible direction $v$ at $x$.
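As a small numeric illustration of the boundary-point version of the FONC (a plain-Python toy example of my own, not from the answer above):

```python
# Sketch: at a boundary minimizer the gradient need not vanish; instead
# v.T * grad_f(x_b) >= 0 must hold for every feasible direction v.
# Toy example: minimize f(x) = x1^2 + x2^2 over D = {x : x1 >= 1}.

def grad_f(x1, x2):
    return (2*x1, 2*x2)

x_b = (1.0, 0.0)            # the minimizer sits on the boundary x1 = 1
grad = grad_f(*x_b)         # (2.0, 0.0): nonzero, yet x_b is optimal

# Feasible directions at x_b are vectors v with v[0] >= 0 (pointing into D).
feasible_dirs = [(1, 0), (0, 1), (0, -1), (1, 5), (0.5, -3)]
assert all(v[0]*grad[0] + v[1]*grad[1] >= 0 for v in feasible_dirs)

# A direction leaving D can violate the inequality:
v_out = (-1, 0)
assert v_out[0]*grad[0] + v_out[1]*grad[1] < 0
```

This shows why the interior condition $\nabla f(x)=0$ would wrongly reject $x_b$: the boundary form of the FONC is the one that applies there.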

KKT (Lagrangian) Necessary conditions

Your comprehension of the necessity of the KKT conditions is correct, except for saying $\lambda,\mu\ge 0$. If it means $\lambda\succeq 0$ and $\mu\succeq 0$, note that only $\mu$ needs to be elementwise non-negative, since it is $\mu$ that multiplies the inequality constraints in your formulation; $\lambda$, attached to the equality constraints, is unrestricted in sign.

When do KKT conditions become sufficient

Honestly, I could not fully understand your mention of compactness, a notion defined in the context of general topology. In fact, a set $D$ is compact if any sequence $\{a_n\}_{n=1}^\infty\subseteq D$ has a subsequence $\{a_{b_n}\}_{n=1}^\infty\subseteq D$ convergent to a member of $D$. Nevertheless, I feel that linear optimization problems may provide a counterexample to this part.

Now, let us draw our attention to your question.

Comments on your question

Consider the following canonical form of general optimization problems: $$ {\min f(x) \\\text{s. t.} \\g_i(x)\le 0 \\h_j(x)= 0 }. $$ There is no restriction on the types of the functions $f(x)$, $g_i(x)$ and $h_j(x)$. You can optimize anything under any constraints, but if you wish to formulate your optimization problem as a convex one (so that you can benefit from convexity), you need $f(x)$ and $g_i(x)$ to be convex, and $h_j(x)$ to be affine. Note that the problem is defined as minimization-type. Any maximization-type problem obviously has an equivalent minimization-type problem through $\max_{x\in D} f(x)=-\min_{x\in D} -f(x)$. Since the convexity of $f(x)$ is equivalent to the concavity of $-f(x)$, a maximization-type COP has an equivalent convex form if its objective function is concave.
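The max-to-min conversion in the paragraph above can be sanity-checked on a tiny finite problem (my own toy example):

```python
# Sketch of the equivalence  max_{x in D} f(x) = -min_{x in D} (-f(x)),
# checked on a small finite feasible set D (toy example, not from the text).

D = [-2, -1, 0, 1, 2]
f = lambda x: -(x - 1)**2 + 3   # concave objective, peak at x = 1

max_val = max(f(x) for x in D)
min_form = -min(-f(x) for x in D)

assert max_val == min_form == 3   # both formulations agree
print(max_val)  # -> 3
```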

To remove your ambiguity about the linear constraints...

Think of equality constraints $h_j(x)=0$ as two inequality constraints as follows: $$ h_j(x)\le 0\text{ and }-h_j(x)\le 0. $$ According to our deduction about convex COPs, we assumed that all the inequality constraint functions must be convex. This also applies to $h_j(x)$ and $-h_j(x)$, which means that both $h_j(x)$ and $-h_j(x)$ must be convex. The only convex function with convex negative is the affine function.
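Spelling out that last step (the standard convexity argument, added for completeness): if both $h_j$ and $-h_j$ are convex, then for all $x, y$ and $\theta \in [0,1]$,

$$ h_j(\theta x + (1-\theta) y) \le \theta h_j(x) + (1-\theta) h_j(y) \quad (\text{convexity of } h_j), $$
$$ h_j(\theta x + (1-\theta) y) \ge \theta h_j(x) + (1-\theta) h_j(y) \quad (\text{convexity of } -h_j), $$

so equality holds throughout, which is precisely the definition of an affine function.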