Self-study for an exam in convex analysis regarding optimality conditions


I am studying for my oral exam. Here is a question I would like help with.

"Define a constrained minimization problem for a smooth (convex) function with smooth constraints. Provide a characterization for the gradient of the function at the minimum and discuss how the condition for un-constrained problems follows as a special case."

I do not understand what my teacher wants. In the first part, does he want me to give a concrete example of a constrained minimization problem for a smooth (convex) function with smooth constraints, or just the general theory for such problems? And what does he mean by "Provide a characterization for the gradient of the function at the minimum"?

I hope someone can help me to get a better understanding. Thanks in advance.


There are 2 answers below.

BEST ANSWER

I think that they want you to give an abstract example, like the following. Let $f$ and $g$ be differentiable functions and consider

$$\min\limits_{x:g(x)\leq 0} f(x)$$

The optimality conditions at a solution $\bar{x}$ for this problem are

$$0\in \nabla f(\bar{x}) + N_{\{g\leq 0\}}(\bar{x})$$

where $N_{\{g\leq 0\}}(\bar{x})=\{u: \forall y\in \{g\leq 0\}, \langle u, y-\bar{x}\rangle \leq 0 \}$ is the normal cone to the sublevel set $\{g\leq 0\} = \{x:g(x)\leq 0\}$.

However, at a boundary point $\bar{u}$ of the set $\{g\leq 0\}$ with $\nabla g(\bar{u})\neq 0$, the outward normal direction is $\nabla g(\bar{u})$, since $g$ increases as we leave the set. Said another way, at such a boundary point $\bar{u}$, the normal cone of the sublevel set of a smooth convex function $g$ is given by $\{\lambda\nabla g(\bar{u})\colon \lambda \geq0\}$. So we can write

$$N_{\{g\leq 0\}}(x) = \begin{cases}\{0\} & g(x)<0\\ \{\lambda\nabla g(x)\colon \lambda \geq 0\} & g(x)=0\\ \emptyset & g(x)>0 \end{cases}$$

Let $\bar{x}$ be a solution of the optimization problem. Then $\bar{x}$ must be feasible, i.e., $\bar{x} \in \{g\leq 0\}$. Thus, the optimality condition becomes

$$-\nabla f(\bar{x})\in N_{\{g\leq 0\}}(\bar{x})\implies \begin{cases}\nabla f(\bar{x}) = 0\quad &\mbox{if }\quad g(\bar{x})<0\\ \nabla f(\bar{x})= -\lambda^\star\nabla g(\bar{x})\quad &\mbox{if }\quad g(\bar{x})=0\end{cases}$$

for some $\lambda^\star\geq 0$, which we interpret to mean: if the constraint is active at $\bar{x}$, then $\nabla f(\bar{x})$ is a nonpositive multiple of $\nabla g(\bar{x})$, i.e., $\nabla f(\bar{x}) + \lambda^\star\nabla g(\bar{x}) = 0$ (the KKT stationarity condition); otherwise $\nabla f(\bar{x})=0$.
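If it helps, here is a minimal numerical sketch of this condition (assuming numpy; the objective, the ball constraint, and the point $c$ are made up for illustration):

```python
import numpy as np

# Toy problem: minimize f(x) = ||x - c||^2 subject to g(x) = ||x||^2 - 1 <= 0.
# With c outside the unit ball the constraint is active, and the minimizer
# is the projection of c onto the ball, x_bar = c / ||c||.
c = np.array([2.0, 1.0])
x_bar = c / np.linalg.norm(c)

grad_f = 2 * (x_bar - c)   # gradient of f at x_bar
grad_g = 2 * x_bar         # gradient of g at x_bar

# The condition grad_f = -lambda * grad_g should hold for some lambda >= 0.
lam = -grad_f[0] / grad_g[0]                 # recover the multiplier
print("lambda   =", lam)                     # positive, so lambda >= 0 holds
print("residual =", grad_f + lam * grad_g)   # ~ [0, 0]
```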

For the unconstrained case, we can use the same problem template but with $g\equiv 0$, which gives $\{g\leq 0\} = \mathbb{R}^n$ and $\nabla g(x)=0$ for all $x\in\mathbb{R}^n$. Thus, in terms of the above optimality conditions, we are always in the case $g(\bar{x})=0$ and we have

$$\nabla f(\bar{x})=-\lambda^\star\nabla g(\bar{x})=0$$

Thus, at a minimizer $\bar{x}$, the gradient of $f$ must be $0$ in the unconstrained case.
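A quick sanity check of the unconstrained case with plain gradient descent (a sketch, assuming numpy; the quadratic objective is made up):

```python
import numpy as np

# Unconstrained case: minimize f(x) = ||x - a||^2 by plain gradient descent
# and confirm the gradient vanishes at the limit point.
a = np.array([1.0, -3.0])
x = np.zeros(2)
for _ in range(200):
    x -= 0.1 * 2 * (x - a)   # gradient step, fixed step size 0.1
print("x         =", x)            # -> a, the minimizer
print("grad f(x) =", 2 * (x - a))  # -> [0, 0]
```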

Depending on what you've seen in class/homeworks, you might also want to analyze the case with equality constraints:

$$\min\limits_{x:g(x)\leq 0, h(x)=0}f(x).$$
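For that template the equality constraint contributes a sign-unrestricted multiplier. A small sketch for the pure equality-constrained case (assuming numpy; the data $a$, $b$, $c$ are made up):

```python
import numpy as np

# Equality-constrained toy problem:
#   minimize f(x) = ||x - c||^2  subject to  h(x) = <a, x> - b = 0.
# The minimizer is the projection of c onto the hyperplane {x : <a, x> = b},
# and stationarity reads grad_f + mu * grad_h = 0 with mu of either sign.
a = np.array([1.0, 2.0])
b = 3.0
c = np.array([4.0, -1.0])

x_bar = c - (a @ c - b) / (a @ a) * a   # projection onto the hyperplane
grad_f = 2 * (x_bar - c)
grad_h = a

mu = -grad_f[0] / grad_h[0]                 # recover the multiplier
print("h(x_bar) =", a @ x_bar - b)          # feasibility, ~ 0
print("mu       =", mu)                     # may be negative here
print("residual =", grad_f + mu * grad_h)   # ~ [0, 0]
```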

ANOTHER ANSWER

Defining a smooth convex problem is straightforward; for example, $\min\{ x \mid x^2 \le 1 \}$.

If the constraint set is a closed convex set $C$ and $f$ is smooth and convex, then I like the following necessary and sufficient condition: $x^* \in C$ minimises $f$ on $C$ iff $\langle \nabla f(x^*), x-x^* \rangle \ge 0$ for all $x \in C$.

The latter is equivalent to $-\nabla f(x^*) \in N_C(x^*)$, the normal cone to $C$ at $x^*$.

Since $N_{\mathbb{R}^n} (x) = \{0\}$, the unconstrained case follows: with $C=\mathbb{R}^n$ the condition reduces to $\nabla f(x^*) = 0$.
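One can sanity-check the variational inequality numerically (a sketch, assuming numpy; the ball constraint and the point $c$ are made up):

```python
import numpy as np

# Check <grad f(x*), x - x*> >= 0 over C on the toy problem
# f(x) = ||x - c||^2 with C the closed unit ball.
rng = np.random.default_rng(0)
c = np.array([2.0, 1.0])
x_star = c / np.linalg.norm(c)   # projection of c onto the ball
grad = 2 * (x_star - c)

# Sample points of C: scale anything outside the ball back to the sphere.
xs = rng.normal(size=(1000, 2))
xs = xs / np.maximum(np.linalg.norm(xs, axis=1, keepdims=True), 1.0)

vals = (xs - x_star) @ grad
print("min inner product:", vals.min())   # >= 0, as the condition predicts
```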

If the constraint set is defined by convex functions, say $C= \{x \mid g_k(x) \le 0 \text{ for all } k \}$, then obtaining a 'nice' description of $N_C(x^*)$ requires more assumptions: there must be some 'freedom to move' (a constraint qualification) to obtain a KKT-like characterisation.

It is always true (with smooth $f$, $g_k$; this needs some work to show) that at a minimiser $x^*$ we can write $\mu_0 \nabla f(x^*) + \sum_{k \in I} \mu_k \nabla g_k(x^*) = 0$, where $I = \{ k \mid g_k(x^*) = 0 \}$ indexes the active constraints and the multipliers $\mu_0, \mu_k \ge 0$ are not all zero (the Fritz John conditions). Showing that $\mu_0 > 0$ requires additional assumptions. A simple assumption that often works in practice (meaning exam practice :-)) is that the active constraint gradients are linearly independent, in which case $-\nabla f(x^*)$ lies in the cone generated by the active gradients. A trivial illustration of how $\mu_0 > 0$ can fail is $\min \{x \mid x^2 \le 0 \}$: here $C = \{0\}$, $N_C(0) = \mathbb{R}$, and $\nabla g(0) = 0$, so the Fritz John condition can only hold with $\mu_0 = 0$.
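A tiny Python check of that degenerate example (the numbers are just read off the problem above):

```python
# Degenerate example: minimize f(x) = x subject to g(x) = x^2 <= 0.
# The feasible set is {0}, so x* = 0, but grad g(0) = 0, so no lambda >= 0
# gives grad f(0) + lambda * grad g(0) = 0: KKT stationarity fails.
x_star = 0.0
grad_f = 1.0            # f(x) = x
grad_g = 2 * x_star     # g(x) = x^2, so grad g(0) = 0

# Fritz John still holds, but only with mu0 = 0, e.g. (mu0, mu1) = (0, 1):
mu0, mu1 = 0.0, 1.0
print("FJ residual:", mu0 * grad_f + mu1 * grad_g)   # 0.0
print("KKT residual for any lambda:", grad_f)        # 1.0, never zero
```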