In the path-following / interior-point method, how is the gradient of the objective function a linear combination of the gradients of the constraints?


I understand the interior point method up until this point:

The intuition behind (5) is that the gradient of $f(x)$ should lie in the subspace spanned by the constraints' gradients.

(from https://en.wikipedia.org/wiki/Interior_point_method near the bottom of the page.)

I do not understand how this could make sense: How should the gradient of the objective function be a linear combination of the gradients of the inequality constraints?

If the gradient of the objective function were supposed to be a linear combination of the equality constraints themselves, then I could understand it (that is to say, a function not of the gradients of the constraints but of the actual constraints themselves, and not of inequality constraints but of equality constraints, because after all that is just a weaker form of the condition that the gradient be zero for a constrained optimization problem).

I can think of some simple examples that illustrate my intuition gap: Say we have an optimization problem like this:

$\min f(x)=(x-4)^2+1$ subject to $x\le 3$, which, translated to the standard form used in the notation of the Wikipedia article, becomes $f(x)=(x-4)^2+1$ with $c(x)=3-x\ge 0$.

Here $\nabla f(x) = 2x-8$ and $\nabla c(x)=-1$. How can one lie in the space spanned by the other?

I suspect it has something to do with this question: Why is the gradient of the objective function in the Lagrange multiplier theorem not $= 0$? But I'm unable to put my finger on exactly how...


The condition is a bit misleading and incomplete. The gradient of the function will indeed be a linear combination of the gradients of all constraints (equalities as well as inequalities); however, there are some additional conditions on this linear combination which might make things clearer. Assume that $f$ is the function we minimise and that the constraints are $c_i(x)\geq 0$. Then a necessary condition for the minimiser $x_0$ is that $\nabla f(x_0) = \sum_i \lambda_i \nabla c_i(x_0)$ for some $\lambda_i$ with $\lambda_i \geq 0$ and $\lambda_i = 0$ whenever $c_i(x_0)\neq 0$.

The last condition perhaps already resolves your confusion: in effect, the linear combination only involves those $\nabla c_i$ whose constraints are active, i.e. hold with equality at $x_0$.
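As a concrete check, this condition holds for the example in the question. Here is a minimal numerical sketch (plain Python; the minimiser $x_0 = 3$ is taken as known rather than computed):

```python
# Example from the question: minimise f(x) = (x - 4)^2 + 1 subject to x <= 3,
# written with the constraint c(x) = 3 - x >= 0. The minimiser is x0 = 3.

def grad_f(x):
    return 2 * x - 8      # derivative of (x - 4)^2 + 1

def grad_c(x):
    return -1.0           # derivative of 3 - x

x0 = 3.0
# The constraint is active (c(x0) = 0), so lambda may be nonzero.
# Solve grad_f(x0) = lam * grad_c(x0) for lam:
lam = grad_f(x0) / grad_c(x0)

print(lam)                # -> 2.0
assert lam >= 0           # the sign condition lambda >= 0 holds
```

So in this example $\nabla f(3) = -2$ really is a (nonnegative) multiple of $\nabla c(3) = -1$, with $\lambda = 2$.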

This condition perhaps becomes clearer when looking at the idea of the proof. You will probably remember from your calculus classes that a (sufficiently smooth) function $f$ can only have a (local) minimiser at $x_0$ if $\nabla f(x_0) = 0$. This is usually shown by contradiction: assume that $\nabla f(x_0) \neq 0$. Then we can move a tiny bit in direction $a := -\nabla f(x_0)$, and since $f$ locally looks like $$f(x) = f(x_0) + \nabla f(x_0) \cdot (x-x_0) + \text{higher order terms} ,$$ you will find that $f(x_0+h a) < f(x_0)$ for small enough $h >0$, which of course is a contradiction.
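This descent step is easy to verify numerically; a minimal sketch with a two-variable quadratic of my own choosing (not from the answer):

```python
# If grad f(x0) != 0, a small step along a = -grad f(x0) decreases f.
def f(x, y):
    return (x - 1) ** 2 + 2 * (y + 3) ** 2

def grad_f(x, y):
    return (2 * (x - 1), 4 * (y + 3))

x0, y0 = 0.0, 0.0
gx, gy = grad_f(x0, y0)          # (-2.0, 12.0), nonzero
h = 1e-3                         # a small step size
new = f(x0 - h * gx, y0 - h * gy)
assert new < f(x0, y0)           # f decreased along -grad f
```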

Now we add constraints. Let us look at a single constraint $c(x) \geq 0$; the extension to multiple constraints is not hard from there. Here also $$c(x) = c(x_0) + \nabla c(x_0) \cdot (x-x_0) + \text{higher order terms},$$ assuming that $c$ is sufficiently smooth. Since you seem to be interested in linear constraints, let us ignore the higher order terms: for linear constraints they do not occur, and in general they are a bit technical to deal with. So we assume $$c(x) = c(x_0) + \nabla c(x_0) \cdot (x-x_0).$$

Now what happens if we try the trick from before? If we move a tiny bit in a direction $a$, $c$ changes as well and we have $$c(x_0 + ha) = c(x_0) + h \nabla c(x_0)\cdot a.$$ If $c(x_0) > 0$, this is no problem: we just pick $h$ small enough and then we still have $c(x_0 + ha) \geq 0$. So we can choose $a = -\nabla f(x_0)$ as before and conclude that $\nabla f(x_0) = 0$.
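For instance, if the constraint in the question's example is relaxed to $x \le 5$ (my own modification, for illustration), the minimiser sits strictly inside the feasible region, the constraint is inactive, and the plain condition $\nabla f(x_0) = 0$ applies:

```python
# f(x) = (x - 4)^2 + 1 with the relaxed constraint c(x) = 5 - x >= 0.
# The unconstrained minimiser x0 = 4 is feasible with c(x0) = 1 > 0,
# so the constraint is inactive at the minimiser.

def grad_f(x):
    return 2 * x - 8      # derivative of (x - 4)^2 + 1

def c(x):
    return 5 - x

x0 = 4.0
assert c(x0) > 0          # constraint inactive at the minimiser
assert grad_f(x0) == 0    # so the plain condition grad f(x0) = 0 holds
lam = 0.0                 # complementarity: lambda = 0 when c(x0) != 0
```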

Now the other case is $c(x_0) = 0$. This is essentially the classical case of a Lagrange multiplier, with a small difference. We can only pick a direction $a$ for which $c(x_0+ha) \geq 0$. In other words $$0 \leq c(x_0+ha) = c(x_0) + h \nabla c(x_0) \cdot a.$$ If we note that $c(x_0) = 0$ and divide by $h$, we then get the restriction $$0 \leq \nabla c(x_0) \cdot a.$$ Thinking geometrically, we are only allowed to move in the half-space around $x_0$ for which the constraint is not violated.

So the question is: when can we still find an admissible direction $a$ such that $f$ decreases? By the same reasoning as before we want $$0< f(x_0)-f(x_0 + ha) = -h \nabla f(x_0) \cdot a + \text{higher order},$$ so $\nabla f(x_0) \cdot a < 0$. Now the rest of the proof is simple linear algebra. If we can pick a vector $a$ perpendicular to $\nabla c(x_0)$ with $\nabla f(x_0) \cdot a < 0$, we get a contradiction; and since for such an $a$ both $a$ and $-a$ are admissible, $\nabla c(x_0) \cdot a = 0$ implies $\nabla f(x_0) \cdot a = 0$. As a result $\nabla f(x_0) = \lambda \nabla c(x_0)$ for some $\lambda$. Now assume that this is the case, but with $\lambda < 0$. Then we could pick $a = -\nabla f(x_0)$ again, which decreases $f$, but for which $$\nabla c(x_0) \cdot a = -\lambda\, \nabla c(x_0) \cdot \nabla c(x_0) > 0,$$ so our constraint is not violated. So $\lambda \geq 0$ is the only possibility which does not lead to a contradiction.
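The role of the sign condition can also be seen numerically. Here is a sketch with an example of my own where the boundary point yields $\lambda < 0$, which correctly flags it as non-optimal: the feasible descent direction the proof constructs really exists there.

```python
# Minimise f(x) = (x - 2)^2 subject to x <= 3, i.e. c(x) = 3 - x >= 0.
# The true minimiser is the interior point x = 2; at the boundary point
# x0 = 3 the multiplier comes out negative, flagging x0 as non-optimal.

def f(x):
    return (x - 2) ** 2

def grad_f(x):
    return 2 * (x - 2)

grad_c = -1.0                  # derivative of 3 - x

x0 = 3.0
lam = grad_f(x0) / grad_c      # 2 / (-1) = -2 < 0
assert lam < 0

# The direction a = -grad_f(x0) is admissible (grad_c * a > 0) and decreases f:
a = -grad_f(x0)                # a = -2
assert grad_c * a > 0          # moving to smaller x stays feasible
h = 0.1
assert f(x0 + h * a) < f(x0)   # f decreases, so x0 = 3 is not the minimiser
```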