Help with a proof using concavity


I came across a proof that uses concavity, and I can't figure out why it works. Suppose $f(x)$ is a concave differentiable function on $x\geq 0$, and let $x^* = \text{argmax }f(x)$. Prove that $$\nabla f(x^*)\cdot (x' - x^*)\leq 0, \quad \forall x' \geq 0.$$ The original proof says:

"otherwise there is a feasible ascent direction, which contradicts the assumption that $x^*$ maximizes $f(x)$"

My reasoning so far: by concavity, $f(x')\leq f(x^*) + \nabla f(x^*) (x'-x^*)$, hence $f(x') - f(x^*) \leq \nabla f(x^*) (x'-x^*)$. Since $x^* = \text{argmax }f(x)$, the left-hand side is nonpositive, so all this tells me is that $\nabla f(x^*) (x'-x^*)$ is greater than or equal to some nonpositive number. How do we actually conclude that $\nabla f(x^*) (x'-x^*)$ is at most $0$? Is there an algebraic explanation of "feasible ascent direction"?


BEST ANSWER

These are called variational inequalities. Let $X$ be convex. Let $x^*$ be the maximizer of $f$ on $X$. Define $$ \phi(\varepsilon) = f( (1-\varepsilon)x^*+\varepsilon x'). $$ Then the necessary condition for maximization is $$ \phi'(0) = \nabla f(x^*)(x'-x^*)\le 0, \quad \forall x' \in X. $$
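As a numerical sanity check of the variational inequality (a sketch with a made-up concave quadratic, not taken from the answer): for $f(x) = -(x_0-1)^2 - (x_1+1)^2$ on the nonnegative orthant, the maximizer is $x^* = (1,0)$, and $\nabla f(x^*)(x'-x^*) \le 0$ for every feasible $x'$.

```python
import numpy as np

# Hypothetical concave example: f(x) = -(x0 - 1)^2 - (x1 + 1)^2 on x >= 0.
# Its maximizer over the nonnegative orthant is x* = (1, 0).
def f(x):
    return -(x[0] - 1)**2 - (x[1] + 1)**2

def grad_f(x):
    return np.array([-2.0 * (x[0] - 1), -2.0 * (x[1] + 1)])

x_star = np.array([1.0, 0.0])

# Sample feasible points x' >= 0 and check the variational inequality.
rng = np.random.default_rng(0)
for _ in range(1000):
    x_prime = rng.uniform(0.0, 5.0, size=2)
    assert grad_f(x_star) @ (x_prime - x_star) <= 1e-12

print("variational inequality holds at x* for all sampled x'")
```

Here $\nabla f(x^*) = (0, -2)$, so the inner product reduces to $-2x_1' \le 0$ for any $x_1' \ge 0$, matching the $\phi'(0) \le 0$ condition above.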

If $x^*$ is in the interior of $X$, then $\nabla f(x^*)=0$. For if $\partial f(x^*)/\partial x_i > 0$ for some $i$, you could take $x_i' = x_i^*+\varepsilon$ for $\varepsilon$ sufficiently small and all other $x_k' = x_k^*$, and strictly improve the value of the objective; similarly, if the partial were strictly negative, take $x_i' = x_i^*-\varepsilon$, which is feasible precisely because $x^*$ is interior.

On the boundary of $X$, however, this argument does not work: the gradient "points out of" $X$ and need not vanish. For example, try maximizing $\log(x)$ on $[1,e]$: the derivative $1/x$ never vanishes, and the solution is at $x^*=e$, where $\log(e)=1$ and the derivative is $1/e>0$. But the variational inequality is satisfied, because $(1/e)(x'-e)<0$ for all $x'\neq e$, since $x'<e$ whenever $x' \in [1,e)$. At such a point, $\nabla f(x^*)$ is actually the normal of a supporting hyperplane to $X$: this is essentially the Lagrange/Kuhn-Tucker multiplier. If $X$ were characterized by a constraint $h(x)\le 0$, you'd have the standard $$ \nabla \mathcal{L}(x^*,\lambda^*) = \nabla f(x^*) - \lambda^* \nabla h(x^*) = 0 $$ together with the complementary slackness condition $\lambda^* h(x^*) = 0$. Without the Lagrangian formulation, you've hit the boundary but only get the $\le 0$, because the $\lambda^* \nabla h(x^*)$ term is missing from your FONC.
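The boundary example can be checked directly (a small sketch of the $\log(x)$ on $[1,e]$ case from the answer): the derivative $f'(e)=1/e$ is strictly positive, yet $f'(e)(x'-e)\le 0$ over the whole feasible interval.

```python
import math

# Maximize log(x) on [1, e]: the derivative 1/x never vanishes, yet
# x* = e satisfies the variational inequality f'(e) * (x' - e) <= 0.
x_star = math.e
grad = 1.0 / x_star                     # f'(e) = 1/e > 0

n = 100
for i in range(n + 1):
    x_prime = 1.0 + (math.e - 1.0) * i / n   # sweep the interval [1, e]
    assert grad * (x_prime - x_star) <= 1e-12
    assert math.log(x_prime) <= math.log(x_star) + 1e-12  # e is the max

print("f'(e)(x' - e) <= 0 for all x' in [1, e]")
```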

ANSWER

Given any $\hat{x}$ in the feasible set $D$, we call a vector $d \in \mathbb{R}^n$ a feasible direction at $\hat{x}$ if $\hat{x} + \alpha d \in D$ for all sufficiently small $\alpha > 0$.

Suppose there exists some $x' \ge 0$ such that \begin{equation} \nabla f(x^{*})^T \left(x^{\prime}-x^{*}\right) > 0. \end{equation} Let $d = x'-x^*$; then $d$ is a feasible ascent direction of $f$ at $x^*$ (feasible because $x^* + \alpha d = (1-\alpha)x^* + \alpha x' \ge 0$ for $\alpha \in (0,1]$). It is an ascent direction because \begin{equation} f(x^* + \alpha d) = f(x^*) + \alpha \nabla f(x^*)^T d + o(\alpha) > f(x^*) \end{equation} for all sufficiently small $\alpha > 0$, which contradicts the optimality of $x^*$.
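To see the ascent mechanism concretely (a sketch with a hypothetical concave quadratic, not from the answer): at a non-optimal feasible point $\hat{x}$, a direction with positive inner product against the gradient does strictly improve $f$ for a small step.

```python
import numpy as np

# Hypothetical concave f(x) = -(x0 - 1)^2 - (x1 - 2)^2.  At the
# non-optimal feasible point x_hat = (0, 0), the direction toward
# x' = (1, 2) >= 0 has a positive directional derivative, so it is
# a feasible ascent direction.
def f(x):
    return -(x[0] - 1)**2 - (x[1] - 2)**2

def grad_f(x):
    return np.array([-2.0 * (x[0] - 1), -2.0 * (x[1] - 2)])

x_hat = np.array([0.0, 0.0])
d = np.array([1.0, 2.0]) - x_hat        # d = x' - x_hat

assert grad_f(x_hat) @ d > 0            # positive inner product

alpha = 1e-3                            # sufficiently small step
assert f(x_hat + alpha * d) > f(x_hat)  # strict improvement along d

print("d is a feasible ascent direction at x_hat")
```

This is exactly the contradiction the original proof invokes: at a true maximizer $x^*$, no such $d$ can exist, forcing $\nabla f(x^*)^T(x'-x^*) \le 0$.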

For more details, one can refer to this lecture and Bertsekas's book "Convex Analysis and Optimization" (page 255).