Relation between gradient norm and distance to optimizer


For context, this problem arose when I was trying to prove convergence rate bounds for convex optimization algorithms.

Let $f: \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^n$ is convex. Moreover, let $f$ be $\mu$-strongly convex and $G$-Lipschitz continuous, with $L$-Lipschitz continuous gradients (a.k.a. $L$-smooth). We may also assume that $\mathcal{X}$ has bounded diameter, i.e., $\|x - y\|_2 \leq R$ for all $x, y \in \mathcal{X}$.
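Concretely, by these assumptions I mean the standard inequalities, for all $x, y \in \mathcal{X}$:

$$ f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2}\|y - x\|^2_2, \qquad |f(x) - f(y)| \leq G\|x - y\|_2, \qquad \|\nabla f(x) - \nabla f(y)\|_2 \leq L\|x - y\|_2. $$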

Define $x^* := \arg\min_{x\in\mathcal{X}} f(x)$. My goal is to prove the following statement:

$$ \frac{1}{2L}\|\nabla f(x)\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2 \leq 0, \quad \forall x \in \mathcal{X}. \tag{1} $$
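Since $L, \mu > 0$, rearranging shows that $(1)$ is equivalent to bounding the gradient norm by the distance to the minimizer:

$$ \|\nabla f(x)\|_2 \leq \sqrt{L\mu}\,\|x - x^*\|_2, \quad \forall x \in \mathcal{X}. $$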

So far, I have been able to show that it holds in the special case where $L = \mu > 0$ and $x^*$ belongs to the relative interior of $\mathcal{X}$ (so that $\nabla f(x^*) = 0$). The argument is as follows:

\begin{align*}
\frac{1}{2L}\|\nabla f(x)\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2
&\leq \frac{1}{2L}\|\nabla f(x) - \nabla f(x^*)\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2 \\
&\leq \frac{L}{2}\|x - x^*\|^2_2 - \frac{\mu}{2}\|x - x^*\|^2_2 \\
&= 0.
\end{align*}

Here the first step uses $\nabla f(x^*) = 0$, the second uses the $L$-Lipschitz continuity of $\nabla f$, and the final cancellation is exactly where $L = \mu$ is needed.

QUESTION: Is it possible to prove that $(1)$ holds for general $L, \mu$? If not, are there other special cases or assumptions under which the statement holds?
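For concreteness, one can at least probe $(1)$ numerically. Below is a minimal sketch (the quadratic test family, parameter values, and all names are illustrative choices of mine, not part of the question): it samples points in a ball for a diagonal quadratic whose Hessian eigenvalues span $[\mu, L]$ and whose minimizer $x^* = 0$ is interior, then reports the largest observed value of the left-hand side of $(1)$; any strictly positive value would be a counterexample.

```python
import numpy as np

# Probe inequality (1) for the diagonal quadratic
#   f(x) = (1/2) * sum_i a_i * x_i^2   on the ball {x : ||x||_2 <= R},
# whose Hessian eigenvalues a_i span [mu, L], so f is mu-strongly convex
# and L-smooth, and whose minimizer x* = 0 is interior (grad f(x*) = 0).
# The test family and all parameter values are illustrative choices.
rng = np.random.default_rng(0)
mu, L, R, n = 1.0, 10.0, 1.0, 5
a = np.linspace(mu, L, n)  # eigenvalues, attaining mu and L exactly

def gap(x):
    """Left-hand side of (1): (1/(2L))*||grad f(x)||^2 - (mu/2)*||x - x*||^2."""
    grad = a * x  # gradient of f at x; here x* = 0
    return grad @ grad / (2 * L) - 0.5 * mu * (x @ x)

worst = -np.inf
for _ in range(10_000):
    x = rng.normal(size=n)
    x *= R * rng.uniform() / np.linalg.norm(x)  # random point inside the ball
    worst = max(worst, gap(x))

print(f"largest observed gap: {worst:.4g}")  # any value > 0 falsifies (1)
```

I chose a quadratic because it attains the strong convexity and smoothness constants exactly, so it seems like the natural extremal family to test; varying the spectrum and $R$ should also make it easy to explore other special cases.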