On modifying the gradient in gradient descent when the objective function is not convex and does not have a Lipschitz gradient.


It is well known that if one wants to perform gradient descent on a function $f(x)$, the ideal conditions are strong convexity and a Lipschitz continuous gradient, i.e., $$ \ell I \preceq \nabla^2 f(x) \preceq L I $$ for some $\ell, L \in \mathbb{R}$ with $0 < \ell \le L$.

Assume instead that we have a function $g(x)$ that does not have a Lipschitz continuous gradient and is not strongly convex, but we know its gradient $\nabla g(x)$ and we know that it has a (unique) minimum at $x^*$.

Could we perform gradient descent successfully using $h(x) \nabla g(x)$ in place of the gradient, where $h(x)$ is a function that shares the support of $g(x)$ and is such that $h(x) \nabla g(x) = 0$ only at $x = x^*$ and

$$ \ell I \preceq \nabla \left( h(x) \nabla g(x) \right) \preceq L I, \quad 0 < \ell \le L, $$

instead of just $\nabla g(x)$?

That is, using as an iterative optimization scheme

$$ x_{k+1} = x_k - \alpha h(x_k) \nabla g(x_k) $$

where $\alpha < 2 / L$, instead of

$$ x_{k+1} = x_k - \alpha \nabla g(x_k). $$
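To make the idea concrete, here is a minimal one-dimensional sketch. The choices are hypothetical, not from the question: take $g(x) = x^4$, which has the unique minimizer $x^* = 0$ but is neither strongly convex there nor has a globally Lipschitz gradient, and the scaling $h(x) = 1/(12x^2)$, so that $h(x)\nabla g(x) = x/3$, the gradient of the strongly convex quadratic $x^2/6$ with $\ell = L = 1/3$:

```python
def grad_g(x):
    # g(x) = x**4: its gradient 4x^3 is not globally Lipschitz,
    # and g is not strongly convex at its minimizer x* = 0.
    return 4.0 * x * x * x

def modified_grad(x):
    # Hypothetical h(x) = 1/(12 x^2) gives h(x) * grad_g(x) = x / 3,
    # the gradient of the strongly convex quadratic x**2 / 6
    # (ell = L = 1/3).  We use the simplified product directly,
    # which also extends it by 0 at x = 0.
    return x / 3.0

def descend(grad, x0, alpha, steps):
    x = x0
    for _ in range(steps):
        x = x - alpha * grad(x)  # gradient *descent* step
    return x

x0, alpha = 5.0, 0.9                           # alpha < 2/L = 6 for the modified field
x_plain = descend(grad_g, x0, alpha, 4)        # |x| explodes: step too large for x**4
x_mod = descend(modified_grad, x0, alpha, 50)  # contracts by factor 0.7 per step
```

With the modified field the iteration is $x_{k+1} = (1 - \alpha/3)\,x_k = 0.7\,x_k$, which converges linearly from any starting point, while the same fixed step applied to $\nabla g$ itself blows up away from the origin.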

1 Answer

I do not believe we can find such a function $h$ that makes $h(x)\nabla g(x)$ the gradient of a strongly convex function. Here's my argument.

We can find optimization problems that are not practically solvable. For example, minimizing a general $10$th-order polynomial in $10$ variables is intractable: we may know that it has a (unique) global minimizer $x^\ast$, but we do not know how to find it in a reasonable amount of time. If we could find such an $h(x)$, the original problem $g(x)$ would become solvable by gradient descent, which is a contradiction!
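The objection is visible even in a toy case where such an $h$ does exist. A sketch under the same hypothetical assumptions as above ($g(x) = x^4$, $h(x) = 1/(12x^2)$, neither from the original post): writing $h$ down already required knowing the minimizer $x^* = 0$, which is exactly the information the scheme was supposed to discover. The numerical check below confirms that the modified field has constant derivative $1/3$, i.e. it is the gradient of the strongly convex quadratic $x^2/6$:

```python
def grad_g(x):
    return 4.0 * x ** 3        # gradient of g(x) = x**4

def h(x):
    # Hypothetical scaling; note it is built around x* = 0,
    # i.e. we needed the answer to construct it.
    return 1.0 / (12.0 * x ** 2)

def field(x):
    return h(x) * grad_g(x)    # simplifies to x / 3

# Central-difference derivative of the modified field: it is the
# constant 1/3 everywhere away from 0, so the field is the gradient
# of the strongly convex quadratic x**2 / 6 with ell = L = 1/3.
eps = 1e-6
for x in (-2.0, 0.5, 3.0):
    d = (field(x + eps) - field(x - eps)) / (2.0 * eps)
    assert abs(d - 1.0 / 3.0) < 1e-6
```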