Encoding line search condition into the proximal gradient subproblem


Given a convex function $f$, we often want to solve a problem of the form $$\inf_{x \in A} f(x),$$ for some convex feasible set $A$.

Starting from a point $x \in A$, we can approximate the gradient $\nabla f(x)$ by solving a subproblem that finds $x_\star \in A$ such that $x_\star - x$ approximates $-\nabla f(x)$ well enough. We can then carry out the usual gradient descent method with $x_\star - x$ as the descent direction. However, such a subproblem is usually quite expensive to solve, and to find the correct step size we would have to solve it multiple times to approximate $t_i \nabla f(x)$. So I wonder whether it is possible to encode the line search condition into the subproblem. For example, if the subproblem I want to solve is $$\inf_{x_\star \in A} g(\nabla f(x), x - x_\star),$$

can I instead solve something like $$\inf_{x_\star \in A, \; t \in (0,1)} g(t \nabla f(x), x - x_\star)$$ $$\text{s.t.}\quad f(x_\star) \leq f(x) + t\,(x_\star - x)^T \nabla f(x)?$$

I know there is a trivial solution at $t = 0$ and $x_\star = x$, but I wonder if there is any way to modify the problem so that it works (such as adding $-\log(t)$ to the objective). Any thoughts would be much appreciated.

1 Answer

I am not aware of any classical optimization method where "we can approximate the gradient $\nabla f(x)$ by solving some subproblem that finds $x_\star \in A$ such that $x_\star - x$ approximates $-\nabla f(x)$ well enough". Furthermore, if you start from a point $x \in A$, then unless (i) $x$ is on the boundary of $A$, and (ii) $-\nabla f(x)$ points "outside" the tangent cone of $A$ at that point, there always exists a stepsize $\eta > 0$ such that the gradient descent step $x^+ = x - \eta \nabla f(x)$ yields $x^+ \in A$.
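To make the point above concrete, here is a minimal numeric sketch: from a point strictly inside the feasible set, a small enough plain gradient step stays feasible, with no subproblem needed. The choices $f(x) = \|x - c\|^2$ with $c = (2, 0)$ and $A$ the closed unit ball are my own illustrative assumptions, not from the post.

```python
import numpy as np

# Illustrative assumptions: f(x) = ||x - c||^2, A = closed unit ball.
c = np.array([2.0, 0.0])
grad = lambda x: 2.0 * (x - c)        # gradient of f(x) = ||x - c||^2

x = np.array([0.3, 0.4])              # strictly inside A: ||x|| = 0.5 < 1
eta = 0.1                             # small stepsize
x_plus = x - eta * grad(x)            # plain gradient step, no projection

print(np.linalg.norm(x) < 1.0)        # True: x is interior
print(np.linalg.norm(x_plus) <= 1.0)  # True: the step stays in A
```

Only when the iterate sits on the boundary and the negative gradient points outward does feasibility become an issue, which is exactly where projection comes in.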

The standard algorithm for solving $$\inf_{x \in A} f(x)$$ is Projected Gradient Descent, nicely explained in Ryan Tibshirani's CMU lecture notes. If $A$ is a set that is "sufficiently simple to project onto", the algorithm can be quite efficient; otherwise you may need an extra routine that computes the projection of $x^+$ onto $A$.

Finally, I could not understand what the function $g(\cdot)$ in your problem description is. Maybe it is some measure of distance, e.g. the Euclidean norm, that you wanted to minimize to find the $x_\star$ you were after?