What's the advantage of using mirror descent, rather than just projected gradient descent?


Here's what I understand are the main steps of mirror descent:

  • Map the current iterate $x_k$ to the dual space, using the gradient of the mirror function $\Phi$: $y_k = \nabla \Phi (x_k)$
  • Take a step in the dual space, with the gradient of our objective: $y_{k+1} = y_k - \eta \nabla f(x_k)$ (the gradient is actually a dual space vector)
  • Map this new dual space point back to the primal space by applying the inverse of the map from before: $(\nabla \Phi)^{-1} (y_{k+1})$
  • Project this point onto the feasible set $C$ determined by our constraints: $x_{k+1} = \Pi((\nabla \Phi)^{-1} (y_{k+1}))$
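To make the steps above concrete, here is a minimal sketch for the classic special case where $C$ is the probability simplex and $\Phi$ is the negative entropy $\Phi(x) = \sum_i x_i \log x_i$. With that mirror map, the dual step plus inverse map collapses to a multiplicative update, and the projection onto the simplex is just a normalization. The names `grad_f`, `x0`, `eta`, and `steps` are illustrative, not from the question:

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, eta=0.1, steps=100):
    """Mirror descent on the probability simplex with the negative-entropy
    mirror map Phi(x) = sum_i x_i log x_i (a.k.a. exponentiated gradient).

    Here grad Phi(x) = 1 + log x, so the dual step
        y_{k+1} = grad Phi(x_k) - eta * grad f(x_k)
    followed by (grad Phi)^{-1} becomes a multiplicative update,
    and the projection onto the simplex reduces to normalization.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-eta * grad_f(x))  # dual step + inverse map
        x = x / x.sum()                   # projection onto the simplex
    return x
```

For example, minimizing a linear objective $f(x) = c^\top x$ over the simplex drives the iterate toward the vertex with the smallest $c_i$.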

I've seen mirror descent presented as an alternative to the Projected Gradient Descent algorithm. That one seems to be a lot simpler:

  • Take a gradient step in the primal, as we'd usually do in gradient descent: $x_{k+1}^\ast = x_k - \eta \nabla f(x_k)$
  • Since this might not be in the feasible set $C$, project it back into that set: $x_{k+1} = \Pi(x_{k+1}^\ast)$
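These two steps can be sketched in a few lines; the names `grad_f`, `project`, `x0`, `eta`, and `steps` are illustrative choices, not from the question. As a simple feasible set, take the unit Euclidean ball, where the projection is just a rescaling:

```python
import numpy as np

def projected_gradient_descent(grad_f, project, x0, eta=0.1, steps=200):
    """Projected gradient descent: plain gradient step, then project onto C."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_f(x)  # gradient step in the primal
        x = project(x)           # project back onto the feasible set C
    return x

def project_ball(x):
    """Euclidean projection onto the unit ball {x : ||x|| <= 1}."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n
```

For instance, minimizing $f(x) = \tfrac{1}{2}\|x - b\|^2$ with $b$ outside the ball converges to the boundary point closest to $b$.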

I assumed that the projection $\Pi$ might be the hard part of this, since if the constraint surface was complex in general, it might not be easy to project. So I'm confused, because it seems like even mirror descent has to use a projection, but it also has more steps, like mapping to the dual space and back.

I've read elsewhere that mirror descent can achieve much faster convergence rates because the form of the function $\Phi$ can be chosen to match the problem structure. But can this not also be done through the choice of the projection function? Or is there some other advantage?


You are correct that mirror descent can achieve much faster convergence rates (in certain cases). It also provides more flexible regularization options. Mirror descent can handle non-smooth objectives, non-Euclidean norm constraints, and/or non-convex objectives, whereas projected gradient descent is only applicable to smooth objectives and Euclidean norm constraints.
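One way to see the flexibility point concretely: with the Euclidean geometry of projected gradient descent, even a simple set like the probability simplex requires a nontrivial projection (a standard sort-based algorithm, sketched below as an illustration), whereas choosing an entropy mirror map turns the same constraint into a closed-form normalization. The function name `project_simplex` is my own choice:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-based method.
    Runs in O(n log n) due to the sort."""
    u = np.sort(v)[::-1]                  # sort entries in decreasing order
    css = np.cumsum(u)                    # running sums of the sorted entries
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > (css - 1))[0][-1]  # largest valid index
    theta = (css[rho] - 1) / (rho + 1)    # shift that enforces sum(x) = 1
    return np.maximum(v - theta, 0.0)     # clip negatives to zero
```

Contrast this with the entropy setup, where the "projection" is simply dividing by the sum of the components.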