How do mirror prox and accelerated methods differ conceptually?

148 Views Asked by At

In the very basic case of nice function on Hilbert space, dual space is same as primal, it seems to me that they to almost same thing: at point $x_t$ they apply a gradient obtained by looking "one step ahead" $x_{t+1} = x_t + \eta \nabla f(x_t + \eta \nabla f(x_t))$. Mirror prox adds an additional restriction on closeness of both "look ahead" and "gradient step" points in terms of some divergence:

$$ \hat z_t = \arg\min_z (D(z, z_t) + \langle \eta \nabla f(z_t), z \rangle)$$ $$ z_{t+1} = \arg\min_z (D(z, z_t) + \langle \eta \nabla f(\hat z_t), z \rangle)$$

Does the divergence-closeness part is the only difference? Should they perform similarly in saddle point problems without constraints?

Thank you.