It is well known that, for sufficiently smooth $f$ and small $\eta>0$, the gradient descent iterate $$ x_{k+1}=x_k-\eta \nabla f(x_k)$$ satisfies $\Vert x_{k+1}-x(\eta)\Vert=\mathcal O(\eta^2)$, where $x(\cdot)$ solves the gradient flow $$\dot x(t)=-\nabla f(x(t)), \hspace{1cm}x(0)=x_k,$$ which is a consequence of Taylor's Theorem.
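As a numerical sanity check of this $\mathcal O(\eta^2)$ claim, here is a small sketch with the illustrative choice $f(x)=x^4/4$ in one dimension, for which the gradient flow $\dot x=-x^3$ has the closed-form solution $x(t)=x_0/\sqrt{1+2tx_0^2}$ (all names are my own, not from the post):

```python
import math

def f_grad(x):
    # f(x) = x**4 / 4, so f'(x) = x**3 (illustrative choice)
    return x ** 3

def flow(x0, t):
    # closed-form solution of x'(t) = -x(t)**3 with x(0) = x0
    return x0 / math.sqrt(1.0 + 2.0 * t * x0 ** 2)

x0 = 1.0
errors = []
for eta in [0.1, 0.05, 0.025]:
    x_gd = x0 - eta * f_grad(x0)              # one gradient descent step
    errors.append(abs(x_gd - flow(x0, eta)))  # distance to the flow at time eta

# halving eta should roughly quarter the error if the error is O(eta^2)
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(ratios)
```

The printed ratios approach $4$ as $\eta\to 0$, consistent with a one-step error of order $\eta^2$.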
Now Mirror Descent is a generalization of Gradient Descent: one fixes a strongly convex potential $\psi:\mathbb R^d\rightarrow \mathbb R$ and performs the update $$x_{k+1}=(\nabla\psi)^{-1}\bigg(\nabla\psi(x_k)-\eta \nabla f(x_k)\bigg).\hspace{2cm}(MD)$$ One may similarly derive continuous-time dynamics by noting that $$\frac{\nabla \psi(x_{k+1})-\nabla\psi(x_k)}{\eta}=-\nabla f(x_k),$$ so the most natural candidate for an ODE describing Mirror Descent is $$\frac{d}{dt}\nabla\psi(x(t))=-\nabla f(x(t)),$$ which in turn yields $$\dot x(t)=-\nabla^2\psi(x(t))^{-1}\cdot \nabla f(x(t)). \hspace{2cm}(CMD)$$
Question: Can we derive a similar error estimate between the Mirror Descent updates (MD) and the continuous-time Mirror Descent (CMD)? (I am particularly interested in the case $\psi(x)=p^{-1}\Vert x\Vert_p^p$.)
(You may assume $\psi$ to be as smooth as you need it to be)
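For concreteness, here is a sketch of the update (MD) for the potential $\psi(x)=p^{-1}\Vert x\Vert_p^p$, whose mirror map acts coordinatewise as $\nabla\psi(x)_i=\operatorname{sign}(x_i)|x_i|^{p-1}$ with inverse $\operatorname{sign}(y_i)|y_i|^{1/(p-1)}$. (The quadratic objective below is only a placeholder; also note that for $p>2$ this $\psi$ fails to be strongly convex near the origin, so this is purely illustrative.)

```python
import numpy as np

def mirror_map(x, p):
    # grad psi for psi(x) = ||x||_p^p / p, acting coordinatewise
    return np.sign(x) * np.abs(x) ** (p - 1)

def inverse_mirror_map(y, p):
    # (grad psi)^{-1}, again coordinatewise
    return np.sign(y) * np.abs(y) ** (1.0 / (p - 1))

def md_step(x, grad, eta, p):
    # one Mirror Descent update (MD)
    return inverse_mirror_map(mirror_map(x, p) - eta * grad, p)

p, eta = 3.0, 0.1
x = np.array([1.0, -0.5, 2.0])
# round trip through the mirror map recovers x
assert np.allclose(inverse_mirror_map(mirror_map(x, p), p), x)
# placeholder objective f(x) = 0.5 * ||x||^2, so grad f(x) = x
x_next = md_step(x, x, eta, p)
# p = 2 recovers plain gradient descent
assert np.allclose(md_step(x, x, eta, 2.0), x - eta * x)
```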
The naive approach of performing a Taylor Approximation on (CMD) does not seem to yield the desired bound.
I have thought more about this problem and have made partial progress towards an answer. Here are some of my thoughts so far: (Feel free to expand on them in another answer, to which I can then award the bounty)
The iterates of Mirror Descent can actually be seen as the solution of another ODE. This result was shown in *Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent* by Gunasekar et al.:
Consider the ODE $$\dot x(t)=-\nabla^2\psi(x(t))^{-1}\cdot\nabla f(x(\lfloor t\rfloor_\eta)), \hspace{1cm} x(0)=x_0, \hspace{1cm}(MD-c_2)$$ where $\lfloor t\rfloor_\eta:=\eta\lfloor t/\eta\rfloor$. Then the solution of the ODE given above is an interpolation of the Mirror Descent iterates.
To see this, note that we can interpolate the MD iteration via $$\forall k,\forall t\in[k\eta, (k+1)\eta):\nabla \psi(x(t))=\nabla\psi(x(k\eta))-(t-k\eta)\nabla f(x(k\eta));$$ letting $t\rightarrow(k+1)\eta$ recovers the update (MD), so by induction $x(k\eta)=x_k$. Differentiating both sides w.r.t. $t$ we get $$\nabla^2\psi(x(t))\dot x(t)=-\nabla f(x(\lfloor t\rfloor_\eta)),$$ which rearranges to the ODE (MD-c$_2$).
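This identification can be checked numerically: integrating (MD-c$_2$) with a fine Euler scheme reproduces the MD iterates at the grid points $t=k\eta$. The sketch below uses the toy choices $\psi(x)=x^3/3$ on $x>0$ (so $\nabla\psi(x)=x^2$, $\nabla^2\psi(x)=2x$) and $f(x)=x^2/2$; all of these are my own illustrative assumptions.

```python
import numpy as np

# Toy potential psi(x) = x**3/3 on x > 0: grad psi = x**2, hess psi = 2x.
# Toy objective f(x) = 0.5 * x**2: grad f = x.
p_grad = lambda x: x ** 2
p_grad_inv = lambda y: np.sqrt(y)
hess = lambda x: 2.0 * x

eta, K, x0 = 0.1, 5, 1.0

# Mirror Descent iterates via the update (MD)
md = [x0]
for _ in range(K):
    md.append(p_grad_inv(p_grad(md[-1]) - eta * md[-1]))

# Integrate (MD-c2): x'(t) = -hess(x(t))^{-1} * grad f(x(floor_eta(t)))
substeps = 2000
x, ode = x0, [x0]
for k in range(K):
    frozen = ode[-1]                  # grad f frozen at the last grid point
    h = eta / substeps
    for _ in range(substeps):
        x = x - h * frozen / hess(x)  # explicit Euler on the frozen-gradient ODE
    ode.append(x)

gap = max(abs(a - b) for a, b in zip(md, ode))
print(gap)  # tiny: the ODE solution passes through the MD iterates
```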
We can now establish error bounds between the iterates of Mirror Descent and the ODE $\dot x(t)=-\nabla^2\psi(x(t))^{-1}\nabla f(x(t))$ (MD-c$_1$) in two steps:

Step 1: Identify the MD iterates with the solution of (MD-c$_2$), as above.

Step 2: Bound the distance between the solutions of (MD-c$_2$) and (MD-c$_1$), which differ only in where the gradient of $f$ is evaluated.
I think the advantage of this reformulation is that Step 2 might be a bit easier, since one is now comparing two ODEs with the same vector field up to the discretized time argument of $\nabla f$.
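As a sanity check for Step 2 (not a proof of any rate), one can compare the two ODEs at a fixed horizon $T$ as $\eta$ shrinks. With the same toy choices as before, $\psi(x)=x^3/3$ on $x>0$ and $f(x)=x^2/2$ (my own assumptions), both ODEs admit closed forms: (MD-c$_1$) becomes $\dot x=-x/(2x)=-1/2$, i.e. $x(t)=x_0-t/2$, while on each interval (MD-c$_2$) satisfies $x((k+1)\eta)^2=x(k\eta)^2-\eta\, x(k\eta)$, which is exactly the MD recursion in the dual variable.

```python
import math

def md_c2_at(T, eta, x0=1.0):
    # exact per-interval solution of (MD-c2) for psi(x)=x**3/3, f(x)=x**2/2:
    # the dual variable x**2 decreases linearly on each interval
    x, steps = x0, round(T / eta)
    for _ in range(steps):
        x = math.sqrt(x * x - eta * x)
    return x

T, x0 = 0.5, 1.0
x_c1 = x0 - T / 2.0  # exact (MD-c1) solution at time T
gaps = [abs(md_c2_at(T, eta) - x_c1) for eta in (0.1, 0.05, 0.025)]
print(gaps)  # the gap shrinks as eta -> 0
```

Empirically the gap decreases as $\eta\to 0$; what rate holds in general (and for the $p$-norm potential in particular) is exactly the open question above.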