Gradient in mirror descent


Mirror descent can be in general written as

\begin{equation*} \nabla\Phi(x_{t+1})=\nabla\Phi(x_t)-\lambda_t\nabla f(x_t), \end{equation*}

where $f$ is the objective function and $\Phi$ is a convex function whose gradient $\nabla\Phi$ maps the primal parameter space to the dual space. What I don't understand is that although $x$ has been transformed to $\nabla\Phi(x)$ for gradient descent, the gradient $\nabla f(x)$ itself is not transformed at all, which seems to make it incompatible with the dual space. Shouldn't the MD update instead be written as

\begin{equation*} \nabla\Phi(x_{t+1})=\nabla\Phi(x_t)-\lambda_t J(x_t)^{-T}\nabla f(x_t), \end{equation*}

where $J$ is the Jacobian matrix of the map $x \mapsto \nabla\Phi(x)$, which would transform the gradient accordingly?


A short answer. Let $x\in E$ and $f\colon D \subseteq E \to \mathbb{R}$. Then, in order to run gradient descent in a dual space, we choose some convex $\Phi \colon E \to \mathbb{R}$. Both $\nabla f(x)$ and $\nabla \Phi(x)$ belong to the dual space $E^*$, so the update above is well defined and no extra transformation is needed.
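To make this concrete, here is a small illustrative sketch (the mirror map and numbers are my own choices, not from the question): with the negative-entropy mirror map $\Phi(x)=\sum_i x_i\log x_i$ on the positive orthant, both $\nabla\Phi(x) = 1 + \log x$ and $\nabla f(x)$ are plain vectors in $\mathbb{R}^n$, so the subtraction in the dual space is well defined with no Jacobian factor.

```python
import numpy as np

def grad_phi(x):
    # mirror map: gradient of negative entropy, a vector in R^n
    return 1.0 + np.log(x)

def grad_phi_inv(theta):
    # inverse mirror map: back to the primal space
    return np.exp(theta - 1.0)

def md_step(x, grad_f_x, lam):
    theta = grad_phi(x) - lam * grad_f_x  # update entirely in the dual space
    return grad_phi_inv(theta)            # map the result back to primal

x = np.array([0.5, 1.5])
g = np.array([1.0, -2.0])                 # stand-in for grad f(x)
x_new = md_step(x, g, lam=0.1)
# for this mirror map the step is multiplicative: x_new = x * exp(-lam * g)
print(x_new)
```

For this particular $\Phi$ the update reduces to the familiar multiplicative form $x_{t+1} = x_t \odot e^{-\lambda_t \nabla f(x_t)}$.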

A good interpretation of mirror descent is due to A. Beck and M. Teboulle in their paper "Mirror descent and nonlinear projected subgradient methods for convex optimization". They showed that mirror descent is in fact just a variant of the gradient descent method in which the squared norm is replaced by the Bregman distance $$d(x,y) = \Phi(x) - \Phi(y) - \langle\nabla \Phi(y), x-y\rangle.$$

The usual gradient method can be written as $$x_{t+1} = \arg \min_x \left(\langle \nabla f(x_t), x\rangle + \frac{1}{2\lambda_t} \|x-x_t\|^2 \right),$$ whose minimizer is exactly the familiar step $x_{t+1} = x_t - \lambda_t \nabla f(x_t)$.
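A quick numerical sanity check of this proximal viewpoint (an illustrative sketch with random data, not part of the answer): the closed-form minimizer $x_t - \lambda_t \nabla f(x_t)$ should beat any nearby perturbation on the linearized-plus-quadratic objective.

```python
import numpy as np

rng = np.random.default_rng(0)
x_t = rng.normal(size=3)
g = rng.normal(size=3)        # stand-in for grad f(x_t)
lam = 0.3

def obj(x):
    # <grad f(x_t), x> + (1 / (2*lam)) * ||x - x_t||^2
    return g @ x + (x - x_t) @ (x - x_t) / (2 * lam)

x_star = x_t - lam * g        # closed-form minimizer of obj
assert all(obj(x_star) <= obj(x_star + rng.normal(scale=0.1, size=3))
           for _ in range(100))
print(x_star)
```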

In turn, mirror descent is $$x_{t+1} = \arg \min_x \left(\langle \nabla f(x_t), x\rangle + \frac{1}{\lambda_t} d(x,x_t) \right),$$ and setting the gradient of this objective to zero recovers exactly the update $\nabla\Phi(x_{t+1})=\nabla\Phi(x_t)-\lambda_t\nabla f(x_t)$.
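As a worked instance of this proximal form (my own example, not from the answer): taking $\Phi$ to be negative entropy restricted to the probability simplex, the Bregman prox step has the closed form $x_{t+1} \propto x_t \odot e^{-\lambda \nabla f(x_t)}$, the exponentiated-gradient / multiplicative-weights update. Minimizing a linear objective $f(x)=\langle c, x\rangle$ then concentrates all mass on the cheapest coordinate.

```python
import numpy as np

def md_simplex(grad_f, x0, lam, steps):
    # mirror descent with Phi = negative entropy on the simplex
    x = x0.copy()
    for _ in range(steps):
        w = x * np.exp(-lam * grad_f(x))  # dual-space step, exponentiated back
        x = w / w.sum()                   # Bregman projection onto the simplex
    return x

c = np.array([0.3, 0.1, 0.6])             # f(x) = <c, x>, so grad f = c
x = md_simplex(lambda x: c, np.full(3, 1 / 3), lam=0.5, steps=200)
print(np.round(x, 3))                     # mass concentrates where c is smallest
```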