Gradient in mirror descent


Mirror descent can be in general written as

\begin{equation*} \nabla\Phi(x_{t+1})=\nabla\Phi(x_t)-\lambda_t\nabla f(x_t), \end{equation*}

where $f$ is the objective function and $\Phi$ is a convex function whose gradient $\nabla\Phi$ maps the primal parameter space to the dual space. What I don't understand is that although $x$ has been transformed to $\nabla\Phi(x)$ for gradient descent, the gradient $\nabla f(x)$ itself is not transformed at all, which seems to make it incompatible with the dual space. Shouldn't the MD update instead be written as

\begin{equation*} \nabla\Phi(x_{t+1})=\nabla\Phi(x_t)-\lambda_t J(x_t)^{-T}\nabla f(x_t), \end{equation*}

where $J$ is the Jacobian matrix of the map $x \mapsto \nabla\Phi(x)$, which would transform the gradient accordingly?


A short answer. Let $x\in E$ and $f\colon D \subseteq E \to \mathbb{R}$. Then, in order to run gradient descent in a dual space, we choose some convex $\Phi \colon E \to \mathbb{R}$. Both $\nabla f(x)$ and $\nabla \Phi(x)$ belong to the dual space $E^*$, so the update above is well defined and no extra transformation is needed.
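To make this concrete, here is a small illustrative sketch (the mirror map and numbers are my own choices, not from the question): with the negative-entropy mirror map $\Phi(x)=\sum_i x_i\log x_i$ on the positive orthant, both $\nabla\Phi(x) = 1 + \log x$ and $\nabla f(x)$ are plain vectors in $\mathbb{R}^n$, so the subtraction in the dual space is well defined with no Jacobian factor.

```python
import numpy as np

def grad_phi(x):
    # mirror map: gradient of negative entropy, a vector in R^n
    return 1.0 + np.log(x)

def grad_phi_inv(theta):
    # inverse mirror map: back to the primal space
    return np.exp(theta - 1.0)

def md_step(x, grad_f_x, lam):
    theta = grad_phi(x) - lam * grad_f_x  # update entirely in the dual space
    return grad_phi_inv(theta)            # map the result back to primal

x = np.array([0.5, 1.5])
g = np.array([1.0, -2.0])                 # stand-in for grad f(x)
x_new = md_step(x, g, lam=0.1)
# for this mirror map the step is multiplicative: x_new = x * exp(-lam * g)
print(x_new)
```

For this particular $\Phi$ the update reduces to the familiar multiplicative form $x_{t+1} = x_t \odot e^{-\lambda_t \nabla f(x_t)}$.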

A good interpretation of mirror descent is due to A. Beck and M. Teboulle in their paper "Mirror descent and nonlinear projected subgradient methods for convex optimization". They showed that mirror descent is in fact just a variant of the gradient descent method in which the squared norm is replaced by the Bregman distance $$d(x,y) = \Phi(x) - \Phi(y) - \langle\nabla \Phi(y), x-y\rangle.$$

The usual gradient method can be written as $$x_{t+1} = \arg \min_x \left(\langle \nabla f(x_t), x\rangle + \frac{1}{2\lambda_t} \|x-x_t\|^2 \right),$$ whose minimizer is exactly the familiar step $x_{t+1} = x_t - \lambda_t \nabla f(x_t)$.
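A quick numerical sanity check of this proximal viewpoint (an illustrative sketch with random data, not part of the answer): the closed-form minimizer $x_t - \lambda_t \nabla f(x_t)$ should beat any nearby perturbation on the linearized-plus-quadratic objective.

```python
import numpy as np

rng = np.random.default_rng(0)
x_t = rng.normal(size=3)
g = rng.normal(size=3)        # stand-in for grad f(x_t)
lam = 0.3

def obj(x):
    # <grad f(x_t), x> + (1 / (2*lam)) * ||x - x_t||^2
    return g @ x + (x - x_t) @ (x - x_t) / (2 * lam)

x_star = x_t - lam * g        # closed-form minimizer of obj
assert all(obj(x_star) <= obj(x_star + rng.normal(scale=0.1, size=3))
           for _ in range(100))
print(x_star)
```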

In turn, mirror descent is $$x_{t+1} = \arg \min_x \left(\langle \nabla f(x_t), x\rangle + \frac{1}{\lambda_t} d(x,x_t) \right),$$ and setting the gradient of this objective to zero recovers exactly the update $\nabla\Phi(x_{t+1})=\nabla\Phi(x_t)-\lambda_t\nabla f(x_t)$.
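As a worked instance of this proximal form (my own example, not from the answer): taking $\Phi$ to be negative entropy restricted to the probability simplex, the Bregman prox step has the closed form $x_{t+1} \propto x_t \odot e^{-\lambda \nabla f(x_t)}$, the exponentiated-gradient / multiplicative-weights update. Minimizing a linear objective $f(x)=\langle c, x\rangle$ then concentrates all mass on the cheapest coordinate.

```python
import numpy as np

def md_simplex(grad_f, x0, lam, steps):
    # mirror descent with Phi = negative entropy on the simplex
    x = x0.copy()
    for _ in range(steps):
        w = x * np.exp(-lam * grad_f(x))  # dual-space step, exponentiated back
        x = w / w.sum()                   # Bregman projection onto the simplex
    return x

c = np.array([0.3, 0.1, 0.6])             # f(x) = <c, x>, so grad f = c
x = md_simplex(lambda x: c, np.full(3, 1 / 3), lam=0.5, steps=200)
print(np.round(x, 3))                     # mass concentrates where c is smallest
```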