I'm trying to compute mirror descent algorithm (MDA) now, and I have a little complication in computing its update rule.
I would like to minimize something in an $l^1$-ball and for this, I took $\Phi (x) = \frac{1}{2(p-1)} ||x||_{p}^2$.
By some algebraic procedure, I've obtained each component of gradient $\nabla \Phi(x)_i = \frac{\partial \Phi}{\partial x_i} = \frac{1}{p-1} ||x||_{p}^{2-p} |x_i|^{p-2} x_i$.
My problem is the following:
Consider the following update rule of mirror descent algorithm:
$x_{t+1} = (\nabla \Phi)^{-1} (\nabla \Phi (x^t) - a_t \nabla f (x_t)$
Given $\nabla \Phi$ calculated previously, how can I obtain $\nabla \Phi^{-1}$ of mentioned update rule? I know that $\nabla \Phi$ takes and element of $\mathbb{R}^d$ and returns an vector again. So, what would be a inverse operator $(\nabla \Phi)^{-1}$?
I think that it should have some relation with some operator that does $\nabla \Phi ^{-1} (\nabla \Phi (x))= x$.
In summary, I would like to know what exactly is $\nabla \Phi ^{-1}$ which appears in mirror descent algorithm's update rule.
The inverse operator for the gradient is the gradient of the convex conjugate ($\nabla \Phi^\star \equiv (\nabla \Phi)^{-1}$).
The convex conjugate is given by
$$\Phi^\star(y) = \sup_{x}\, (\langle x,y\rangle - \Phi(x)) $$
if you can come up with an explicit form of the convex conjugate in your case then taking the gradient has $\nabla \Phi^\star \equiv (\nabla \Phi)^{-1}$. (I'm shoving some technicalities under the rug, cf. Rockafellar, Convex Analysis for all the nice theory).
Indeed, let's say the sup is achieved at $x^\star$ so that $\Phi^\star(y)=\langle x^\star,y\rangle - \Phi(x^\star)$ then the first order optimality condition gives
$$ y = \nabla \Phi(x^\star) $$
showing that $(\nabla \Phi)^{-1}(y) = x^\star=\nabla \Phi^\star(y)$.
So another way to frame the problem is that you need to find the argument:
$$ (\nabla\Phi)^{-1}(y) = \arg\sup_x \,(\langle x,y\rangle - \Phi(x)) .$$
I'm not sure if in your case you can find an explicit form for this (though I haven't tried hard). However in a numerical setting you could solve this approximately which would lead to a two-step mechanism not unlike primal-dual solvers.