Inverse of the Gradient of the Squared $l_p$ Norm

249 Views Asked by Bumbble Comm At 01 Apr 2026 - 11:38

For $\mathbf{w} \in \mathbb{R}^n$ and $1 < p < \infty$, let $$ \Psi(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|_p^2 = \frac{1}{2}\left(\sum_{i = 1}^n |\mathbf{w}_i|^{p} \right)^{2/p}, $$ where the factor $\frac{1}{2}$ is the usual normalization and is what makes the gradient below come out without a leading $2$. By a fairly direct calculation, $$ \left(\nabla\Psi(\mathbf{w})\right)_i = \|\mathbf{w}\|_p^{2 - p}\,|\mathbf{w}_i|^{p - 1}\,\text{sign}(\mathbf{w}_i) $$ for $\mathbf{w}_i \in \mathbb{R} \setminus \{0\}$.

I am interested in finding the expression for $((\nabla \Psi)^{-1}(\mathbf{v}))_i$, i.e. for the function $G$ such that $(G(\nabla \Psi(\mathbf{w})))_i = \mathbf{w}_i$. I am told that $$ \left((\nabla\Psi)^{-1}(\mathbf{v})\right)_i = \|\mathbf{v}\|_q^{2 - q}\,|\mathbf{v}_i|^{q - 1}\,\text{sign}(\mathbf{v}_i), $$ where $q$ is the dual exponent to $p$, i.e. $\frac{1}{p} + \frac{1}{q} = 1$.

However, I cannot see how to derive this. I imagine that it has something to do with the properties of dual norms, and I suspect it is simpler than I think it is. Any help would be appreciated. I apologize in advance if I messed up the notation somewhere, or if I am misunderstanding something more fundamental.
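One possible route to the claimed formula is sketched here. It assumes $1 < p < \infty$ and that $\Psi$ carries the conventional factor $\frac{1}{2}$, i.e. $\Psi(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|_p^2$, so that the gradient has no leading $2$. The only facts used are the identities $(p-1)q = p$, $(p-1)(q-1) = 1$ and $p + q = pq$, all of which follow from $\frac{1}{p} + \frac{1}{q} = 1$.

Write $\mathbf{v} = \nabla\Psi(\mathbf{w})$, so that $|\mathbf{v}_i| = \|\mathbf{w}\|_p^{2-p}\,|\mathbf{w}_i|^{p-1}$ and $\text{sign}(\mathbf{v}_i) = \text{sign}(\mathbf{w}_i)$. First, the norms agree: $$ \|\mathbf{v}\|_q^q = \|\mathbf{w}\|_p^{(2-p)q} \sum_{i=1}^n |\mathbf{w}_i|^{(p-1)q} = \|\mathbf{w}\|_p^{(2-p)q + p} \quad\Longrightarrow\quad \|\mathbf{v}\|_q = \|\mathbf{w}\|_p^{(2-p) + p/q} = \|\mathbf{w}\|_p, $$ since $p/q = p - 1$. Next, $$ |\mathbf{v}_i|^{q-1} = \|\mathbf{w}\|_p^{(2-p)(q-1)}\,|\mathbf{w}_i|^{(p-1)(q-1)} = \|\mathbf{w}\|_p^{(2-p)(q-1)}\,|\mathbf{w}_i|. $$ Combining the two, $$ \|\mathbf{v}\|_q^{2-q}\,|\mathbf{v}_i|^{q-1}\,\text{sign}(\mathbf{v}_i) = \|\mathbf{w}\|_p^{(2-q) + (2-p)(q-1)}\,|\mathbf{w}_i|\,\text{sign}(\mathbf{w}_i) = \mathbf{w}_i, $$ because the exponent simplifies to $(2-q) + (2-p)(q-1) = p + q - pq = 0$. This recovers $\mathbf{w}$ from $\mathbf{v}$, which is the stated expression for $(\nabla\Psi)^{-1}$.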

For those interested in the context, $\Psi$ may be thought of as a regularizer for an online learning algorithm and $\nabla \Psi$, $(\nabla\Psi)^{-1}$ are used in so-called "dual averaging" and "mirror descent" algorithms.
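As a numerical sanity check of the pair of maps used in those algorithms, here is a minimal Python sketch. It assumes $\Psi(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|_p^2$ (with the factor $\frac{1}{2}$) and $1 < p < \infty$. The function names `lp_norm` and `grad_half_sq_norm` are made up for illustration and do not come from any library. The check maps a vector through the gradient with exponent $p$, then through the same formula with the dual exponent $q$, and compares the result to the starting point.

```python
import math


def lp_norm(x, p):
    """l_p norm of a list of floats, for finite p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def grad_half_sq_norm(x, p):
    """Gradient of 0.5 * ||x||_p^2 (assumed normalization), for p > 1.

    Component i is ||x||_p^(2-p) * |x_i|^(p-1) * sign(x_i).
    """
    norm = lp_norm(x, p)
    if norm == 0.0:
        # The gradient at the origin is the zero vector.
        return [0.0 for _ in x]
    return [
        norm ** (2.0 - p) * abs(xi) ** (p - 1.0) * math.copysign(1.0, xi)
        for xi in x
    ]


p = 3.0
q = p / (p - 1.0)  # dual exponent, so that 1/p + 1/q = 1

w = [1.5, -0.25, 2.0, -3.0]      # arbitrary test point (illustrative)
v = grad_half_sq_norm(w, p)       # "primal -> dual" map
w_back = grad_half_sq_norm(v, q)  # candidate inverse: same formula with q

max_err = max(abs(a - b) for a, b in zip(w, w_back))
norm_gap = abs(lp_norm(v, q) - lp_norm(w, p))
print("max round-trip error:", max_err)
print("| ||v||_q - ||w||_p |:", norm_gap)
```

Both printed quantities should be at the level of floating-point round-off. The second one reflects the identity $\|\nabla\Psi(\mathbf{w})\|_q = \|\mathbf{w}\|_p$, which is the key step when checking the inverse formula by hand.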
