Gradient and Hessian of square of Lp-norm $\| \mathbf{x}\|_p^2$, where $1 \leq p<\infty$


What are the gradient and Hessian of the square of the $L_p$-norm, $\| \mathbf{x}\|_p^2$, where $1\leq p<\infty$? One may assume $\mathbf{x} \in \mathbb{C}^n$ (though $\mathbf{x} \in \mathbb{R}^n$ is also fine if that makes the solution easier).

I found one paper that gives the gradient of an $L_p$-norm as follows -- see Equations (7) and (14) in https://arxiv.org/pdf/2107.06986.pdf.

$$ \begin{array}{|c|c|c|} \hline f(\mathbf{x}) & \nabla f(\mathbf{x}) & \nabla^2 f(\mathbf{x})\\ \hline \| \mathbf{x}\|_p^2 & \left(\| \mathbf{x}\|_p^{2-p} | \mathbf{x}|^{p-2} \right) \circ \mathbf{x} & ?\\ \hline \end{array} $$ where $\circ$ denotes elementwise multiplication.

I don't even know how they derived it.

There are two answers below.

BEST ANSWER

$ \def\bbC#1{{\mathbb C}^{#1}} \def\bbR#1{{\mathbb R}^{#1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\abs#1{\op{abs}\LR{#1}} \def\sign#1{\op{sign}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\o{{\tt1}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Generalize the problem to a complex matrix $X\in\bbC{m\times n}$

Define the real-valued matrix $A=\abs X\;$ and note that $$\eqalign{ &A\odot A = X\odot X^* \\ &{dA\odot A} + {A\odot dA} \;=\; {dX\odot X^*} + {X\odot dX^*} \\ &{A\odot dA} \;=\; \tfrac12\LR{X^*\odot dX + X\odot dX^*} \\ }$$ where $(\odot)$ denotes the elementwise/Hadamard product and $X^*$ is the complex conjugate.

The $L_p$ norm (to the $p^{th}$ power) can be written using the all-ones matrix $J$
and Hadamard powers of $A$ $$\eqalign{ L_p^p &= J:A^{\odot p} \\ }$$ where $(:)$ denotes the scalar Frobenius product $$\eqalign{ B:C\: &= \sum_{i=1}^m\sum_{j=1}^n B_{ij}C_{ij} \;=\; \trace{B^TC} \\ B:B^* &= \frob{B}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ }$$ The gradient can be calculated as $$\eqalign{ p\,L_p^{p-1}\:dL_p &= J:\LR{p\,A^{\odot p-1}\odot dA} \\ &= J:\LR{p\,A^{\odot p-2}\odot A\odot dA} \\ &= p\,A^{\odot p-2}:\frac12\LR{X^*\odot dX} \;&+\; conjugate \\ dL_p &= \fracLR{X^*}{2L_p}\odot\fracLR{A}{L_p}^{\odot p-2}:dX \;&+\; conjugate \\ \grad{L_p}{X} &= \fracLR{X^*}{2L_p}\odot\fracLR{A}{L_p}^{\odot p-2} \\ \grad{L_p}{X^*} &= \fracLR{X}{2L_p}\odot\fracLR{A}{L_p}^{\odot p-2} \\ }$$ Of course, when $X\in\bbR{m\times n}$ these results can be simplified $$\eqalign{ &\;{A\odot dA} = X\odot dX \\ &\;dL_p = \fracLR{X}{L_p}\odot\fracLR{A}{L_p}^{\odot p-2}:dX \\ &\grad{L_p}{X} = \fracLR{X}{L_p}\odot\fracLR{A}{L_p}^{\odot p-2} \\ }$$ and setting $n=\o$ recovers the vector case.

Finally, your question was about the square of the norm, which would be $$\eqalign{ \grad{L_p^2}{X} &= 2L_p\gradLR{L_p}{X} \\\\ }$$
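The real-vector gradient above, and the chain-rule relation $\grad{L_p^2}{X} = 2L_p\gradLR{L_p}{X}$, can be sanity-checked numerically. The sketch below is my own (not part of the answer) and assumes a real vector with nonzero entries so that $|x_i|^{p-2}$ is well defined:

```python
import numpy as np

# Check  dL/dx = (x/L) * (|x|/L)**(p-2)  and  d(L^2)/dx = 2*L*dL/dx
# against central finite differences.
def lp(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def grad_lp(x, p):
    L = lp(x, p)
    return (x / L) * (np.abs(x) / L) ** (p - 2)

def fd_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.7, -1.3, 2.1])
for p in (1.5, 2.0, 3.0):
    assert np.allclose(grad_lp(x, p), fd_grad(lambda z: lp(z, p), x), atol=1e-5)
    assert np.allclose(2 * lp(x, p) * grad_lp(x, p),
                       fd_grad(lambda z: lp(z, p) ** 2, x), atol=1e-4)
```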

Update

In the general $\bbC{m\times n}$ case, there are four possible Hessians $$\eqalign{ \def\hess#1#2#3{\grad{^2 #1}{#2\:\p #3}} \hess{L_p}{X}{X},\quad\hess{L_p}{X^*}{X},\quad \hess{L_p}{X}{X^*},\quad\hess{L_p}{X^*}{X^*} \\ }$$ I doubt you are interested in these, so I concentrate on the $\bbR{m\times n}$ case.

We'll need the elementwise self gradient $$\eqalign{ \grad{X}{X_{kl}} = E_{kl},\qquad M_{kl}=M:E_{kl} \\ }$$ where all components of $E_{kl}$ are zero except for the $(k,l)$ component which equals $\o$.
Its Frobenius product with any matrix produces the $(k,l)$ component.

Calculate the differential and gradient of $G$ (aka the hessian) $$\eqalign{ \def\h{\odot} \def\d{\delta} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} q &= p-2 \\ G &= L_p^{\o-p}\LR{X\odot A^{\odot q}} \\ dG &= L_p^{\o-p}\LR{X\odot dA^{\odot q}+A^{\odot q}\odot dX} + \LR{X\odot A^{\odot q}}\LR{\o-p}L_p^{-p}\;dL_p \\ &= L_p^{\o-p}\LR{qX\h A^{\h q-\o}\h dA+A^{\h q}\h dX} + \LR{X\h A^{\h q}}\LR{\o-p}L_p^{-p}\;dL_p \\ &= L_p^{\o-p}\LR{qX\h A^{\h q-2}\h\c{A\h dA}+A^{\h q}\h dX} + \LR{X\h A^{\h q}}\LR{\o-p}L_p^{-p}\;dL_p \\ &= L_p^{\o-p}\LR{qX\h A^{\h q-2}\h\c{X\h dX}+A^{\h q}\h dX} + \LR{X\h A^{\h q}}\LR{\o-p}L_p^{-p}\;dL_p \\ &= L_p^{\o-p}\LR{qA^{\h q}\h dX+A^{\h q}\h dX} + \LR{X\h A^{\h q}}\LR{\o-p}L_p^{-p}\;\c{dL_p} \\ &= \LR{\o+q}L_p^{\o-p}A^{\odot q}\odot dX + \LR{\o-p}L_p^{-p}\LR{X\odot A^{\odot q}}\;\CLR{G:dX} \\ \grad{G}{X_{kl}} &= \LR{\o+q}L_p^{\o-p}\LR{A^{\odot q}\odot E_{kl}} + \LR{\o-p}L_p^{-p}\LR{X\odot A^{\odot q}}\;\LR{G:E_{kl}} \\ &= \LR{p-\o}L_p^{\o-p}\LR{A^{\odot q}\odot E_{kl}} - \LR{p-\o}L_p^{-p}\LR{A^{\odot q}\odot X} G_{kl} \\ }$$ The components of this expression are the components of the Hessian $$\eqalign{ {\large\cal H}_{ijkl} &= \hess{L_p}{X_{ij}}{X_{kl}} = \hess{L_p}{X_{kl}}{X_{ij}} = \grad{G_{ij}}{X_{kl}} \\ &= \LR{p-\o}L_p^{\o-p}\:{A^q_{ij}\,\d_{ik}\d_{jl}} - \LR{p-\o}L_p^{-p}\:{A^q_{ij}X_{ij}}\,G_{kl} \\ }$$ If you are only interested in the vector case, keep the $(i,k)$ indices and drop the other two. This makes the Hessian a matrix with components $$\eqalign{ {H}_{ik} &= \hess{L_p}{x_{i}}{x_{k}} = \hess{L_p}{x_{k}}{x_{i}} = \grad{g_i}{x_k} = \grad{g_k}{x_i} \\ &= \LR{p-\o}L_p^{\o-p}\:{a^q_{i}\,\d_{ik}} - \LR{p-\o}L_p^{-p}\:{a^q_{i}x_{i}}\,g_{k} \\ }$$ Narrowing the scope to the square of the norm of a real vector $$\eqalign{ \def\hessLR#1#2#3{\LR{\hess{#1}{#2}{#3}}} &\hess{\LR{L_p^2}}{x_i}{x_k} \;=\; 2\,L_p\hessLR{L_p}{x_i}{x_k} \;+\; 2 \gradLR{L_p}{x_i} \gradLR{L_p}{x_k}^T }$$
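The component formula for the vector Hessian of $L_p$ can be verified by differencing the gradient itself. This is a sketch of my own (names are mine), assuming a real vector with nonzero entries:

```python
import numpy as np

# Check  H_ik = (p-1) L^{1-p} a_i^q d_ik - (p-1) L^{-p} a_i^q x_i g_k,  q = p-2,
# where a = |x| and g is the gradient of L_p, against finite
# differences of that gradient.
def lp(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def grad_lp(x, p):
    return lp(x, p) ** (1 - p) * np.abs(x) ** (p - 2) * x

def hess_components(x, p):
    L, a, q = lp(x, p), np.abs(x), p - 2
    g = grad_lp(x, p)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for k in range(n):
            H[i, k] = ((p - 1) * L ** (1 - p) * a[i] ** q * (i == k)
                       - (p - 1) * L ** (-p) * a[i] ** q * x[i] * g[k])
    return H

x, h = np.array([1.1, -0.6, 1.7]), 1e-6
for p in (1.5, 3.0):
    Hfd = np.zeros((3, 3))
    for k in range(3):
        e = np.zeros(3)
        e[k] = h
        Hfd[:, k] = (grad_lp(x + e, p) - grad_lp(x - e, p)) / (2 * h)
    assert np.allclose(hess_components(x, p), Hfd, atol=1e-5)
```

For $p=2$ this reduces to the familiar Hessian of the Euclidean norm, $(I - xx^T/L^2)/L$.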

Update 2

$ \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $As Steph points out in the comments, the results (for real vectors) can be simplified by using a diagonal matrix to eliminate Hadamard products and powers (and index notation) $$\eqalign{ \def\g{{\large g}} \def\A{{\large\cal A}} \def\L{L_p} \def\Diag#1{\op{Diag}\LR{#1}} \A &= \Diag{\frac{\abs{x}}{\L}} \\ \g &= \A^{p-2} \LR{\frac{x}{\L}} \\ H &= \frac{p-\o}{\L}\:\LR{\A^{p-2} -\, \g\g^T} \\ }$$
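The compact $\op{Diag}$ form can be checked the same way. Below is a sketch of mine (assuming a real vector with nonzero entries); since $\A$ is diagonal with positive entries, its Hadamard-free power $\A^{p-2}$ is just the diagonal matrix of entrywise powers:

```python
import numpy as np

# Check  H = ((p-1)/L) * (A**(p-2) - g g^T),  A = Diag(|x|/L),
# g = A**(p-2) (x/L),  against a second-order finite-difference Hessian of L_p.
def lp(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def hess_lp(x, p):
    L = lp(x, p)
    a = np.abs(x) / L
    Aq = np.diag(a ** (p - 2))   # A**(p-2) for diagonal A
    g = Aq @ (x / L)             # gradient of L_p
    return ((p - 1) / L) * (Aq - np.outer(g, g))

def fd_hess(f, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x = np.array([0.9, -1.2, 2.0])
for p in (1.5, 3.0):
    assert np.allclose(hess_lp(x, p), fd_hess(lambda z: lp(z, p), x), atol=1e-4)
```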

SECOND ANSWER

Start by writing the function in terms of the components:

$$f(\mathbf{x}) =(|x_1|^p +|x_2|^p+\dots +|x_n|^p)^{2/p}$$ Now take the derivative with respect to $x_i$ (using the chain rule): $$\frac{\partial f}{\partial x_i} =p |x_i|^{p-1} \color{red}{\operatorname{sgn}(x_i)}\,\frac{2}{p}\,(|x_1|^p +|x_2|^p+\dots +|x_n|^p)^{2/p-1}$$ (Recall that the derivative of $|x|^p$ is $p|x|^{p-1}\operatorname{sgn}(x)$.)

The last expression can be cleaned up a little (using $|x_i|^{p-1}\operatorname{sgn}(x_i)=|x_i|^{p-2}x_i$): $$\frac{\partial f}{\partial x_i} = 2\,|x_i|^{p-2}\,\lVert\mathbf{x}\rVert_p^{2-p}\, x_i,$$ which is the result you are looking for (except that there is a factor of $2$ here that is not in the paper).
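A quick numerical check (my own sketch, assuming a real vector with nonzero entries) confirms this gradient, including the factor of $2$ that the paper omits:

```python
import numpy as np

# Check  df/dx_i = 2 |x_i|^(p-2) ||x||_p^(2-p) x_i  for f = ||x||_p^2
# against central finite differences.
def f(x, p):
    return np.sum(np.abs(x) ** p) ** (2.0 / p)

def grad_f(x, p):
    L = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return 2 * np.abs(x) ** (p - 2) * L ** (2 - p) * x

x, h = np.array([0.5, -1.5, 2.5]), 1e-6
for p in (1.5, 2.0, 4.0):
    num = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        num[i] = (f(x + e, p) - f(x - e, p)) / (2 * h)
    assert np.allclose(grad_f(x, p), num, atol=1e-4)
```

For $p=2$ this reduces to the familiar $\nabla \lVert\mathbf{x}\rVert_2^2 = 2\mathbf{x}$.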


For the Hessian you can do the same thing; you just need to compute the derivative

$$\frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}.$$
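Carrying that differentiation through (my own computation, consistent with the accepted answer's vector result) gives $$\frac{\partial^2 f}{\partial x_j\,\partial x_i} = 2(p-1)\,\lVert\mathbf{x}\rVert_p^{2-p}\,|x_i|^{p-2}\,\delta_{ij} \;+\; 2(2-p)\,\lVert\mathbf{x}\rVert_p^{2-2p}\,|x_i|^{p-2}x_i\,|x_j|^{p-2}x_j,$$ which the following sketch (names are mine; real vector with nonzero entries assumed) checks numerically:

```python
import numpy as np

# Check the closed-form Hessian of f = ||x||_p^2 against a
# second-order finite-difference Hessian.
def f(x, p):
    return np.sum(np.abs(x) ** p) ** (2.0 / p)

def hess_f(x, p):
    L = np.sum(np.abs(x) ** p) ** (1.0 / p)
    u = np.abs(x) ** (p - 2) * x   # entries |x_i|^(p-2) x_i
    D = np.diag(2 * (p - 1) * L ** (2 - p) * np.abs(x) ** (p - 2))
    return D + 2 * (2 - p) * L ** (2 - 2 * p) * np.outer(u, u)

def fd_hess(fun, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (fun(x + ei + ej) - fun(x + ei - ej)
                       - fun(x - ei + ej) + fun(x - ei - ej)) / (4 * h * h)
    return H

x = np.array([0.8, -1.1, 1.9])
for p in (1.5, 3.0):
    assert np.allclose(hess_f(x, p), fd_hess(lambda z: f(z, p), x), atol=1e-3)
```

At $p=2$ the second term vanishes and the Hessian collapses to $2I$, as it should for $\lVert\mathbf{x}\rVert_2^2$.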