matrix differentiation of lp norm

1.8k Views Asked by At

I'm trying to figure out what is the gradient [matrix differentiation] of the lp norm. I saw a few other posts regarding the specific case when for example when p=2 so I think that:

$$\frac{\partial d}{\partial x} ||Ax - b||_2 = A$$

but what about for different arbitrary values of p? In that case, what is$\frac{\partial d}{\partial x} ||Ax - b||_p $, where subscript _p means lp-norm?

2

There are 2 best solutions below

3
On BEST ANSWER

To start, you can write the definition of a p norm and then try to compute the partial derivative. $\| \boldsymbol{\mathrm{x}}\|_p = (|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{\frac{1}{p}}$

For computing a partial derivative with respect to one variable, you assume all the rest of variables are constants and then compute a real derivative wrt the single variable in consideration.

$$\frac{\partial \| \boldsymbol{\mathrm{x}}\|_p }{\partial x_i}= (|x_1|^p + |x_2|^p + \dots + |x_n|^p)^ {1/p-1}*|x_i|^{p-1} * sign(x_i)$$ (Need to check the math here to make sure there are no typos)

You may be able to simplify and combine your partials to come up with a matrix\vector form for the gradient matrix\vector.

When there are functions inside the norm, you can use matrix chain rule for the rest of the computation. Matrixcookbook could be useful here.

Note that some norm are not differentiable in every point. Eg. L1 norm is not differentiable at the zero vector.

*Typo in the derivative is fixed

0
On

Define two new variables $$\eqalign{ M &= (AX-B) &\implies &dM=A\,dX \cr C &= {\rm abs}(M) &\implies &C\odot C=M\odot M \cr &\,&\,&C\odot dC=M\odot dM \cr }$$ where we used $\odot$ denote the elementwise/Hadamard product.

Let's also define Hadamard powers of a matrix in the obvious way, i.e. $M^{\odot 3} = M\odot M\odot M$. Finally, let's use a colon to denote the trace/Frobenius product, i.e. $\,\,A:X={\rm tr}(A^TX)$.

Now we can express the norm in a form which will allow us to find its differential and gradient $$\eqalign{ \phi &= \|M\|_p \cr \phi^p &= \|M\|_p^p = 1:C^{\odot p} \cr d\phi^p &= 1:pC^{\odot(p-1)}\odot dC \cr &= pC^{\odot(p-2)}: C\odot dC \cr p\phi^{p-1}d\phi &= pC^{\odot(p-2)}: M\odot dM \cr \cr d\phi &= \phi^{1-p} C^{\odot(p-2)}\odot M: dM \cr &= \phi^{1-p} C^{\odot(p-2)}\odot M: A\,dX \cr &= \phi^{1-p} A^T\Big(C^{\odot(p-2)}\odot M\Big) : dX \cr \cr \frac{\partial\phi}{\partial X} &= \phi^{1-p} A^T\Big(M\odot C^{\odot(p-2)}\Big) \cr &= \phi^{1-p} A^T\Big((AX-B)\odot ({\rm abs}(AX-B))^{\odot(p-2)}\Big) \cr }$$ Be careful when $p<1$ and components of $M\rightarrow 0$.

Note that when $p=2$ the gradient reduces to $$\eqalign{ \frac{\partial\phi}{\partial X} &= \phi^{-1} A^T(AX-B) \cr }$$ which is a bit different than what you had assumed.