Differentiation of the 1-norm of a vector


Assume you want to find the minimum of the following expression:

$\|x\|_1 + \alpha \|Ax-b\|_2,$ where $x\in\mathbb{R}^N$.

So basically I want to calculate the derivative of $\|x\|_1$, so that I can set the derivative of the whole expression to zero and solve for $x^*$.

What is the best approach to tackle this problem?

There are two answers below.

Answer 1 (by Nicolas):

Let us calculate the derivative of the map $x\mapsto\|x\|_1$.

For all $x=\left(x_1,\ldots,x_N\right)\in\mathbb{R}^N$, one has $$\|x\|_1=|x_1|+\ldots+|x_N|$$ so that $$\frac{\partial}{\partial x_j}\left(x\mapsto\|x\|_1\right)=\frac{x_j}{|x_j|}\quad\quad\quad 1\leq j\leq N$$ provided that $x_j\neq 0$. Whence $$\mathrm{d}\left(x\mapsto\|x\|_1\right)=\sum_{j=1}^{N}\frac{x_j}{|x_j|}\mathrm{d}x_j.$$
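The closed form above can be sanity-checked numerically. The sketch below (not part of the original answer; the test point is an arbitrary choice with all entries nonzero) compares $\mathrm{sign}(x)$ against central finite differences of $\|x\|_1$:

```python
import numpy as np

# Away from the coordinate axes, the gradient of x -> ||x||_1
# should equal sign(x), i.e. x_j / |x_j| componentwise.
def norm1(x):
    return np.abs(x).sum()

def numerical_grad(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.5, -0.3, 2.0, -4.0])   # every entry nonzero
g_closed = np.sign(x)                  # the closed form derived above
g_num = numerical_grad(norm1, x)
assert np.allclose(g_closed, g_num, atol=1e-5)
```

Note that the check is only valid because no $x_j$ is zero; at a zero coordinate the finite difference would approximate a slope that does not exist.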

Now we compute the derivative of the map $x\mapsto\|Ax-b\|_2$.

If $A=\left(a_{ij}\right)\in\mathrm{M}_{M\times N}\left(\mathbb{R}\right)$, one writes for all $b=\left(b_1\ldots b_M\right)^T\in\mathbb{R}^M$ $$\|Ax-b\|_2=\sqrt{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)^2}$$ and then $$\frac{\partial}{\partial x_j}\left(x\mapsto\|Ax-b\|_2\right)=\frac{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)a_{ij} }{\sqrt{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)^2}}\quad\quad\quad 1\leq j\leq N.$$ Hence, one has $$\mathrm{d}\left(x\mapsto\|Ax-b\|_2\right)=\sum_{j=1}^{N}\frac{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)a_{ij} }{\sqrt{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)^2}}\mathrm{d}x_j.$$
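In vector notation the formula above is $\nabla\|Ax-b\|_2 = A^T(Ax-b)/\|Ax-b\|_2$. A quick numerical check (the random $A$, $b$, $x$ below are illustrative assumptions, not data from the post):

```python
import numpy as np

# Check grad ||Ax-b||_2 = A^T (Ax - b) / ||Ax - b||_2
# against central finite differences.
rng = np.random.default_rng(0)
M, N = 5, 3
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
x = rng.standard_normal(N)

def g(x):
    return np.linalg.norm(A @ x - b)

r = A @ x - b
grad_closed = A.T @ r / np.linalg.norm(r)

h = 1e-6
grad_num = np.array([
    (g(x + h * np.eye(N)[j]) - g(x - h * np.eye(N)[j])) / (2 * h)
    for j in range(N)
])
assert np.allclose(grad_closed, grad_num, atol=1e-5)
```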

Now for your optimization problem: in $\left(\mathbb{R}^N\right)^*$ equipped with the basis $\mathcal{B}=\left(\mathrm{d}x_1,\ldots,\mathrm{d}x_N\right)$ (the natural dual basis), we write $$\sum_{j=1}^{N}\left(\frac{x_j}{|x_j|}+\alpha \frac{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)a_{ij} }{\sqrt{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)^2}}\right)\mathrm{d}x_j=0.$$ As $\mathcal{B}$ is a basis, each coefficient in the preceding formula must be zero, and thus one must find $x_1,\ldots,x_N\in\mathbb{R}$ fulfilling the equality $$\frac{x_j}{|x_j|}+\alpha \frac{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)a_{ij} }{\sqrt{\sum_{i=1}^{M}\left(\sum_{k=1}^{N}a_{ik}x_k-b_i\right)^2}}=0\quad\quad\quad 1\leq j\leq N.$$

To complete this answer: as pointed out by littleO, the original function is not differentiable everywhere, so one way to approach the optimization problem is to find extrema using the differential (where it exists) and then check the remaining points (those with some $x_j=0$, or with $Ax=b$) by hand.
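Because of the non-differentiable points, a common practical alternative to solving the stationarity equations directly is a subgradient method, where $\mathrm{sign}(x)$ (with value $0$ at kinks) serves as a subgradient of $\|x\|_1$. A minimal sketch, assuming an illustrative step-size schedule and random problem data (none of which come from the post):

```python
import numpy as np

# Subgradient-descent sketch for min_x ||x||_1 + alpha * ||Ax - b||_2.
# The data, alpha, step sizes, and iteration count are arbitrary choices.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)
alpha = 2.0

def f(x):
    return np.abs(x).sum() + alpha * np.linalg.norm(A @ x - b)

x = np.zeros(4)
f0 = f(x)
best = f0
for t in range(1, 5001):
    r = A @ x - b
    nr = np.linalg.norm(r)
    # np.sign(x) is a valid subgradient of ||.||_1 (it is 0 at kinks);
    # guard against division by zero at the point Ax = b.
    sub = np.sign(x) + (alpha * A.T @ r / nr if nr > 1e-12 else np.zeros(4))
    x = x - (0.1 / np.sqrt(t)) * sub      # diminishing step sizes
    best = min(best, f(x))

assert best <= f0   # the best iterate never exceeds the starting value
```

Subgradient methods are not monotone, so one tracks the best objective value seen rather than trusting the final iterate.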

Answer 2:

The differentials of the Manhattan and Frobenius norms are $$\eqalign{ d\,\|X\|_1 &= {\rm sign}(X) : dX \cr d\,\|X\|_2 &= \frac{X}{\|X\|_2} : dX \cr }$$ where the sign function is applied element-wise, and the colon represents the Frobenius product, i.e. $\,\,A\!:\!B = {\rm tr}(A^TB)$.

Letting $Y=(AX-B)$, the objective function and its differential can be written as $$\eqalign{ f &= \|X\|_1 + \alpha\|Y\|_2 \cr\cr df &= {\rm sign}(X) : dX + \frac{\alpha Y}{\|Y\|_2} : dY \cr &= {\rm sign}(X) : dX + \frac{\alpha Y}{\|Y\|_2} : A\,dX \cr &= {\rm sign}(X) : dX + \frac{\alpha A^TY}{\|Y\|_2} : dX \cr }$$ Since $df=\big(\frac{\partial f}{\partial X}:dX\big)\,$ the derivative is $$\eqalign{ \frac{\partial f}{\partial X} &= {\rm sign}(X) + \frac{\alpha A^TY}{\|Y\|_2} \cr &= {\rm sign}(X) + \frac{\alpha A^T(AX-B)}{\|AX-B\|_2} \cr }$$ At first glance this looks different from the accepted answer, but expanding the $j$-th component, $\left(A^T(AX-B)\right)_j=\sum_{i}a_{ij}\left(\sum_k a_{ik}x_k-b_i\right)$, gives exactly the numerator in Nicolas' formula, so the two results agree.
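The final matrix-calculus formula can be verified the same way as the componentwise one, here in the vector case (the data below are illustrative assumptions; the test point keeps every entry away from zero so $f$ is differentiable there):

```python
import numpy as np

# Finite-difference check of
#   df/dX = sign(X) + alpha * A^T (AX - B) / ||AX - B||_2.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
B = rng.standard_normal(6)
alpha = 0.5
X = np.array([1.0, -2.0, 0.7, -0.4])   # entries bounded away from 0

def f(X):
    return np.abs(X).sum() + alpha * np.linalg.norm(A @ X - B)

r = A @ X - B
grad_closed = np.sign(X) + alpha * A.T @ r / np.linalg.norm(r)

h = 1e-6
grad_num = np.array([
    (f(X + h * np.eye(4)[j]) - f(X - h * np.eye(4)[j])) / (2 * h)
    for j in range(4)
])
assert np.allclose(grad_closed, grad_num, atol=1e-5)
```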