Gradient of the $ {L}_{2, 1} $ Mixed Norm

980 Views Asked by At

The $(1, 2)$-mixed norm of some matrix $X\in \mathbb{R}^{m \times n}$ is defined as

$$\| X \|_{1,2} = \sum_{j=1}^m \|x_{i,*}\|_2$$

where $x_{i,*}$ denotes the $i^{\text{th}}$ row of $X$. I am trying to find $$\frac{\partial}{\partial X} \|X\|_{1,2}$$ but am confused how to go about this. Does anyone know what this reduces to? I have already consulted the matrix cookbook.

2

There are 2 best solutions below

0
On

I worked it out and it turns out to be a matrix of dimensions $m \times n$ such that the $i^{\text{th}}$ row equals $\frac{x_{i,*}}{||x_{i,*}||}$. For my purposes, I need to write this result in terms of $X$ and not its rows, but I don't know how.

5
On

This norm seems a bit weird, nonetheless its gradient can be written as $$\eqalign{ \frac{\partial N}{\partial X} &= X\odot\Big[(X\odot X)\,1_n\Big]^{\odot-1/2} \cr }$$ where $1_n$ is the vector with all elements equal to one, and $\odot$ represents the Hadamard (element-wise) product.


Update
Here are the details leading to the above result.

The conventional definition of the $L_{p,q}$ norm of $X\in{\mathbb R}^{m\times n}$ is $$ \|X\|_{p,q} = \left[\sum_{j=1}^n \left( \sum_{i=1}^m |X_{ij}|^p \right)^{q/p}\right]^{1/q} $$ The definition used in this question transposes the (p,q) indices as well as the order of the summations $$ \|X\|_{q,p} = \left[\sum_{i=1}^m \left( \sum_{j=1}^n |X_{ij}|^p \right)^{q/p}\right]^{1/q} $$ nevertheless, this is the defintion we will work with, and for typing convenience we'll simply call it $N$.


We need a few more things before we get started, the Frobenius inner product, the Hadamard (element-wise) product, and Hadamard exponentiation $$\eqalign { A:X &= \operatorname{tr}(A^TX) \cr X^{\odot 2} &= X\odot X \cr }$$ Let $A=\operatorname{abs}(X)$ where the abs function is also element-wise.


A particularly useful relationship between $A$ and $X$ is the following $$\eqalign { X\odot X &= A\odot A \cr X\odot dX &= A\odot dA \cr\cr }$$

Now we're ready to find the gradient $$\eqalign { N^q &= \bigg(A^{\odot p}\,1_n\bigg)^{\odot(q/p)}:1_m \cr\cr q\,N^{q-1}dN &= (q/p)\big(A^{\odot p}\,1_n\big)^{\odot(q/p-1)}\odot\big(pA^{\odot(p-1)}\odot dA\big)\,1_n:1_m \cr\cr N^{q-1}dN &= \big(A^{\odot p}\,1_n\big)^{\odot(q/p-1)}\odot 1_m:\big(A^{\odot(p-1)}\odot dA\big)\,1_n \cr &= \big(A^{\odot p}\,1_n\big)^{\odot(q/p-1)}:\big(A^{\odot(p-1)}\odot dA\big)\,1_n \cr &= \big(A^{\odot p}\,1_n\big)^{\odot(q/p-1)}1_n^T:A^{\odot(p-1)}\odot dA \cr &= \big(A^{\odot p}\,1_n1_n^T\big)^{\odot(q/p-1)}:A^{\odot(p-2)}\odot A\odot dA \cr &= \big(A^{\odot p}\,1_n1_n^T\big)^{\odot(q/p-1)}\odot A^{\odot(p-2)}:X\odot dX \cr &= \big(A^{\odot p}\,1_n1_n^T\big)^{\odot(q/p-1)}\odot A^{\odot(p-2)}\odot X:dX \cr \cr \frac{\partial N}{\partial X} &= N^{(1-q)}\,\big(A^{\odot p}\,1_{n\times n}\big)^{\odot(q/p-1)}\odot A^{\odot(p-2)}\odot X \cr \cr }$$ Substituting the values $(q\!=\!1,\, p\!=\!2)$ yields $$\eqalign { \frac{\partial N}{\partial X} &= \big[A^{\odot 2}\,1_{n\times n}\big]^{\odot-1/2}\odot X \cr &= X\odot \big[(X\odot X)\,1_{n\times n}\big]^{\odot-1/2} \cr\cr }$$ Note that the commutivity (and mutual commutivity) of the Hadamard and Frobenius products were used in several places in the derivation.

I'm also using a non-standard definition for negative Hadamard powers. If an element is non-zero take the indicated power, but if the element is zero leave it as zero.