What is the derivative of $ \|X^TW-Y\|_{2,1}$ w.r.t. $W$? How to compute it?


$W$ is the optimization variable. The function $\|X^TW-Y\|_{2,1}$ is not smooth because of the $\|\cdot\|_{2,1}$-norm: it is not differentiable wherever a row of $X^TW-Y$ is zero. To make it differentiable, $\|X^TW-Y\|_{2,1}$ is relaxed to $2\operatorname{Tr}((X^TW-Y)^TD(X^TW-Y))$, where $D$ is a diagonal matrix with $$D_{ii} = \frac{1}{2\|(X^TW-Y)_i\|_2+\varepsilon},$$

$(X^TW-Y)_i$ denotes the $i$-th row of $X^TW-Y$, and $\varepsilon$ is a small positive constant. The dimensions are $X \in \mathbb{R}^{d \times n}$, $Y \in \mathbb{R}^{n \times l}$ and $W \in \mathbb{R}^{d\times l}$.
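A quick numerical sanity check of the relaxation, assuming NumPy, random test matrices with the stated shapes, and hypothetical helper names `l21_norm` and `relaxed_l21`. Because $D$ is diagonal, $\operatorname{Tr}(A^TDA) = \sum_i D_{ii}\|a^i\|_2^2$, so with $A = X^TW-Y$ the relaxed value is $\sum_i 2\|a^i\|_2^2/(2\|a^i\|_2+\varepsilon)$, which tends to $\|A\|_{2,1}$ as $\varepsilon \to 0$ and is slightly smaller for $\varepsilon > 0$.

```python
import numpy as np

def l21_norm(A):
    # Sum of the Euclidean norms of the rows of A
    return np.linalg.norm(A, axis=1).sum()

def relaxed_l21(A, eps=1e-8):
    # Diagonal reweighting D_ii = 1 / (2 ||a^i||_2 + eps), computed from the current A
    row_norms = np.linalg.norm(A, axis=1)
    D = np.diag(1.0 / (2.0 * row_norms + eps))
    # 2 Tr(A^T D A); equals ||A||_{2,1} exactly when eps = 0 and no row is zero
    return 2.0 * np.trace(A.T @ D @ A)

# Random test data with the shapes from the question (assumed sizes)
rng = np.random.default_rng(0)
d, n, l = 4, 6, 3
X = rng.standard_normal((d, n))
Y = rng.standard_normal((n, l))
W = rng.standard_normal((d, l))
A = X.T @ W - Y  # n x l residual

print(l21_norm(A), relaxed_l21(A))
```

The two printed values should agree to roughly $n\varepsilon/2$, since each row contributes an error of $-\|a^i\|_2\varepsilon/(2\|a^i\|_2+\varepsilon)$.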

Note: the $\|\cdot\|_{2,1}$-norm of a matrix $W \in \mathbb{R}^{d \times l}$ is defined as

$$ \Vert W \Vert_{2,1} = \sum_{i=1}^d \Vert w^{i} \Vert_2 = \sum_{i=1}^d \left( \sum_{j=1}^l |w_{ij}|^2 \right)^{1/2} $$ where $w^i$ denotes the $i$-th row of $W$ and $w_{ij}$ denotes the $(i,j)$ entry of $W$.
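For concreteness, a minimal sketch of this definition, assuming NumPy (the helper name `l21_norm` is hypothetical): take the Euclidean norm of each row, then sum over the rows.

```python
import numpy as np

def l21_norm(W):
    # ||W||_{2,1}: Euclidean norm of each row w^i, then sum over the d rows
    return np.sqrt((W ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(W))  # row norms are 5, 0, 1 → 6.0
```

Unlike the Frobenius norm, which squares and sums every entry before a single square root, this norm sums the row norms, which is what encourages entire rows of $W$ to vanish.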

Related papers:

Multi-Label Informed Feature Selection

Efficient and Robust Feature Selection via Joint $l_{2,1}$-Norms Minimization


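Based on this answer, you can write $$ G_A \,=\, \frac{\partial \|A\|_{2,1}}{\partial A} \;=\; A\odot\Big[(A\odot A)\,{\tt\large 1}\Big]^{\odot -1/2} $$ where $(\odot)$ denotes element-wise multiplication and exponentiation, and ${\tt 1}$ is the all-ones vector. Thus $(A\odot A)\,{\tt 1}$ collects the squared row norms of $A$, and the outer product scales each row $a^i$ by $1/\|a^i\|_2$.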
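A small check of this formula against central finite differences, assuming NumPy and a random matrix with no zero rows (the helper names `l21_grad` and `numerical_grad` are hypothetical). The broadcasting `row_norms[:, None]` implements the row-wise scaling hidden in the $\odot$ with the vector $[(A\odot A){\tt 1}]^{\odot -1/2}$.

```python
import numpy as np

def l21_norm(A):
    return np.linalg.norm(A, axis=1).sum()

def l21_grad(A):
    # G_A = A ⊙ [(A ⊙ A) 1]^{⊙ -1/2}: divide each row a^i by ||a^i||_2.
    # Assumes no row of A is zero (otherwise only a subgradient exists).
    row_norms = np.sqrt((A * A).sum(axis=1))  # (A ⊙ A) 1, then element-wise sqrt
    return A / row_norms[:, None]             # broadcast the row scaling across columns

def numerical_grad(f, A, h=1e-6):
    # Central finite differences, perturbing one entry at a time
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = h
        G[idx] = (f(A + E) - f(A - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
print(np.max(np.abs(l21_grad(A) - numerical_grad(l21_norm, A))))
```

Each row of $G_A$ is a unit vector, which is also why the gradient is undefined at a zero row.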

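Setting $A=X^TW-Y$ and writing $P:Q=\operatorname{Tr}(P^TQ)$ for the trace (Frobenius) product, the gradient of your expression (valid wherever no row of $A$ is zero; at a zero row only a subgradient exists) follows from the chain rule: $$\eqalign{ \phi &= \|A\|_{2,1} \\ d\phi &= G_A:dA \;= G_A:X^TdW \;= XG_A:dW \\ \frac{\partial\phi}{\partial W} &= XG_A \\ }$$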
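To confirm $\partial\phi/\partial W = XG_A$ numerically, here is a sketch assuming NumPy and random $X$, $Y$, $W$ of the question's shapes (`objective` and `grad_W` are hypothetical names). It uses the exact gradient without the $\varepsilon$ smoothing, so it assumes no row of $X^TW-Y$ is zero.

```python
import numpy as np

def objective(W, X, Y):
    # phi(W) = ||X^T W - Y||_{2,1}
    return np.linalg.norm(X.T @ W - Y, axis=1).sum()

def grad_W(W, X, Y):
    # Chain rule from the answer: dphi = G_A : X^T dW = X G_A : dW
    A = X.T @ W - Y
    G_A = A / np.linalg.norm(A, axis=1, keepdims=True)  # assumes no zero row
    return X @ G_A  # (d x n) @ (n x l) -> d x l, same shape as W

rng = np.random.default_rng(2)
d, n, l = 4, 7, 3
X = rng.standard_normal((d, n))
Y = rng.standard_normal((n, l))
W = rng.standard_normal((d, l))

# Central finite-difference check of each entry of the d x l gradient
h = 1e-6
G_fd = np.zeros_like(W)
for i, j in np.ndindex(W.shape):
    E = np.zeros_like(W)
    E[i, j] = h
    G_fd[i, j] = (objective(W + E, X, Y) - objective(W - E, X, Y)) / (2 * h)

G = grad_W(W, X, Y)
print(G.shape, np.max(np.abs(G - G_fd)))
```

The transpose bookkeeping is the only subtle step: $G_A : X^TdW = \operatorname{Tr}(G_A^TX^TdW) = \operatorname{Tr}((XG_A)^TdW)$, which is why $X$ (not $X^T$) appears on the left of $G_A$.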