Derivative of the off-diagonal $L_1$ matrix norm


We define the off-diagonal $L_1$ norm of a matrix as follows: for any $A\in \mathcal{M}_{n,n}$, $$\|A\|_1^{\text{off}} = \sum_{i\ne j}|a_{ij}|.$$ So what is $$\frac{\partial \|A\|_1^{\text{off}}}{\partial A}\;?$$


Best answer

(The question changed after I answered. See comment below.)

I am not sure I would call this a norm, so let me write $\phi(A) = \sum_{i \neq j} [A]_{ij}$. It is easy to see that $\phi$ is linear, hence $D\phi(A)(\Delta) = \phi(\Delta) = \sum_{i \neq j} [\Delta]_{ij}$.

If you want a 'matrix' representation, let $e = (1,1,...,1)^T$ and define $\Gamma = e e^T - I$. Then $D\phi(A)(\Delta) = \operatorname{tr} (\Gamma^T \Delta)$ ($\Gamma$ is symmetric, so the transpose is irrelevant, I added it to make the connection with the inner product induced by the Frobenius norm).
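Since $\phi$ is linear, the claim $D\phi(A)(\Delta) = \operatorname{tr}(\Gamma^T \Delta)$ with $\Gamma = ee^T - I$ is easy to check numerically. A minimal sketch in NumPy (the helper name `phi` is mine, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Delta = rng.standard_normal((n, n))

def phi(M):
    # sum of the off-diagonal entries (no absolute values)
    return M.sum() - np.trace(M)

e = np.ones((n, 1))
Gamma = e @ e.T - np.eye(n)        # ones everywhere except a zero diagonal

# linearity: D phi(A)(Delta) = phi(Delta) = tr(Gamma^T Delta), independent of A
assert np.isclose(phi(Delta), np.trace(Gamma.T @ Delta))
```

Note that the representation does not depend on $A$ at all, which is exactly what linearity of $\phi$ predicts.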

Answer to updated question:

This is still not a norm, of course, but now we have $\nu(A) = \sum_{i \neq j} |[A]_{ij}|$. It should be clear that $\nu$ is differentiable iff $[A]_{ij} \neq 0$ for all $i \neq j$, and in this case we have $D \nu(A)(\Delta) = \sum_{i \neq j} \operatorname{sgn} [A]_{ij} [\Delta]_{ij}$. To obtain a 'matrix' representation, let $\Gamma$ be defined as $[\Gamma]_{ij} = \begin{cases} \operatorname{sgn} [A]_{ij} & i \neq j \\ 0 & i =j\end{cases}$, and then we have $D \nu(A)(\Delta) = \operatorname{tr} (\Gamma^T \Delta)$.
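The formula for $D\nu(A)$ can likewise be sandbox-checked against a central finite difference; with Gaussian random entries the off-diagonal elements of $A$ are nonzero almost surely, so $\nu$ is differentiable at $A$. A rough sketch (the helper name `nu` is mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))      # off-diagonal entries nonzero a.s.
Delta = rng.standard_normal((n, n))

def nu(M):
    # sum of |M_ij| over i != j
    return np.abs(M).sum() - np.abs(np.diag(M)).sum()

# Gamma_ij = sign(A_ij) for i != j, and 0 on the diagonal
Gamma = np.sign(A) * (1 - np.eye(n))

# central finite difference of nu along Delta
h = 1e-6
fd = (nu(A + h * Delta) - nu(A - h * Delta)) / (2 * h)
assert np.isclose(np.trace(Gamma.T @ Delta), fd, atol=1e-6)
```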

Second answer

$ \def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\t#1{\operatorname{Tr}\LR{#1}} \def\s#1{\operatorname{sign}\LR{#1}} \def\g#1#2{\frac{\p #1}{\p #2}} $Use the element-wise sign() function to define the matrix $\,S = \s{X}.\;$
The gradient and differential of the Manhattan norm can be written as $$\eqalign{ \g{\|X\|_1}{X} &= S \quad\iff\quad d\|X\|_1 = S:dX \\ }$$ Suppose that $X$ itself is defined in terms of another matrix $A$: $$F = J-I, \qquad X = F\odot A, \qquad S = \s{F\odot A}$$ where $J$ is the all-ones matrix, $I$ is the identity, and $(\odot)$ is the Hadamard product.
$\big($ Note that $X$ is composed of the off-diagonal elements of $A\,\big)$

Substituting into the known differential yields the desired gradient $$\eqalign{ &\quad d\|X\|_1 = S:dX = S:(F\odot dA) = (F\odot S):dA \\ &\boxed{\;\g{\|X\|_1}{A} = F\odot S\;} \\ }$$ where $(:)$ denotes the Frobenius product, which is a convenient notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \t{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ and which commutes with the Hadamard product $$\eqalign{ A:(B\odot C) &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\ &= (A\odot B):C \\\\ }$$


NB: The gradient above is not defined when any off-diagonal element of $A$ equals zero, because the sign() function itself is discontinuous at zero. (The diagonal of $X$ is identically zero by construction, but those components do not affect $\|X\|_1$, so only the off-diagonal zeros matter.)
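The boxed result $\g{\|X\|_1}{A} = F\odot S$ and the Hadamard/Frobenius identity $A:(B\odot C) = (A\odot B):C$ both lend themselves to a quick numerical sanity check. A sketch in NumPy, following the notation above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
J = np.ones((n, n))
F = J - np.eye(n)                  # zero diagonal, ones elsewhere
X = F * A                          # off-diagonal part of A (Hadamard product)
S = np.sign(X)
grad = F * S                       # boxed result: d||X||_1 / dA = F ⊙ S

# finite-difference check of the gradient along a random direction Delta
Delta = rng.standard_normal((n, n))
h = 1e-6
norm1 = lambda M: np.abs(F * M).sum()
fd = (norm1(A + h * Delta) - norm1(A - h * Delta)) / (2 * h)
assert np.isclose((grad * Delta).sum(), fd, atol=1e-6)

# Frobenius product commutes with the Hadamard product: A:(B⊙C) = (A⊙B):C
B, C = rng.standard_normal((n, n)), rng.standard_normal((n, n))
assert np.isclose((A * (B * C)).sum(), ((A * B) * C).sum())
```

Here `(grad * Delta).sum()` is the Frobenius product $(F\odot S):\Delta$, i.e. $\operatorname{Tr}\!\big((F\odot S)^T\Delta\big)$.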