Derivative of the off-diagonal $L_1$ matrix norm

We define the off-diagonal $L_1$ norm of a matrix as follows: for any $A\in \mathcal{M}_{n,n}$, $$\|A\|_1^{\text{off}} = \sum_{i\ne j}|a_{ij}|.$$ So what is $$\frac{\partial \|A\|_1^{\text{off}}}{\partial A}\;?$$
There are two answers below.
$
\def\p{\partial}
\def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)}
\def\t#1{\operatorname{Tr}\LR{#1}}
\def\s#1{\operatorname{sign}\LR{#1}}
\def\g#1#2{\frac{\p #1}{\p #2}}
$Use the element-wise sign() function to define the matrix
$\,S = \s{X}.\;$
The gradient and differential of the Manhattan norm can be written as
$$\eqalign{
\g{\|X\|_1}{X} &= S \quad\iff\quad d\|X\|_1 = S:dX \\
}$$
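As a quick numerical sanity check (a NumPy sketch, not part of the original answer): away from zero entries, the first-order change $d\|X\|_1 = S:dX$ should match the actual change in the norm for a small perturbation $dX$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))   # generic X: entries nonzero almost surely
S = np.sign(X)                    # element-wise sign matrix

# d||X||_1 = S : dX, checked against a small finite perturbation dX
dX = 1e-6 * rng.standard_normal((3, 3))
lhs = np.abs(X + dX).sum() - np.abs(X).sum()  # actual change in the norm
rhs = (S * dX).sum()                          # Frobenius product S : dX

assert np.isclose(lhs, rhs, atol=1e-10)
```

Because $|x|$ is piecewise linear, the match is exact (up to floating-point error) whenever the perturbation is small enough not to flip any sign.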
Suppose that $X$ itself is defined in terms of another matrix $A$
$$\eqalign{
F &= (J-I), \qquad
X &= F\odot A, \qquad
S &= \s{F\odot A} \\
}$$
where $J$ is the all-ones matrix, $I$ is the identity, and $(\odot)$ is the Hadamard product.
$\big($ Note that $X$ is composed of the off-diagonal elements of $A\,\big)$
Substituting into the known differential yields the desired gradient
$$\eqalign{
&\quad d\|X\|_1 = S:dX = S:(F\odot dA) = (F\odot S):dA \\
&\boxed{\;\g{\|X\|_1}{A} = F\odot S\;} \\
}$$
where $(:)$ denotes the Frobenius product, which is a convenient notation for the trace
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \t{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
and which commutes with the Hadamard product
$$\eqalign{
A:(B\odot C) &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\
&= (A\odot B):C \\
}$$
NB: The gradient above is undefined at any point where an off-diagonal element of $A$ (i.e. an element of $X$) equals zero, because the sign() function is discontinuous at zero.
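The boxed result $F\odot S$ can be verified with a finite-difference check (a NumPy sketch under the assumption that all off-diagonal entries of $A$ are nonzero, so the gradient exists):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # generic A: off-diagonals nonzero a.s.

# Off-diagonal mask F = J - I, so X = F ∘ A keeps only off-diagonal entries
F = np.ones((n, n)) - np.eye(n)
norm = lambda M: np.abs(F * M).sum()   # ||X||_1 = sum of |off-diagonal entries|

# Claimed gradient: F ∘ sign(F ∘ A)
G = F * np.sign(F * A)

# Central finite differences, entry by entry
h = 1e-6
G_fd = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = h
        G_fd[i, j] = (norm(A + E) - norm(A - E)) / (2 * h)

assert np.allclose(G, G_fd, atol=1e-5)
```

Note the diagonal entries of `G_fd` come out exactly zero, matching the mask $F$: the norm simply ignores the diagonal of $A$.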
(The question changed after I answered: the first part below addresses the original version, without absolute values; the updated version is handled afterwards.)
I am not sure I would call this a norm, so let me write $\phi(A) = \sum_{i \neq j} [A]_{ij}$. It is easy to see that $\phi$ is linear, hence $D\phi(A)(\Delta) = \phi(\Delta) = \sum_{i \neq j} [\Delta]_{ij}$.
If you want a 'matrix' representation, let $e = (1,1,...,1)^T$ and define $\Gamma = e e^T - I$. Then $D\phi(A)(\Delta) = \operatorname{tr} (\Gamma^T \Delta)$ ($\Gamma$ is symmetric, so the transpose is irrelevant, I added it to make the connection with the inner product induced by the Frobenius norm).
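Since $\phi$ is linear, $D\phi(A)(\Delta) = \phi(\Delta)$ should coincide with $\operatorname{tr}(\Gamma^T\Delta)$ for $\Gamma = ee^T - I$. A small NumPy sketch (not from the original answer) confirming this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
Delta = rng.standard_normal((n, n))

# phi(M) = sum of off-diagonal entries = (sum of all entries) - trace
phi = lambda M: M.sum() - np.trace(M)

# Gamma = e e^T - I, with e the all-ones vector
e = np.ones((n, 1))
Gamma = e @ e.T - np.eye(n)

# The trace representation of the derivative matches phi(Delta)
assert np.isclose(phi(Delta), np.trace(Gamma.T @ Delta))
```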
Answer to updated question:
This is still not a norm, of course, but now we have $\nu(A) = \sum_{i \neq j} |[A]_{ij}|$. It should be clear that $\nu$ is differentiable iff $[A]_{ij} \neq 0$ for all $i \neq j$, and in this case we have $D \nu(A)(\Delta) = \sum_{i \neq j} \operatorname{sgn} [A]_{ij} [\Delta]_{ij}$. To obtain a 'matrix' representation, let $\Gamma$ be defined as $[\Gamma]_{ij} = \begin{cases} \operatorname{sgn} [A]_{ij} & i \neq j \\ 0 & i =j\end{cases}$, and then we have $D \nu(A)(\Delta) = \operatorname{tr} (\Gamma^T \Delta)$.
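The formula for $D\nu(A)$ can likewise be checked numerically (a NumPy sketch, assuming a generic $A$ whose off-diagonal entries are all nonzero, as required for differentiability):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))       # generic A: off-diagonals nonzero a.s.
Delta = rng.standard_normal((n, n))

# nu(M) = sum of |off-diagonal entries|
nu = lambda M: np.abs(M).sum() - np.abs(np.diag(M)).sum()

# Gamma: sgn of A off the diagonal, zero on the diagonal
Gamma = np.sign(A) * (1 - np.eye(n))

# Directional derivative via central differences vs tr(Gamma^T Delta)
h = 1e-6
d_fd = (nu(A + h * Delta) - nu(A - h * Delta)) / (2 * h)

assert np.isclose(d_fd, np.trace(Gamma.T @ Delta), atol=1e-6)
```

This $\Gamma$ is exactly the $F\odot S$ of the first answer, so the two answers agree.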