Derivative of the $p$-Schatten norm of a symmetric matrix, raised to the $p$th power.


Given a symmetric matrix $S$, I would like to calculate the derivative of the $p$-Schatten norm of $S$ raised to the $p$th power, i.e. $\frac{\partial\|S\|_p^p}{\partial S}$, where $\|S\|_p$ is the $p$-Schatten norm of $S$.


$ \def\G{\operatorname{sign}} \def\S{\operatorname{sym}} \def\T{\operatorname{tr}} \def\l{\left} \def\r{\right} \def\p{\partial} \def\g#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $First, solve the more general problem where $S$ is a rectangular matrix. Then, in the final step, allow it to be a (square) symmetric matrix.

Define an auxiliary matrix $A$ such that $$\eqalign{ A &= \l(S^TS\r)^{1/2} \quad&\implies\quad A=A^T = \S(A) \\ A\,A &= S^TS &\implies\quad \S(A\;dA) = \S(S^TdS) \\ }$$ This matrix allows the Schatten $p$-norm to be written as $$\eqalign{ \sigma &= \|S\|_p = \Big[\T\l(A^p\r)\Big]^{1/p} \\ }$$ Now calculate the differential and gradient of the $p^{th}$ power of the norm: $$\eqalign{ \sigma^p &= \T\l(A^p\r) \\ d\sigma^p &= pA^{p-1}:dA \\ &= \S(pA^{p-2}):A\,dA \\ &= pA^{p-2}:\S(A\,dA) \\ &= pA^{p-2}:\S(S^TdS) \\ &= pSA^{p-2}:dS \\ \g{\sigma^p}{S} &= pSA^{p-2} \\ }$$ The last step uses the symmetry of $A^{p-2}$ together with the rearrangement rule $X:YZ = Y^TX:Z$. People are usually more interested in the gradient of the norm itself (not raised to any power), which follows from the chain rule for $S\ne 0$: $$\eqalign{ \g{\sigma^p}{S} &= p\sigma^{p-1}\,\g{\sigma}{S} \\ \g{\sigma}{S} &= \l(\frac{S}{\sigma}\r) \! \l(\frac{A}{\sigma}\r)^{p-2} \\ }$$
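As a sanity check, the gradient $pSA^{p-2}$ derived above can be compared against central finite differences. The following is a minimal sketch, assuming NumPy; the helper names, the random $4\times 3$ test matrix, the choice $p=3$, and the step size are illustrative assumptions and not part of the derivation. It uses the thin SVD $S = U\,\mathrm{diag}(s)\,V^T$, for which $A = V\,\mathrm{diag}(s)\,V^T$ and hence $SA^{p-2} = U\,\mathrm{diag}(s^{p-1})\,V^T$.

```python
import numpy as np

def schatten_p_pow(S, p):
    """Return ||S||_p^p, the sum of the p-th powers of the singular values."""
    return np.sum(np.linalg.svd(S, compute_uv=False) ** p)

def schatten_p_pow_grad(S, p):
    """Closed-form gradient p * S * A^(p-2), with A = (S^T S)^(1/2)."""
    # Thin SVD S = U diag(s) V^T gives A = V diag(s) V^T, so
    # S A^(p-2) = U diag(s^(p-1)) V^T.
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return p * (U * s ** (p - 1)) @ Vt

def finite_difference_grad(f, S, h=1e-6):
    """Central-difference approximation of df/dS, one entry at a time."""
    G = np.zeros_like(S)
    for idx in np.ndindex(*S.shape):
        E = np.zeros_like(S)
        E[idx] = h
        G[idx] = (f(S + E) - f(S - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 3))   # arbitrary rectangular test matrix (assumption)
p = 3.0                           # illustrative exponent (assumption)

G_formula = schatten_p_pow_grad(S, p)
G_numeric = finite_difference_grad(lambda X: schatten_p_pow(X, p), S)
max_err = np.max(np.abs(G_formula - G_numeric))
print(max_err)  # should be small if the formula and the sketch agree
```

The SVD route avoids forming a matrix square root explicitly; for $p<2$ the closed form additionally needs all singular values to be nonzero.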


In the above, a colon denotes the trace/Frobenius product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} \;=\; \T(AB^T) \\ A:A &= \big\|A\big\|^2_F \\ }$$ The sym() operator is defined for a square matrix $X$ as $$\S(X) = \tfrac 12\l(X+X^T\r)$$ and it has a convenient property with respect to the Frobenius product: $$\S(A):B \;=\; A:\S(B)$$
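These identities are easy to confirm numerically. The sketch below assumes NumPy; the function names `frob` and `sym` and the random test matrices are hypothetical. It also checks the standard rearrangement rule $A:BC = B^TA:C$, which the final step of the derivation relies on.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij * B_ij."""
    return np.sum(A * B)

def sym(X):
    """Symmetric part of a square matrix."""
    return 0.5 * (X + X.T)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

trace_form = np.trace(A @ B.T)            # A:B = tr(A B^T)
norm_sq = np.linalg.norm(A, "fro") ** 2   # A:A = ||A||_F^2
lhs, rhs = frob(sym(A), B), frob(A, sym(B))   # sym(A):B = A:sym(B)
rearr_lhs, rearr_rhs = frob(A, B @ C), frob(B.T @ A, C)  # A:BC = B^T A : C
```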


Finally, if $S$ is symmetric, then $$\eqalign{ A &= \l(S^2\r)^{1/2} = \G(S)\;S \\ }$$ where sign() is the matrix sign function (defined when $S$ is nonsingular).
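For a symmetric, nonsingular $S$ with eigendecomposition $S = Q\,\mathrm{diag}(\lambda)\,Q^T$, the matrix sign is $Q\,\mathrm{diag}(\operatorname{sign}\lambda)\,Q^T$. The following sketch, assuming NumPy and using hypothetical helper names, checks $A = \operatorname{sign}(S)\,S$ and $\operatorname{sign}(S)^2 = I$ on a random symmetric matrix.

```python
import numpy as np

def matrix_sign_sym(S):
    """Matrix sign of a symmetric, nonsingular S via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.sign(w)) @ Q.T

def psd_sqrt(M):
    """Principal square root of a symmetric positive semidefinite matrix."""
    w, Q = np.linalg.eigh(M)
    # Clip tiny negative eigenvalues caused by round-off before the sqrt.
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
S = 0.5 * (X + X.T)            # symmetric and generically indefinite

A = psd_sqrt(S @ S)            # A = (S^2)^(1/2)
G = matrix_sign_sym(S)
err_A = np.max(np.abs(A - G @ S))             # A = sign(S) S
err_inv = np.max(np.abs(G @ G - np.eye(4)))   # sign(S) is involutory
```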

Since the sign function is involutory and commutes with $S$, when $p$ is an even integer all the signs cancel, i.e. $$\G\!\l(S\r)^{p-2} = I \qquad\implies\quad A^{p-2} = S^{p-2}$$ and one can simply replace $\,A\to S\;$ in the gradient formula to obtain $$\eqalign{ \g{\sigma^p}{S} &= pS^{p-1} \\ }$$ However, when $p$ is an odd integer one of the sign functions does not cancel and the gradient becomes $$\eqalign{ \g{\sigma^p}{S} &= pS^{p-1} \G(S) \\ }$$ NB:$\;$ If the matrix $S$ is also positive semidefinite, then $$\G(S)=I \quad\implies\quad A=S$$ and the sign issue goes away.
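Both symmetric-case formulas can be checked the same way. The sketch below assumes NumPy, integer $p$, and a random symmetric test matrix; all helper names are hypothetical. The finite differences perturb one entry at a time, so the perturbed matrix is not symmetric; the norm is therefore evaluated through singular values, and the formulas are read as gradients with respect to unconstrained entries, evaluated at a symmetric point.

```python
import numpy as np

def schatten_p_pow(S, p):
    """||S||_p^p as the sum of p-th powers of the singular values."""
    return np.sum(np.linalg.svd(S, compute_uv=False) ** p)

def matrix_sign_sym(S):
    """Matrix sign of a symmetric, nonsingular S via eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.sign(w)) @ Q.T

def grad_symmetric(S, p):
    """p S^(p-1) for even integer p, p S^(p-1) sign(S) for odd integer p."""
    G = p * np.linalg.matrix_power(S, p - 1)
    return G if p % 2 == 0 else G @ matrix_sign_sym(S)

def finite_difference_grad(f, S, h=1e-6):
    """Central-difference approximation of df/dS, one entry at a time."""
    G = np.zeros_like(S)
    for idx in np.ndindex(*S.shape):
        E = np.zeros_like(S)
        E[idx] = h
        G[idx] = (f(S + E) - f(S - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))
S = 0.5 * (X + X.T)            # symmetric and generically indefinite

errs = {}
for p in (3, 4):               # one odd and one even exponent
    numeric = finite_difference_grad(lambda M: schatten_p_pow(M, p), S)
    errs[p] = np.max(np.abs(grad_symmetric(S, p) - numeric))
print(errs)  # both entries should be small
```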