Derivative of the elementwise matrix absolute value


Initial problem

Consider the critical points of a function on $U(N)$. A function $v(X)$ defined on $U(N)$ is expressed as $$ v(X) = \sum_{ij} \bigl| \delta_{ij} - |X_{ij}|^2 \bigr| $$ where $\delta_{ij}$ denotes the Kronecker delta and $X_{ij}$ denotes the $(i, j)$-th element of $X$ (note that the inner $|\cdot|$ is the absolute value of a complex number).

Numerical analysis suggests that $v(X)$ is unimodal on $U(N)$, that is, $v(X)$ attains its global minimum $0$ at $X=D$, where $D$ is any diagonal matrix whose diagonal elements $d_i \in \mathbb{C}$ satisfy $|d_i| = 1$, and no other local minima exist.
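The numerical observation is easy to reproduce; here is a minimal NumPy sketch (the helper name `v` and the sampling choices are mine, not from the original experiment):

```python
import numpy as np

def v(X):
    """v(X) = sum_ij | delta_ij - |X_ij|^2 |."""
    N = X.shape[0]
    return np.sum(np.abs(np.eye(N) - np.abs(X) ** 2))

rng = np.random.default_rng(0)
N = 4

# A random unitary X: the Q factor of a complex Gaussian matrix is unitary.
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X, _ = np.linalg.qr(G)

# A diagonal unitary D with |d_i| = 1.
D = np.diag(np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N)))

print(v(X))  # generically positive
print(v(D))  # ~ 0, the claimed global minimum
```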

I'd like to prove this.

Plan

Take the derivative of $v(X)$: at a critical point, the directional derivative $v'(X') = v'(ZX)$ must vanish for every skew-Hermitian matrix $Z$ (since $X \in U(N)$, the tangent directions at $X$ are exactly those of the form $X' = ZX$). So we want to know the derivative of $v(X)$.

Question

Thanks to the great previous post, the outer derivative can be handled using $\mathrm{sgn}$. The question is how to deal with the inner $|X_{ij}|^2$. My initial thought is to decompose $X_{ij} = \Re[X_{ij}]+i\,\Im[X_{ij}]$ and differentiate $|X_{ij}|^2 = \Re[X_{ij}]^2 + \Im[X_{ij}]^2$ with respect to the real and imaginary parts independently.

Let $\Gamma$ be defined by $\Gamma_{ij} = \mathrm{sgn}(\delta_{ij}-|X_{ij}|^2)$. Differentiating with respect to the real and imaginary parts independently, we obtain

$$-2\mathrm{Tr}[\Gamma \Re[X]]=0$$ $$-2\mathrm{Tr}[\Gamma \Im[X]]=0$$

for the stationarity condition $v'(X')=0$. Thus we have $\mathrm{Tr}[\Gamma X']=0$. Is this correct? I am also wondering whether there is a simpler proof of the initial statement.
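One sanity check on this condition: at the conjectured minimizer $X = D$ we have $\delta_{ij} - |X_{ij}|^2 = 0$ for every $(i,j)$, so $\Gamma$ vanishes identically and $\mathrm{Tr}[\Gamma X'] = 0$ holds trivially in every direction. A small NumPy sketch (the tolerance is my addition, to keep $\mathrm{sgn}$ from picking up floating-point noise in $|e^{i\theta}|^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
D = np.diag(np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N)))

# Gamma_ij = sgn(delta_ij - |D_ij|^2); zero out floating-point noise first.
Delta = np.eye(N) - np.abs(D) ** 2
Gamma = np.sign(np.where(np.abs(Delta) < 1e-12, 0.0, Delta))

print(Gamma)  # the zero matrix: the stationarity condition holds trivially
```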

Answer

$ \def\C#1{{\mathbb C}^{#1}} \def\R#1{{\mathbb R}^{#1}} \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\abs#1{\op{abs}\LR{#1}} \def\sign#1{\op{sign}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\norm#1{\left\| #1 \right\|} \def\qif{\quad\iff\quad} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} $Given $X\in\C{m\times n}\:$ define some $\sf Real$ auxiliary matrix variables $$\eqalign{ A_{ij} &= X_{ij}^*\,X_{ij} &\qiq &A_{ij} \ge 0,\;\;A\in\R{m\times n} \\ A &= X^*\odot X &\qiq &dA = X^*\odot dX \\ B &= A-I &\qiq &dB = dA,\;\;B\in\R{m\times n} \\ S &= \sign B &\qiq &\LR{S\odot B}_{ij}\ge0,\;\;S_{ij}\in\{-\o,0,\o\} \\ J_{ij} &= \o &\qiq&J =\{{\sf all\ ones\ matrix}\} \\ }$$ where $\odot$ is the elementwise/Hadamard product, $X^*$ is the complex conjugate of $X,\,$ and the sign() function is applied elementwise.
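These auxiliary definitions can be checked numerically; a sketch with NumPy (square case, variable names following the answer):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

A = np.conj(X) * X          # A = X* (elementwise-times) X
B = A.real - np.eye(n)      # B = A - I  (A is real up to roundoff)
S = np.sign(B)              # elementwise sign

print(np.allclose(A.imag, 0.0))  # A is real
print(np.all(A.real >= 0.0))     # A_ij >= 0
print(np.all(S * B >= 0.0))      # (S o B)_ij >= 0, i.e. S o B = |B|
```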

The objective function is the Hölder $\o$-norm of $B$ (a.k.a. the Taxicab or Manhattan norm).

A gradient (in the Wirtinger sense) can be calculated as $$\eqalign{ \nu &= \norm{B}_\o \;=\; J:\LR{S\odot B} \\ d\nu &= S:dB \;=\; S:\LR{X^*\odot dX} \;=\; \LR{S\odot X^*}:dX \\ \grad{\nu}{X} &= {S\odot X^*} \qif \grad{\nu}{X^*} = {S\odot X} \\ }$$
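The gradient formula can be spot-checked against a finite difference. Since $\nu$ is real-valued, the Wirtinger relations give the real differential $d\nu = 2\,\Re\!\left[\LR{S\odot X^*}:dX\right]$; a sketch (generic $X$, so that no $B_{ij}$ sits exactly at the kink of $|\cdot|$ and $S$ stays constant under the small perturbation):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def nu(X):
    B = np.abs(X) ** 2 - np.eye(X.shape[0])
    return np.sum(np.abs(B))

S = np.sign(np.abs(X) ** 2 - np.eye(n))
grad = S * np.conj(X)                    # dnu/dX = S o X*

# Forward finite difference along a random complex direction E.
E = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
eps = 1e-7
fd = (nu(X + eps * E) - nu(X)) / eps
pred = 2.0 * np.real(np.sum(grad * E))   # dnu = 2 Re[(S o X*) : dX]
print(abs(fd - pred))  # small
```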


The above derivation uses the Frobenius product, which has the following properties $$\eqalign{ X:Y &= \sum_{i=1}^m\sum_{j=1}^n X_{ij}Y_{ij} \;=\; \trace{X^TY} \\ X:X &= \norm{X}_F^2 \qquad \{ {\rm Frobenius\;norm} \} \\ X:Y &= Y:X \;=\; Y^T:X^T \\ \LR{XP}:Q &= X:\LR{QP^T} \;=\; P:\LR{X^TQ} \\ \LR{X\odot Y}:Z &= X:\LR{Y\odot Z} \\ }$$
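These identities are straightforward to confirm numerically for real matrices; a quick sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, n = 2, 4, 3
X = rng.standard_normal((m, k))
Y = rng.standard_normal((m, k))
Z = rng.standard_normal((m, k))
P = rng.standard_normal((k, n))
Q = rng.standard_normal((m, n))

frob = lambda A, B: np.trace(A.T @ B)    # X:Y = Tr(X^T Y)

print(np.isclose(frob(X, X), np.linalg.norm(X, 'fro') ** 2))  # X:X = |X|_F^2
print(np.isclose(frob(X @ P, Q), frob(X, Q @ P.T)))           # (XP):Q = X:(QP^T)
print(np.isclose(frob(X @ P, Q), frob(P, X.T @ Q)))           # (XP):Q = P:(X^T Q)
print(np.isclose(frob(X * Y, Z), frob(X, Y * Z)))             # (X o Y):Z = X:(Y o Z)
```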