Numerical methods to minimize a matrix function


I'm faced with the problem \begin{align*} \min_{A\in\mathbb{R}^{n\times m}}\left\{f(A)+\lambda\lvert\lvert A\rvert\rvert_{S_p}^p\right\}, \end{align*} where $f:\mathbb{R}^{n\times m}\to\mathbb{R}$ is a generic matrix function, $\lambda>0$ is a fixed constant, and $\lvert\lvert A\rvert\rvert_{S_p}$ is the Schatten $p$-norm of $A$, defined as \begin{align*} \lvert\lvert A\rvert\rvert_{S_p} = \left(\sum_j|\sigma_j(A)|^p\right)^{1/p}, \end{align*} where $\sigma_j(A)$ are the singular values of $A$ (it is a norm only when $p\geq 1$; when $p=1$, it is the nuclear norm). There are special cases in which the problem has an explicit and unique solution: for example, $p=1$ and $f(A)=g(A):=\lvert\lvert A-X\rvert\rvert_F^2$ for some fixed $X$, where $\lvert\lvert \cdot\rvert\rvert_F$ is the Frobenius norm. There are also papers that describe numerical methods for the case $f=g$ and $p<1$.
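For concreteness, the Schatten $p$-norm is straightforward to evaluate through the SVD. Here is a minimal numpy sketch (the function name `schatten_p` is my own); the $p=2$ and $p=1$ cases can be checked against the built-in Frobenius and nuclear norms:

```python
import numpy as np

def schatten_p(A, p):
    """Schatten p-(quasi)norm: (sum_j sigma_j(A)^p)^(1/p)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, all >= 0
    return (s ** p).sum() ** (1.0 / p)

# Sanity checks on a random matrix:
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
# p = 2 recovers the Frobenius norm; p = 1 is the nuclear norm.
assert np.isclose(schatten_p(A, 2), np.linalg.norm(A, "fro"))
assert np.isclose(schatten_p(A, 1), np.linalg.norm(A, "nuc"))
```

For $p<1$ the same formula still evaluates, but the quantity is only a quasi-norm (the triangle inequality fails), which is exactly what makes the minimization non-convex.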

My question is: do you know a generic algorithm to solve this problem for arbitrary $f$ and $p$? I'm especially interested in the case $p<1$, where the Schatten norm is non-convex (and not a norm).



$ \def\a{\alpha}\def\b{\beta}\def\g{\theta}\def\l{\lambda} \def\p{\partial} \def\A{\|A\|_{S_p}} \def\LR#1{\left(#1\right)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $The $p^{th}$ power of the Schatten $p$-norm can be written as $$\eqalign{ \A^p &= \trace{(A^TA)^{\frac p2}} \\ }$$ and this leads to a rather simple formula for its gradient $$\eqalign{ \grad{\|A\|_{S_p}^p}{A} &= pA\LR{A^TA}^{\b},\qquad \b=\fracLR{p-2}2 \\ }$$ Note that for $p<2$ the exponent $\b$ is negative, so this expression requires $A^TA$ to be invertible; in practice the gradient is best evaluated through the SVD of $A$. Given the cost function $$\eqalign{ h \;=\; f \;+\; \l\A^p \\ }$$ and assuming that you know the gradient of $f$, the gradient of this cost function is simply $$\eqalign{ \grad{h}{A} \;=\; \grad{f}{A} \;+\; \l pA\LR{A^TA}^{\b} \;\;\doteq\;\; g(A) \\ }$$ Now you can use any gradient-based method (e.g. conjugate gradients, Barzilai-Borwein, etc.) to minimize $h.\;$ The basic iteration will look something like this $$\eqalign{ A_{k+1} &= A_k - \a_k\,g(A_k) \\ }$$ where $\a_k$ is the step length for the chosen method, and $k$ is the iteration counter.
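A minimal sketch of this iteration in numpy, assuming a fixed step length $\alpha$ rather than a line search (the function names and the singular-value clipping threshold `eps` are my own choices; the SVD route $pU\,\mathrm{diag}(\sigma^{p-1})V^T$ is equivalent to $pA(A^TA)^{\beta}$ and avoids forming a negative matrix power):

```python
import numpy as np

def schatten_grad(A, p, eps=1e-12):
    """Gradient of ||A||_{S_p}^p, i.e. p*A*(A^T A)^((p-2)/2),
    evaluated via the SVD as U diag(p*sigma^(p-1)) V^T.
    For p < 2 this is singular at sigma = 0, so tiny singular
    values are clipped at eps (a smoothing assumption)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(p * np.maximum(s, eps) ** (p - 1)) @ Vt

def minimize(grad_f, A0, lam, p, alpha=1e-2, iters=500):
    """Plain fixed-step iteration A_{k+1} = A_k - alpha * g(A_k)."""
    A = A0.copy()
    for _ in range(iters):
        A -= alpha * (grad_f(A) + lam * schatten_grad(A, p))
    return A

# Example with f(A) = ||A - X||_F^2 (gradient 2*(A - X)) and p = 1:
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))
A_hat = minimize(lambda A: 2 * (A - X), X, lam=0.5, p=1.0)
```

For $p=1$ this is essentially subgradient descent on the nuclear-norm-regularized problem, so the fixed step only gives slow, approximate convergence; a proper method (Barzilai-Borwein steps, or proximal/thresholding schemes when they are available for the chosen $f$) will do much better.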