Proximal Operator Fixed Point Property for Matrices


$\newcommand{\prox}{\operatorname{prox}}$ $\newcommand{\argmin}{\operatorname{argmin}}$ $\newcommand{\dom}{\operatorname{dom}}$

Recall that the proximal operator $\prox_{f}: R^n \rightarrow R^n$ of a function $f$ is defined as:

$\prox_f(v) := \argmin_{x} \left(f(x) +(1/2)\|x-v\|_2^2 \right) $,

where $f: R^n \rightarrow R \cup \{\infty\}$ is a closed proper convex function and $\|\cdot \|_2$ is the Euclidean norm. The objective $f(x) + (1/2)\|x-v\|_2^2$ is strongly convex and not everywhere infinite, so it has a unique minimizer for every $v \in R^n$.

The crucial property of the proximal operator is that $x^*$ minimizes $f$ iff $x^* = \prox_f(x^*)$, i.e. $x^*$ is a fixed point of $\prox_f$.
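This fixed-point property is easy to check numerically. Below is a minimal sketch using $f(x) = \lambda\|x\|_1$, whose prox is the well-known soft-thresholding operator; the minimizer of $f$ is $x^* = 0$, and iterating the prox from any starting point (the proximal point algorithm) drives us back to it. The function name `prox_l1` is mine, not from the text:

```python
import numpy as np

def prox_l1(v, lam):
    # Prox of f(x) = lam * ||x||_1 with the (1/2)||x - v||_2^2 coupling:
    # elementwise soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

lam = 0.5
x_star = np.zeros(3)  # the unique minimizer of lam * ||x||_1
# Fixed-point property: the minimizer is a fixed point of the prox.
assert np.allclose(prox_l1(x_star, lam), x_star)

# Conversely, repeatedly applying the prox from an arbitrary point
# converges to the minimizer (each step shrinks every entry by lam).
v = np.array([2.0, -1.3, 0.7])
for _ in range(10):
    v = prox_l1(v, lam)
assert np.allclose(v, 0.0)
```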

Let us consider the proof of this property. The first part, namely, that if $x^*$ minimizes $f$ then $\prox_f(x^*)=x^*$, is trivial: since $f(x^*) \le f(x)$ for all $x$,

$f(x^*) +(1/2)\|x^*-x^*\|_2^2 = f(x^*) \le f(x) \le f(x) +(1/2)\|x-x^*\|_2^2 \quad \forall x,$

so $x^*$ itself minimizes the prox objective at $v = x^*$.

The second part uses the notion of the subdifferential. In the proof, the authors note that $\tilde{x}$ minimizes

$f(x) +(1/2)\|x-v\|_2^2$

i.e. $\tilde{x} = \prox_f(v)$ iff $0 \in \partial f(\tilde{x}) + (\tilde{x}-v)$,

where the sum is of a set and a point. Recall the definition of the subdifferential:

$\partial f(x) = \{ y \mid f(z) \ge f(x) + y^T(z-x) \ \ \forall z \in \dom{f}\}$

Now suppose $x^* = \prox_f(x^*)$. Taking $v = x^*$ and $\tilde{x} = x^*$ above gives $0 \in \partial f(x^*) + (x^* - x^*) = \partial f(x^*)$, so $x^*$ minimizes $f$.

Question 1:

Consider the proximal operator defined for matrices now:

$\prox_f(Y) := \argmin_{X} \left(f(X) +(1/2)\|X-Y\|^2 \right)$,

where $X$ is some real $m$ by $n$ matrix. What norm function should be taken in the definition of the $\prox$ in this case? Frobenius? Or the spectral norm (the induced 2-norm), which is the largest singular value (is the spectral norm even differentiable?)? Or something else?
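As a concrete instance of the Frobenius choice: with $\|\cdot\|_F$ in the definition, the prox of the nuclear norm $f(X) = \lambda\|X\|_*$ has a well-known closed form, soft-thresholding of the singular values (singular value thresholding). A minimal numpy sketch, with `prox_nuclear` being my own name for it:

```python
import numpy as np

def prox_nuclear(Y, lam):
    # Prox of f(X) = lam * ||X||_* under (1/2)||X - Y||_F^2:
    # soft-threshold the singular values of Y, keep its singular vectors.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 3))
X = prox_nuclear(Y, 0.2)

# Each singular value of X equals the corresponding singular value of Y,
# shrunk toward zero by 0.2 (and clipped at zero).
s_Y = np.linalg.svd(Y, compute_uv=False)
s_X = np.linalg.svd(X, compute_uv=False)
assert np.allclose(s_X, np.maximum(s_Y - 0.2, 0.0))
```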

Question 2:

Can the norm for the definition of $\prox$ in case of matrices be chosen in different ways? What conditions does the proof above impose on the norm function?

Best Answer

Q1) Any true norm will do here. The Frobenius norm is a bit easier to work with, because the squared Frobenius norm is easily differentiated. The spectral norm is not differentiable everywhere (it fails to be differentiable wherever the largest singular value is repeated).

Q2) A more general class of functions that can be used instead of the squared norm are Bregman divergences, or Bregman distances. I believe the key properties that a Bregman distance $D(x,y)$ requires for your purposes are:

  • $D(x,y)\geq 0$ for all $x$ and $y$.
  • $D(x,y)=0$ if and only if $x=y$.
  • $D(x,y)$ is convex and continuously differentiable in $x$.

Note that Bregman distances are generalizations of the squared term $\tfrac{1}{2}\|x-y\|_2^2$. In other words, the prox function becomes $$\mathop{\textrm{prox}}_{f,D}(Y) = \mathop{\textrm{arg}\,\textrm{min}}_X\; f(X) + D(X,Y).$$ I think if you do a Google search for "Matrix Bregman distances" or "Matrix Bregman divergences" you'll find a few alternatives for matrices. But many of them are specifically suited for symmetric or even positive semidefinite matrices.
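To see the generalization concretely: a Bregman distance is built from a convex generator $\phi$ as $D_\phi(x,y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x-y\rangle$, and choosing $\phi(x) = \tfrac12\|x\|_2^2$ recovers exactly the squared term in the standard prox. A minimal sketch (the helper `bregman` is my own):

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    # Bregman distance: D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])

# Generator phi(v) = (1/2)||v||^2 recovers the squared Euclidean term:
phi = lambda v: 0.5 * (v @ v)
grad_phi = lambda v: v
assert np.isclose(bregman(phi, grad_phi, x, y), 0.5 * np.sum((x - y) ** 2))
```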