The Derivative of a Matrix Function with Nuclear Norm Term


I want to solve the following problem $$\min_{X \in \mathbb{R}^{2 \times2}}\{h(A(X))+\|X\|_*\}$$ where $\|\cdot\|_*$ is the nuclear norm and $h:\mathbb{R}^{2}\to \mathbb{R}$ is given by $h(y)=\frac{1}{2}\|B^{1/2}y-B^{-1/2}d\|_2^2$ for the positive definite matrix $$B=\begin{bmatrix} 3/2 & -2\\ -2 & 3 \end{bmatrix}$$ and the vector $$d=\begin{bmatrix} 5/2\\ -1 \end{bmatrix},$$ and where $A:\mathbb{R}^{2\times 2}\to \mathbb{R}^2$ is the linear operator given by $$A(X)=(X_{11},X_{22}).$$ To solve this problem, we need to use the first-order optimality condition; however, I could not figure out how to compute the subdifferential of the first term. Essentially, it involves the derivative of a vector-valued function with respect to a matrix, but the notes from the Wiki did not make sense to me. Can anyone show this to me? Many thanks.

BEST ANSWER

$\renewcommand{\Re}{\mathbb{R}}$ Some general observations:

  1. For a function $\phi:\Re^n\to\Re^m$ we have the Jacobian matrix; at the same time, $\Re^{n\times n}$ is isomorphic to $\Re^{n^2}$, which is a space of vectors rather than matrices. Therefore, a well-defined notion of derivative exists for functions of matrices as well.

  2. Subgradients are well defined for functions $f:\mathcal{H}\to\Re\cup\{+\infty\}$ where $\mathcal{H}$ is a Hilbert space, and $\Re^{2\times 2}$ equipped with the trace (Frobenius) inner product $\langle X, Y\rangle = \operatorname{trace}(X^\top Y)$ is such a space; therefore, optimality conditions for optimization problems over spaces of matrices are well posed.

We can see $A$ as an operator $A:\Re^{2\times 2} \ni X \mapsto AX\in\Re^2,$ or equivalently as $$ \tilde{A}:\Re^4\ni X = \begin{bmatrix}X_{11}\\X_{12}\\X_{21}\\X_{22}\end{bmatrix}\mapsto \tilde{A}X = \begin{bmatrix}X_{11}\\X_{22}\end{bmatrix} \in \Re^2. $$
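As a quick numerical sanity check (a minimal NumPy sketch; the names `A_tilde`, `x`, `y` are mine, not from the question), the adjoint of $\tilde{A}$ is simply the transpose of its matrix representation, and it satisfies $\langle \tilde{A}x, y\rangle = \langle x, \tilde{A}^* y\rangle$:

```python
import numpy as np

# A_tilde maps vec(X) = (X11, X12, X21, X22) to (X11, X22),
# so as a matrix it selects the first and fourth components.
A_tilde = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
x = rng.standard_normal(4)   # a vectorized 2x2 matrix
y = rng.standard_normal(2)

# For a real matrix the adjoint is the transpose:
# <A_tilde x, y> = <x, A_tilde^T y> for all x, y.
assert np.isclose((A_tilde @ x) @ y, x @ (A_tilde.T @ y))
```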

Now define a function $\phi:\Re^4 \to \Re$ as $\phi(X) = h(\tilde{A}X)$; by the chain rule its gradient is $\nabla \phi(X) = \tilde{A}^*\nabla h(\tilde{A}X) = \tilde{A}^* (B\tilde{A}X-d)$, since $\nabla h(y) = B^{1/2}(B^{1/2}y-B^{-1/2}d) = By-d$. Here $\tilde{A}^*$ is the adjoint linear operator of $\tilde{A}$. Notice that $\nabla \phi$ is a function from $\Re^4$ to $\Re^4$, which can equally be seen as a function from $\Re^{2\times 2}$ to $\Re^{2\times 2}$.
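Since $\nabla h(y)=B^{1/2}(B^{1/2}y-B^{-1/2}d)=By-d$, the chain rule gives $\nabla\phi(X)=\tilde{A}^*(B\tilde{A}X-d)$. Here is a finite-difference check of that formula (a NumPy sketch using the given $B$ and $d$; the helper names are mine):

```python
import numpy as np

B = np.array([[1.5, -2.0], [-2.0, 3.0]])   # the given positive definite B
d = np.array([2.5, -1.0])                  # the given d

# Matrix square roots via the eigendecomposition (B is symmetric PD).
w, V = np.linalg.eigh(B)
B_half  = V @ np.diag(np.sqrt(w)) @ V.T
B_mhalf = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def phi(x):
    # phi(vec X) = h(A_tilde vec X), where A_tilde picks X11 and X22.
    y = np.array([x[0], x[3]])
    r = B_half @ y - B_mhalf @ d
    return 0.5 * r @ r

def grad_phi(x):
    # Chain rule: A_tilde^* (By - d), since grad h(y) = By - d.
    y = np.array([x[0], x[3]])
    g = B @ y - d
    return np.array([g[0], 0.0, 0.0, g[1]])

# Central finite differences at a random point agree with the formula.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(4)
eps = 1e-6
fd = np.array([(phi(x0 + eps * e) - phi(x0 - eps * e)) / (2 * eps)
               for e in np.eye(4)])
assert np.allclose(fd, grad_phi(x0), atol=1e-6)
```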

To be more precise, the gradient of the function $\Phi:\Re^{2\times 2}\to\Re$, $X \mapsto h(A(X))$, is the function $$ \nabla \Phi: \Re^{2\times 2} \ni X \mapsto \nabla\Phi(X) = A^* \nabla h(A(X)) \in \Re^{2\times 2}, $$ where $A^*:\Re^2\to\Re^{2\times 2}$ is the adjoint linear operator of $A$, explicitly $A^*(y)=\begin{bmatrix} y_1 & 0\\ 0 & y_2\end{bmatrix}$.

Overall, I would recommend writing down the optimality conditions using the proximal operator of the nuclear norm.

See Neal Parikh, Stephen Boyd - Proximal Algorithms, Section 6.7.3.
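For concreteness, the recommendation above can be sketched as a proximal-gradient iteration for this specific problem (a NumPy sketch; the names and the step-size choice are mine). The prox of $t\|\cdot\|_*$ is singular-value soft-thresholding, and $L=\lambda_{\max}(B)$ is a Lipschitz constant of $\nabla\Phi$ because the operator norm of $A$ is $1$:

```python
import numpy as np

B = np.array([[1.5, -2.0], [-2.0, 3.0]])   # the given positive definite B
d = np.array([2.5, -1.0])                  # the given d

def grad_Phi(X):
    # A(X) = (X11, X22), grad h(y) = By - d, and A^*(g) = diag(g1, g2).
    g = B @ np.array([X[0, 0], X[1, 1]]) - d
    return np.diag(g)

def prox_nuclear(X, t):
    # Prox of t * nuclear norm: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

# Step size 1/L with L = lambda_max(B), a Lipschitz constant of grad_Phi.
t = 1.0 / np.linalg.eigvalsh(B).max()
X = np.zeros((2, 2))
for _ in range(2000):
    X = prox_nuclear(X - t * grad_Phi(X), t)
# For this data the iterates converge to X = diag(1, 0), which satisfies
# the first-order condition 0 in grad_Phi(X) + subdifferential of ||X||_*.
```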