Equivalence of computing trace norm of matrix

532 Views Asked by At

Let $X\in \mathbb{R}^{m\times n}$. How to show that the trace norm $\|X\|_\text{tr}$, which is defined as the sum of the singular values of $X$, is equivalent to the following optimization problem?

\begin{array}{ll} \mathop{\text{maximize}}\limits_{Y\in\mathbb{R}^{m\times n}} & \text{tr}(X^TY)\\ \text{subject to} & Y^TY \preceq I_n\end{array}

1

There are 1 best solutions below

0
On BEST ANSWER

I don't know if it is OK in stackexchange but I am trying to answer my own question to explain more about the derivation of the equivalence of trace norm. The reference is mostly from section 4.2.3.2 of MOSEK Modeling Cookbook.

We first prove that $||A||_{2} \le 1$ is equivalent to $A^{T} A\preceq I$. \begin{align*} & ||A||_{2} \le 1\\ \Leftrightarrow \quad & \max_{||x||_{2} =1} ||Ax||_{2} \le 1\\ \Leftrightarrow \quad & \max_{||x||_{2} =1} x^{T} A^{T} Ax\le 1\\ \Leftrightarrow \quad & \max_{||x||_{2} =1} x^{T}\left( A^{T} A-I\right) x\le 0\\ \Leftrightarrow \quad & A^{T} A\preceq I \end{align*} Then our problem \begin{array}{ll} \mathop{\text{maximize}}\limits_{Y\in\mathbb{R}^{m\times n}} & \text{tr}(X^TY)\\ \text{subject to} & Y^TY \preceq I_n\end{array} becomes $$\sup _{||Y||_{2} \leq 1}\mathrm{tr} (X^{T} Y)$$ Then we do SVD decomposition to $X$, $X=U\Sigma V^T$, \begin{equation*} \begin{aligned} \sup _{||Z||_{2} \leq 1}\mathrm{tr} (X^{T} Z) & =\sup _{||Z\| _{2} \leq 1}\mathrm{tr} (V\Sigma ^{T} U^{T} Z)\\ & =\sup _{||Z\| _{2} \leq 1}\mathrm{tr} (\Sigma ^{T} U^{T} ZV)\\ & =\sup _{\| U^{T} ZV\| _{2} \leq 1}\mathrm{tr} (\Sigma ^{T} U^{T} ZV)\\ & =\sup _{\| Y\| _{2} \leq 1}\mathrm{tr} (\Sigma ^{T} Y) \end{aligned} \end{equation*} $\mathrm{tr} (V\Sigma ^{T} U^{T} Z) = \mathrm{tr} (\Sigma ^{T} U^{T} ZV)$ follows from the invariant under cyclic permutations property of trace. And the equivalence of constraint $||Z||_2 \le 1 \Leftrightarrow ||U^T Z V ||_2 \le 1$ follows from the definition of $\ell_2$ norm of matrix. And using the unitary invariance of the norm $||\cdot||_2$ again. We can consider $Y=\mathbf{diag}(y_1, \dots, y_p)$. $$\sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^T Z) = \sup_{|y_i| \leq 1} \sum_{i=1}^p \sigma_i y_i = \sum_{i=1}^p \sigma_i$$