Let $X$ be an $n \times n$ Hermitian matrix. Then we can write $X=A+iB$, where $A$ is real symmetric and $B$ is real skew-symmetric. Let $S$ be the set of $n \times n$ real symmetric matrices, and define the Schatten $1$-norm (also known as the trace norm) by $\lVert M \rVert_1=\sum_j \sigma_j(M)$, where the $\sigma_j(M)$ are the singular values of $M$. To measure how close $X$ is to the subspace of symmetric matrices, we can define the distance $$ D(X) = \min_{Y\in S} \lVert X-Y \rVert_1 .$$ Expanding $X$ into its symmetric and skew-symmetric parts gives $$D(X)= \min_{Y\in S} \lVert (A-Y)+iB \rVert_1 .$$ Intuitively, the minimum should occur at $Y=A$, in which case $D(X)=\lVert B\rVert_1$ is simply the $1$-norm of the imaginary part of $X$. How would one show that this quantity is minimized at $Y=A$?
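For anyone who wants a quick numerical sanity check of the conjecture before reading the proofs, here is a NumPy sketch (the helper `trace_norm` and the random search over symmetric $Y$ are my own additions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Build a random Hermitian X = A + iB with A real symmetric, B real skew-symmetric.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
N = rng.standard_normal((n, n))
B = (N - N.T) / 2
X = A + 1j * B

def trace_norm(M):
    # Schatten 1-norm: sum of singular values.
    return np.linalg.svd(M, compute_uv=False).sum()

# Conjectured minimum: D(X) = ||B||_1, attained at Y = A.
D_conjectured = trace_norm(B)
assert np.isclose(trace_norm(X - A), D_conjectured)

# No randomly chosen real symmetric Y should beat Y = A.
for _ in range(1000):
    P = rng.standard_normal((n, n))
    Y = A + (P + P.T) / 2
    assert trace_norm(X - Y) >= D_conjectured - 1e-9
print("conjecture consistent with random search")
```

This is of course only evidence, not a proof, but it makes the claim easy to experiment with.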
Minimizing the Schatten 1-norm over symmetric matrices.
624 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 best solutions below.
If the distance is measured using the Frobenius norm, then the problem is almost trivial.
(This post works through the exercise that Josse suggested at the end of his answer)
Use a real symmetric matrix $A=A^T$ and a real skew-symmetric matrix $B=-B^T$ to construct the Hermitian matrix $$\eqalign{ X &= A + iB \quad\implies\quad X = X^H,\quad X^T = X^* \\ }$$
Use an unconstrained matrix $U\in{\mathbb C}^{n\times n}$ to construct the complex symmetric matrix $$\eqalign{ Y &= {\rm Sym}(U) \doteq \tfrac 12(U+U^T) \quad\implies\quad Y = Y^T,\quad Y^H = Y^* \\ }$$ Calculate the gradient of the objective with respect to $U$ $$\eqalign{ \phi &= \big\|X-Y\big\|_F^2 \;=\; (X-Y)^*:(X-Y) \\ d\phi &= (Y-X)^*:dY &\quad+\; conj \\ &= (Y-X)^*:{\rm Sym}(dU) &\quad+\; conj \\ &= {\rm Sym}(Y-X)^*:dU &\quad+\; conj \\ &= (Y-\tfrac 12(X+X^*))^*:dU &\quad+\; conj \\ &= (Y-{\cal Re}(X))^*:dU &\quad+\; conj \\ \frac{\partial\phi}{\partial U} &= (Y-{\cal Re}(X))^* \quad& \frac{\partial\phi}{\partial U^*} = Y-{\cal Re}(X) \\ }$$ where "+ $conj\,$" denotes the complex conjugate of the first half of the expression.
Anyway, setting either gradient to zero yields $$\eqalign{ Y &= {\cal Re}(X) = A \\ }$$
In the above, a colon is used to denote the trace product, i.e. $$\eqalign{ A:B &= {\rm tr}(A^TB) = \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; B:A \\ X^*:X &= {\rm tr}(X^HX) \;\doteq\; \big\|X\big\|_F^2 \\ }$$ It is interesting that the result obtained using the Frobenius norm is apparently the same as a much more difficult derivation involving the Nuclear norm.
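The stationarity condition $Y = {\cal Re}(X)$ can be checked numerically; here is a NumPy sketch (my own, under the conventions of this answer) confirming that no complex symmetric $Y$ beats $Y = {\cal Re}(X) = A$ in Frobenius distance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

M = rng.standard_normal((n, n))
A = (M + M.T) / 2                 # real symmetric part of X
N = rng.standard_normal((n, n))
B = (N - N.T) / 2                 # real skew-symmetric part of X
X = A + 1j * B                    # Hermitian

# Stationary point found by the gradient calculation: Y = Re(X) = A.
best = np.linalg.norm(X - A, "fro")

# Perturbing around A in random complex symmetric directions never improves.
for _ in range(1000):
    U = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Y = A + (U + U.T) / 2         # Sym(U) shifted to a neighbourhood of A
    assert np.linalg.norm(X - Y, "fro") >= best - 1e-12
print("Y = Re(X) minimises the Frobenius distance over symmetric Y")
```

In fact, for the Frobenius norm the inequality is immediate from orthogonality of the real and imaginary parts, which is why the check never fails.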
Edit: It seems that I let myself get carried away by specifics, and as a consequence completely missed the bigger picture. My old answer is kept at the bottom for historical purposes. Let it be a reminder to all — including myself — that accepted answers on this site can occasionally be clumsy, needlessly complicated, or potentially even incorrect (though the latter doesn't seem to apply here).
New answer — much shorter and more general:
We present the result in a much more general setting (which is probably closer to your intuition). Suppose $\lVert\:\cdot\:\rVert$ is any norm on $M_n(\mathbb{C})$ that is invariant under entry-wise complex conjugation, i.e. $\lVert \overline M\rVert = \lVert M\rVert$ for all $M$. For every real matrix $Y$ we have $\overline Y = Y$, so the triangle inequality gives $$ 2\,\lVert X - Y\rVert \: = \: \lVert X - Y\rVert + \big\lVert\, \overline{X - Y}\,\big\rVert \: \geq \: \big\lVert (X - Y) - (\overline X - Y)\big\rVert \: = \: \big\lVert X - \overline X\big\rVert \: = \: 2\,\lVert B\rVert, $$ since $X = A + iB$ with $A,B$ real implies $\overline X = A - iB$. Hence $\lVert X - Y\rVert \geq \lVert B\rVert$ for every real $Y$, with equality at $Y = A$, so the minimum over the real symmetric matrices is $\lVert B\rVert$ as well.
Now all that remains is to show that the entry-wise complex conjugation $\bar{\ }\, : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ preserves the trace norm. This is not so hard to do. For arbitrary matrices $A,B \in M_n(\mathbb{C})$ we have $\overline{AB} = \overline A \, \overline B$ and $\overline{A}^* = \overline{A^*}$, hence $$ \overline{A}^*\,\overline A \: = \: \overline{A^*A}. $$ Now it is easy to see that $\overline{|A|}$ is a positive semidefinite square root of $\overline{A^*A}$, so it follows that $\overline{|A|} = |\,\overline A\,|$ holds. Finally, we find $$ \lVert \overline A\rVert_1 \: = \: \text{tr}\big(\big|\,\overline A\,\big|\big) \: = \: \text{tr}\left(\overline{|A|}\right) \: = \: \overline{\text{tr}(|A|)} \: = \: \overline{\lVert A\rVert_1} \: = \: \lVert A\rVert_1. $$
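A one-line numerical illustration of this fact (a NumPy sketch, not part of the proof; `trace_norm` is my own helper):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

def trace_norm(M):
    # Schatten 1-norm: sum of singular values.
    return np.linalg.svd(M, compute_uv=False).sum()

# Entry-wise conjugation preserves the singular values, hence the trace norm.
assert np.isclose(trace_norm(np.conj(A)), trace_norm(A))
print("conjugation preserves the trace norm")
```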
In summary: the entire result follows essentially from one application of the triangle inequality.
Old answer — needlessly complicated:
Your intuition is correct,
but it's not particularly easy to prove this. We will need quite a few facts about the trace norm. Let $\mathbb{C}^{n\times n}$ denote the algebra of $n\times n$ complex matrices, and let $\lVert \:\cdot\: \rVert_\infty : \mathbb{C}^{n\times n} \to \mathbb{R}_{\geq 0}$ denote the operator norm (also known as the Schatten $\infty$-norm). The key facts are the duality relation $\lVert M\rVert_1 = \max\{\,|\text{tr}(MU)| \,:\, \lVert U\rVert_\infty \leq 1\,\}$ and its consequence $|\text{tr}(MU)| \leq \lVert M\rVert_1\,\lVert U\rVert_\infty$ (this is the "corollary 4" cited below). Now that we have all the ingredients, let's get on with the exercise at hand. Recall that we have $X = A + iB$ with $A$ real symmetric and $B$ real skew-symmetric, and we want to find a real symmetric matrix $Y$ minimising the distance $\lVert X - Y\rVert_1$.
Assume without loss of generality that $A = 0$ holds, so that we have $X = iB$ (otherwise replace $Y$ by $Y + A$, which is again real symmetric); note that then $X^T = iB^T = -X$. We prove that the minimum occurs at $Y = 0$. Choose some $U \in \mathbb{C}^{n\times n}$ satisfying $\lVert U\rVert_\infty \leq 1$ as well as $\lVert X\rVert_1 = \text{tr}(XU)$. Now we have $$ \text{tr}(XU) = \text{tr}\big((XU)^T\big) = \text{tr}\big(U^TX^T\big) = -\text{tr}\big(U^TX\big) = -\text{tr}\big(XU^T\big), $$ so we may assume without loss of generality that $U$ is skew-symmetric (replace $U$ by $\tfrac{1}{2}(U - U^T)$, if necessary; then the properties $\lVert U\rVert_\infty \leq 1$ and $\lVert X\rVert_1 = \text{tr}(XU)$ still hold).
Now let $Y$ be an arbitrary real symmetric matrix, then we have $$ \text{tr}(YU) = \text{tr}\big((YU)^T\big) = \text{tr}\big(U^TY^T\big) = \text{tr}\big(U^TY\big) = -\text{tr}(UY) = -\text{tr}(YU), $$ hence $\text{tr}(YU) = 0$. As a result, by corollary 4 we have $$ \lVert X - Y\rVert_1 \geq \big|\text{tr}\big((X - Y)U\big)\big| = |\text{tr}(XU) - \text{tr}(YU)| = \big|\lVert X\rVert_1 - 0\big| = \lVert X\rVert_1, $$ which proves the result.
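The duality argument above can be verified numerically. The sketch below (my own, assuming NumPy) builds the dual maximizer $U = QP^H$ from the SVD $X = P\Sigma Q^H$, skew-symmetrises it, and checks the two key facts: the skew-symmetrised $U$ still attains $\lVert X\rVert_1$, and $\text{tr}(YU) = 0$ for real symmetric $Y$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

N = rng.standard_normal((n, n))
B = (N - N.T) / 2
X = 1j * B                              # the A = 0 case: X = iB

def trace_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

# Dual construction: if X = P @ diag(s) @ Qh, then U = Q @ P^H satisfies
# ||U||_inf = 1 (it is unitary) and tr(X U) = ||X||_1.
P, s, Qh = np.linalg.svd(X)
U = Qh.conj().T @ P.conj().T
assert np.isclose(np.trace(X @ U).real, trace_norm(X))

# Skew-symmetrising U keeps both properties, as claimed in the answer.
U_sk = (U - U.T) / 2
assert np.linalg.norm(U_sk, 2) <= 1 + 1e-9
assert np.isclose(np.trace(X @ U_sk).real, trace_norm(X))

# tr(Y U_sk) = 0 for real symmetric Y, which yields the lower bound.
M = rng.standard_normal((n, n))
Y = (M + M.T) / 2
assert np.isclose(np.trace(Y @ U_sk), 0)
print("duality argument verified numerically")
```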
Closing remarks: