Inequality of square root inside trace norm


Empirically, I found that the following inequality holds whenever both $A$ and $B$ are positive semidefinite:

$$\|(A+B)^\frac{1}{2}\|_\star \leq \|A^\frac{1}{2} + B^\frac{1}{2}\|_\star$$

where $\|\cdot\|_\star$ is the trace norm, defined as

$$\|A\|_\star := \operatorname{trace}\left(\sqrt{A^TA}\right) = \sum\limits_{i=1}^{\min\{m,n\}} \sigma_i(A)$$

Can anyone give me a hint on how to prove it? Thank you.
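For what it's worth, the empirical observation can be stress-tested with a quick script (a sketch in NumPy; the helper names `sqrtm_psd` and `trace_norm` are ad hoc, not from any particular library):

```python
# Numerical sanity check (not a proof) of the conjecture
# ||(A+B)^(1/2)||_* <= ||A^(1/2) + B^(1/2)||_*  for random p.s.d. A, B.
import numpy as np

rng = np.random.default_rng(0)

def sqrtm_psd(M):
    """Principal square root of a symmetric p.s.d. matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (Q * np.sqrt(w)) @ Q.T

def trace_norm(M):
    """Trace (nuclear) norm: sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

for _ in range(1000):
    n = int(rng.integers(2, 6))
    G, H = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    A, B = G @ G.T, H @ H.T            # random symmetric p.s.d. matrices
    lhs = trace_norm(sqrtm_psd(A + B))
    rhs = trace_norm(sqrtm_psd(A) + sqrtm_psd(B))
    assert lhs <= rhs + 1e-9, (lhs, rhs)
print("inequality held in all trials")
```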


Let $A$ and $B$ be Hermitian positive semidefinite matrices. We want to prove
$$ \Big \Vert \big(A+B\big)^{\frac{1}{2}}\Big\Vert_{S_1} \le \Big \Vert \big(A^{\frac{1}{2}}+B^{\frac{1}{2}}\big)\Big\Vert_{S_1}, $$
where the norm is the Schatten 1-norm, given by the sum of the singular values of the matrix. This is also known as the nuclear norm, and sometimes as the trace norm.

Since both matrices are Hermitian p.s.d., their eigenvalues coincide with their singular values. Using their eigendecompositions, we have
$$A =\sum_{k=1}^n \lambda_{k,A}\mathbf q_k\mathbf q_k^* \qquad \text{and} \qquad B =\sum_{k=1}^n \lambda_{k,B}\mathbf v_k\mathbf v_k^*$$
with eigenvalues ordered from largest to smallest and orthonormal eigenvectors chosen in each case.

The desired inequality may be written in an equivalent, more convenient form using the following $2n\times 2n$ block matrices:
$$ W := \begin{bmatrix}A &\mathbf 0\\ \mathbf 0&B\end{bmatrix} \qquad \text{and} \qquad X := \begin{bmatrix}A + B &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}. $$
Then $$\Big \Vert \big(A+B\big)^{\frac{1}{2}}\Big\Vert_{S_1} = \text{trace}\Big(\big(A+B\big)^{\frac{1}{2}} \Big) =\text{trace}\Big(X^\frac{1}{2}\Big) \leq \text{trace}\Big(W^\frac{1}{2}\Big) = \text{trace}\Big(A^{\frac{1}{2}}+B^{\frac{1}{2}}\Big).$$
This result is obviously true when $A$ and $B$ commute -- it is then just the triangle inequality. The intuition still holds in the non-commuting case, though the result is harder to tackle head-on. (The argument below uses tools from majorization to prove the result.)

If we collect the $2n$ eigenvalues of $W$ in a vector $\mathbf w$ and do the same for $X$ in $\mathbf x$ (in each case observing the usual ordering from largest to smallest), then we have
$$\mathbf w \preceq \mathbf x,$$ where $\preceq $ denotes (strong) majorization; this is proven at the end.
If we consider the symmetric function $f \colon \mathbb R_{\ge 0}^{2n} \to \mathbb R_{\ge 0}$ given by $f\big(\mathbf u\big) = \sum_{k=1}^{2n} u_k^\frac{1}{2}$, then $f$ is Schur concave because the mapping $u\mapsto u^\frac{1}{2}$ is concave for any $u \ge 0$.
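The Schur-concavity step can be illustrated numerically. One convenient way to generate a pair with $\mathbf w \preceq \mathbf x$ is to set $\mathbf w = D\mathbf x$ for a doubly stochastic $D$ (by Birkhoff's theorem, a convex mix of permutation matrices is doubly stochastic, and multiplying by it always produces a majorized vector). A sketch, not part of the proof:

```python
# Illustration (not a proof) of Schur concavity of f(u) = sum_k sqrt(u_k):
# if w = D x for a doubly stochastic D, then w is majorized by x and f(x) <= f(w).
import numpy as np

rng = np.random.default_rng(1)
f = lambda u: np.sqrt(u).sum()

for _ in range(1000):
    n = 6
    x = np.sort(rng.random(n))[::-1]     # nonnegative entries, decreasing order
    # Convex combination of permutation matrices => doubly stochastic matrix.
    weights = rng.dirichlet(np.ones(4))
    D = sum(wt * np.eye(n)[rng.permutation(n)] for wt in weights)
    w = D @ x                            # guarantees w is majorized by x
    assert f(x) <= f(w) + 1e-12
print("Schur concavity check passed")
```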

Combining majorization with Schur concavity, we get
$$ \mathbf w \preceq \mathbf x \implies \text{trace}\Big(X^\frac{1}{2}\Big) = f\big(\mathbf x\big) \le f\big(\mathbf w\big) = \text{trace}\Big(W^\frac{1}{2}\Big),$$
which completes the proof of OP's inequality.


Proof of the majorization.
We need to prove that for every $r \in \big\{1,2,\ldots,2n-1,2n\big\}$, $\sum_{k=1}^r w_k = \sum_{k=1}^r \lambda_k\big(W\big) \le \sum_{k=1}^r \lambda_k\big(X\big) = \sum_{k=1}^r x_k$, with equality when $r=2n$ (the equality at $r=2n$ is what makes the majorization strong). The equality case is immediate:
$\sum_{k=1}^{2n} \lambda_k\big(W\big) = \text{trace}\Big(\begin{bmatrix}A &\mathbf 0\\ \mathbf 0&B\end{bmatrix}\Big) =\text{trace}\Big(A + B\Big) = \text{trace}\Big(\begin{bmatrix}A + B &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}\Big)= \sum_{k=1}^{2n} \lambda_k\big(X\big)$
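The full majorization claim (partial sums included) can also be checked numerically before proving it; a sketch, not part of the proof:

```python
# Numerical check (not a proof) of the majorization claim: partial sums of the
# ordered eigenvalues of W = diag(A, B) are bounded by those of X = diag(A+B, 0),
# with equality for the total sum.
import numpy as np

rng = np.random.default_rng(2)

for _ in range(500):
    n = int(rng.integers(2, 6))
    G, H = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    A, B = G @ G.T, H @ H.T               # random symmetric p.s.d. matrices
    w = np.sort(np.concatenate([np.linalg.eigvalsh(A),
                                np.linalg.eigvalsh(B)]))[::-1]
    x = np.sort(np.concatenate([np.linalg.eigvalsh(A + B),
                                np.zeros(n)]))[::-1]
    cw, cx = np.cumsum(w), np.cumsum(x)
    assert np.all(cw <= cx + 1e-9)        # partial sums: w majorized by x
    assert abs(cw[-1] - cx[-1]) < 1e-9    # equal total sums (strong majorization)
print("majorization held in all trials")
```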

To close, let $r \in \big\{1,2,\ldots,2n-1\big\}$. Since the eigenvalues of $W$ split into those of $A$ and $B$, the $r$ largest of them consist of the $m$ largest of $A$ and the $r-m$ largest of $B$ for some $m \in \big\{0,1,\ldots, r\big\}$. Choosing a Hermitian projector $P_r$ (i.e. $P_r = P_r^*$, $P_r=P_r^2$ and $\text{rank}\big(P_r\big) = r$) onto an $r$-dimensional subspace containing $\mathbf q_1,\ldots,\mathbf q_m$ and $\mathbf v_1,\ldots,\mathbf v_{r-m}$, we may write
$$ \begin{align} &\sum_{k=1}^{r} \lambda_k\big(W\big) \\ &= \Big(\sum_{k=1}^{m} \lambda_k\big(A\big)\Big) + \Big(\sum_{k=1}^{r-m} \lambda_k\big(B\big)\Big)\\ &= \text{trace}\Big((\sum_{k=1}^m \lambda_{k,A}\mathbf q_k\mathbf q_k^*) +(\sum_{k=1}^{r-m} \lambda_{k,B}\mathbf v_k\mathbf v_k^*)\Big)\\ &= \text{trace}\Big(P_r\big((\sum_{k=1}^m \lambda_{k,A}\mathbf q_k\mathbf q_k^*) +(\sum_{k=1}^{r-m} \lambda_{k,B}\mathbf v_k\mathbf v_k^*)\big)\Big)\\ &\leq \text{trace}\Big(P_r\big((\sum_{k=1}^m \lambda_{k,A}\mathbf q_k\mathbf q_k^*) +(\sum_{k=1}^{r-m} \lambda_{k,B}\mathbf v_k\mathbf v_k^*)\big)\Big) + \text{trace}\Big(P_r Z_r\Big)\\ &= \text{trace}\Big(P_r\big(\big\{(\sum_{k=1}^m \lambda_{k,A}\mathbf q_k\mathbf q_k^*)+(\sum_{k=m+1}^n \lambda_{k,A}\mathbf q_k\mathbf q_k^*)\big\} +\big\{(\sum_{k=1}^{r-m} \lambda_{k,B}\mathbf v_k\mathbf v_k^*)+(\sum_{k=r-m+1}^{n} \lambda_{k,B}\mathbf v_k\mathbf v_k^*)\big\}\big)\Big)\\ &= \text{trace}\Big(P_r\big( A + B\big)\Big)\\ &=\text{trace}\left(\begin{bmatrix}P_r &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}\begin{bmatrix}A + B &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}\right)\\ &=\text{trace}\left(\begin{bmatrix}P_r &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}X\right)\\ &\leq \sum_{k=1}^r \sigma_k\big(X\big)\\ &= \sum_{k=1}^r \lambda_k\big(X\big)\\ \end{align} $$

where the inequalities hold because:

i.) $Z_r:= \big(\sum_{k=m+1}^n \lambda_{k,A}\mathbf q_k\mathbf q_k^*\big) +\big(\sum_{k=r-m+1}^{n} \lambda_{k,B}\mathbf v_k\mathbf v_k^*\big)\succeq \mathbf 0$
(a sum of p.s.d. matrices is p.s.d.), so
$\text{trace}\Big(P_rZ_r\Big)= \text{trace}\Big(P_r P_r^* \big(Z_r^\frac{1}{2}\big)^*Z_r^\frac{1}{2}\Big)= \text{trace}\Big( P_r^* \big(Z_r^\frac{1}{2}\big)^*Z_r^\frac{1}{2}P_r\Big)= \Big \Vert Z_r^\frac{1}{2}P_r\Big \Vert_F^2 \geq 0$
(the squared Frobenius norm, i.e. the squared Schatten 2-norm, is nonnegative);

ii.) the von Neumann trace inequality: since $\begin{bmatrix}P_r &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}$ has exactly $r$ singular values equal to $1$ (and the rest $0$), we get $\text{trace}\Big(\begin{bmatrix}P_r &\mathbf 0\\ \mathbf 0&\mathbf 0\end{bmatrix}X\Big) \le \sum_{k=1}^r \sigma_k\big(X\big)$.
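The von Neumann step used in ii.) can itself be sanity-checked numerically (a sketch, not part of the proof): for a rank-$r$ orthogonal projector $P$, $\text{trace}(PM)$ should never exceed the sum of the $r$ largest singular values of $M$.

```python
# Numerical check (not a proof) of the von Neumann trace bound used in step ii.):
# for a rank-r orthogonal projector P, trace(P M) <= sum of r largest singular
# values of M.
import numpy as np

rng = np.random.default_rng(3)

for _ in range(500):
    n, r = 6, int(rng.integers(1, 6))
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    P = Q[:, :r] @ Q[:, :r].T                 # rank-r orthogonal projector
    G = rng.standard_normal((n, n))
    M = G @ G.T                               # symmetric p.s.d. test matrix
    sv = np.linalg.svd(M, compute_uv=False)   # singular values, descending
    assert np.trace(P @ M) <= sv[:r].sum() + 1e-9
print("von Neumann bound held in all trials")
```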