Maximizing the trace in an elegant way

Question

Maximizing the trace in an elegant way

2.5k Views Asked by Bumbble Comm At 02 Apr 2026 - 1:27

Suppose $\Lambda = {\rm diag}(\lambda_1,\cdots,\lambda_n)$, $\Sigma = {\rm diag}(\sigma_1,\cdots,\sigma_n)$, and we have $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. I want to prove that $${\rm max}[ {\rm Tr}(O^{\rm T}\Lambda O \Sigma)] = \sum_{i=1}^n \lambda_i \sigma_i,$$ where $OO^{\rm T} = I$. One approach is to maximize the function $$f(o_{ij}) = \sum_{i,j} \lambda_i (o_{ij})^2 \sigma_j$$ under the constraint $\sum_{k} o_{ik}o_{jk} = \delta_{ij}$ by Lagrange multiplier method. I can prove the statement in this way but the whole proof is lengthy and lack of the flavor of linear algebra. Since the statement "must be true" intuitively, I want to know if there is an elegant way to prove it. A possible way may be mathematical induction but I failed to make it. Any help is appreciated.

Original Q&A

There are 3 best solutions below

Bumbble Comm On 09 Dec 2017 - 10:49

This is not an answer.

My contribution is here to show that the issue is equivalent to a minimization of a certain difference (in the sense of Frobenius norm) still over all orthogonal matrices in a seemingly more general context. I would have like that this equivalent permits to establish the result, but I haven't succeeded.

$$\begin{eqnarray}\tag{*}M&=&\min_{O \in O(n)}\|O^{\rm T}\Lambda O - \Sigma \|_F^2\\&=&\min_{O \in O(n)}{\rm Tr}[(O^{\rm T}\Lambda O - \Sigma)^T(O^{\rm T}\Lambda O - \Sigma)]\\&=&\min_{O \in O(n)}{\rm Tr}[\Lambda^2+\Sigma^2-(O^{\rm T}\Lambda O \Sigma)-(\Sigma O^{\rm T}\Lambda O)]\\&=&\min_{O \in O(n)}\left\{{\rm Tr}[\Lambda^2+\Sigma^2]-{\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]-{\rm Tr}[(\Sigma O^{\rm T}\Lambda O)]\right\}\\\tag{1}&=&\min_{O \in O(n)}\left\{{\rm Tr}[\Lambda^2+\Sigma^2]-2 \ {\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]\right\}\end{eqnarray}$$

Explanation: ${\rm Tr}[(\Sigma O^{\rm T}\Lambda O)]= {\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]$ because of the following "circulant property" of trace operator:

$$ {\rm Tr}[ABCD]= {\rm Tr}[BCDA]$$

Thus, because of the minus sign in (1), and the fact that ${\rm Tr}[\Lambda^2+\Sigma^2]$ is a fixed quantity, the minimum in (1) is indeed attained for an orthogonal matrix $O$ such that

$${\rm Tr}(O^{\rm T}\Lambda O \Sigma) \ \ \text{is maximal}$$

Thus the equivalence betwen (*) and our initial problem is established.

Bumbble Comm On 09 Dec 2017 - 4:29

It seems that the answer given by user1551 is the most appropriate one. The approach using calculus is in detailed in his answer. But another approach using Birkhoff theorem suggested in his comments is rough. I read through the linked answer given by him and try to give an informal summary here to help those interested.

Double stochastic matrix and Birkhoff–von Neumann theorem

Double stochastic matrix is a square matrix $A=(a_{ij})$ of nonnegative real numbers, each of whose rows and columns sums to 1, i.e. $$\sum _{i}a_{ij}=\sum _{j}a_{ij}=1.$$ The Birkhoff–von Neumann theorem states that if $A$ is doubly stochastic matrix, then there exist $\theta _{1},\ldots ,\theta _{k}\in (0,1)$ and $\theta _{1} + \ldots + \theta _{k} = 1$ and permutation matrices $P_{1},\ldots ,P_{k}$ such that $$A=\theta _{1}P_{1}+\cdots +\theta _{k}P_{k}.$$

The proof of the Birkhoff–von Neumann theorem can be found here. It applys the Hall's marriage theorem in graph thoery. The proof of the Hall's marriage theorem can be found here. I have never learned any graph theory before, but I find that the proof given by the link is not hard to follow if you have learned linear algebra. And the proof is also very inspiring and ingenious!

Proof of the statement by Birkhoff–von Neumann theorem

If $OO^{\rm T} = I$, we define $Q \equiv (q_{ij}) = (o_{ij})^2$. It is easy to find that $Q$ is double stochastic matrix. Since $${\rm Tr}(O^{\rm T}\Lambda O\Sigma) = \sum_{i,j}\lambda_i \sigma_j (o_{ij})^2 = \sum_{i,j}\lambda_i \sigma_j q_{ij},$$ we have $$\max_{OO^{\rm T} = I}{\rm Tr}(O^{\rm T}\Lambda O\Sigma) \leq \max_{A} \sum_{i,j}^n\lambda_i \sigma_j a_{ij}, A \mbox{ is double stochastic} .$$

In light of Birkhoff–von Neumann theorem, we have $$A=\sum_{l=1}^{k}\theta_l P_{l},$$ where $\theta _{1},\ldots ,\theta _{k}\in (0,1)$ and $\theta _{1} + \ldots + \theta _{k} = 1$. So we have $$\sum_{i,j}\lambda_i \sigma_j a_{ij} = \sum_{l=1}^{k} \theta_{l} \sum_{i,j}\lambda_i \sigma_j p_{ij} .$$ Among all $n!$ permutation matrices, it is easy to check $\sum_{i,j}\lambda_i \sigma_j p_{ij}$ has maximum value $\sum_{i}\lambda_i \sigma_i$ when $p_{ij} = \delta_{ij}$. So we have $$\max_{A} \sum_{i,j}\lambda_i \sigma_j a_{ij} = \sum_{i}\lambda_i \sigma_i.$$ On the other hand, we have ${\rm Tr}(O^{\rm T}\Lambda O\Sigma) = \sum_{i}\lambda_i \sigma_i$ when $O = I$, so we can conclude that $$\max_{OO^{\rm T} = I}{\rm Tr}(O^{\rm T}\Lambda O\Sigma) = \sum_{i}\lambda_i \sigma_i.$$

Thanks a lot to user 1551 for showing me these exciting new knowledge!

**Bumbble Comm** · Accepted Answer

(Rewritten from the last part of my answer to Trace minimization with constraints .)

Since the objective function is continuous and the set of all real orthogonal matrices is compact, the maximum trace is continuous in the entries of $\Sigma$. Therefore, by passing to limit, we may assume without loss of generality that $\Sigma$ has distinct diagonal entries.

The matrix exponential of every skew-symmetric matrix is a real orthogonal matrix of determinant one. If $O$ is a global maximiser in your problem, its objective function value must be greater than or equal to the objective function value evaluated at $Oe^{tK}$ (which is real orthogonal) for any real number $t$ and any skew-symmetric $K$. So, if we define $f(t)=\operatorname{tr}(e^{-tK}O^T\Lambda Oe^{tK}\Sigma)$, we must have $f'(0)=0$. Using the cyclic property $\operatorname{tr}(XY)=\operatorname{tr}(YX)$, we get $$ 0=f'(0)=\operatorname{tr}\left((\Sigma O^T\Lambda O-O^T\Lambda O\Sigma)\, K\right). $$ Put $K^T=\Sigma O^T\Lambda O-O^T\Lambda O\Sigma$ (which is indeed skew-symmetric), we get $\operatorname{tr}(K^TK)=0$. Hence $K=0$, meaning that $O^T\Lambda O$ commutes with $\Sigma$. Any matrix that commutes with a diagonal matrix with distinct diagonal entries must have all off-diagonal entries equal to zero. Therefore $O^T\Lambda O$ is a diagonal matrix.

Hence $\Lambda_1:=O^T\Lambda O$ is a diagonal permutation of $\Lambda$, because $\Lambda_1$ and $\Lambda$ are diagonal matrices with the same eigenvalues. Now the problem boils down to finding the diagonal permutation $\Lambda_1$ of $\Lambda$ that maximises $\operatorname{tr}(\Lambda_1\Sigma)$. Obviously, maximum occurs when $\Lambda_1=\Lambda$ or when $O=I$.

Maximizing the trace in an elegant way

There are 3 best solutions below

Double stochastic matrix and Birkhoff–von Neumann theorem

Proof of the statement by Birkhoff–von Neumann theorem

Related Questions in LINEAR-ALGEBRA

Related Questions in MATRICES

Related Questions in OPTIMIZATION

Related Questions in TRACE

Trending Questions

Popular # Hahtags

Popular Questions