Suppose $\Lambda = {\rm diag}(\lambda_1,\cdots,\lambda_n)$, $\Sigma = {\rm diag}(\sigma_1,\cdots,\sigma_n)$, and we have $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. I want to prove that $${\rm max}[ {\rm Tr}(O^{\rm T}\Lambda O \Sigma)] = \sum_{i=1}^n \lambda_i \sigma_i,$$ where $OO^{\rm T} = I$. One approach is to maximize the function $$f(o_{ij}) = \sum_{i,j} \lambda_i (o_{ij})^2 \sigma_j$$ under the constraint $\sum_{k} o_{ik}o_{jk} = \delta_{ij}$ by Lagrange multiplier method. I can prove the statement in this way but the whole proof is lengthy and lack of the flavor of linear algebra. Since the statement "must be true" intuitively, I want to know if there is an elegant way to prove it. A possible way may be mathematical induction but I failed to make it. Any help is appreciated.
Maximizing the trace in an elegant way
2.5k Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail AtThere are 3 best solutions below
On
This is not an answer.
My contribution is here to show that the issue is equivalent to a minimization of a certain difference (in the sense of Frobenius norm) still over all orthogonal matrices in a seemingly more general context. I would have like that this equivalent permits to establish the result, but I haven't succeeded.
$$\begin{eqnarray}\tag{*}M&=&\min_{O \in O(n)}\|O^{\rm T}\Lambda O - \Sigma \|_F^2\\&=&\min_{O \in O(n)}{\rm Tr}[(O^{\rm T}\Lambda O - \Sigma)^T(O^{\rm T}\Lambda O - \Sigma)]\\&=&\min_{O \in O(n)}{\rm Tr}[\Lambda^2+\Sigma^2-(O^{\rm T}\Lambda O \Sigma)-(\Sigma O^{\rm T}\Lambda O)]\\&=&\min_{O \in O(n)}\left\{{\rm Tr}[\Lambda^2+\Sigma^2]-{\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]-{\rm Tr}[(\Sigma O^{\rm T}\Lambda O)]\right\}\\\tag{1}&=&\min_{O \in O(n)}\left\{{\rm Tr}[\Lambda^2+\Sigma^2]-2 \ {\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]\right\}\end{eqnarray}$$
Explanation: ${\rm Tr}[(\Sigma O^{\rm T}\Lambda O)]= {\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]$ because of the following "circulant property" of trace operator:
$$ {\rm Tr}[ABCD]= {\rm Tr}[BCDA]$$
Thus, because of the minus sign in (1), and the fact that ${\rm Tr}[\Lambda^2+\Sigma^2]$ is a fixed quantity, the minimum in (1) is indeed attained for an orthogonal matrix $O$ such that
$${\rm Tr}(O^{\rm T}\Lambda O \Sigma) \ \ \text{is maximal}$$
Thus the equivalence betwen (*) and our initial problem is established.
On
It seems that the answer given by user1551 is the most appropriate one. The approach using calculus is in detailed in his answer. But another approach using Birkhoff theorem suggested in his comments is rough. I read through the linked answer given by him and try to give an informal summary here to help those interested.
Double stochastic matrix and Birkhoff–von Neumann theorem
Double stochastic matrix is a square matrix $A=(a_{ij})$ of nonnegative real numbers, each of whose rows and columns sums to 1, i.e. $$\sum _{i}a_{ij}=\sum _{j}a_{ij}=1.$$ The Birkhoff–von Neumann theorem states that if $A$ is doubly stochastic matrix, then there exist $\theta _{1},\ldots ,\theta _{k}\in (0,1)$ and $\theta _{1} + \ldots + \theta _{k} = 1$ and permutation matrices $P_{1},\ldots ,P_{k}$ such that $$A=\theta _{1}P_{1}+\cdots +\theta _{k}P_{k}.$$
The proof of the Birkhoff–von Neumann theorem can be found here. It applys the Hall's marriage theorem in graph thoery. The proof of the Hall's marriage theorem can be found here. I have never learned any graph theory before, but I find that the proof given by the link is not hard to follow if you have learned linear algebra. And the proof is also very inspiring and ingenious!
Proof of the statement by Birkhoff–von Neumann theorem
If $OO^{\rm T} = I$, we define $Q \equiv (q_{ij}) = (o_{ij})^2$. It is easy to find that $Q$ is double stochastic matrix. Since $${\rm Tr}(O^{\rm T}\Lambda O\Sigma) = \sum_{i,j}\lambda_i \sigma_j (o_{ij})^2 = \sum_{i,j}\lambda_i \sigma_j q_{ij},$$ we have $$\max_{OO^{\rm T} = I}{\rm Tr}(O^{\rm T}\Lambda O\Sigma) \leq \max_{A} \sum_{i,j}^n\lambda_i \sigma_j a_{ij}, A \mbox{ is double stochastic} .$$
In light of Birkhoff–von Neumann theorem, we have $$A=\sum_{l=1}^{k}\theta_l P_{l},$$ where $\theta _{1},\ldots ,\theta _{k}\in (0,1)$ and $\theta _{1} + \ldots + \theta _{k} = 1$. So we have $$\sum_{i,j}\lambda_i \sigma_j a_{ij} = \sum_{l=1}^{k} \theta_{l} \sum_{i,j}\lambda_i \sigma_j p_{ij} .$$ Among all $n!$ permutation matrices, it is easy to check $\sum_{i,j}\lambda_i \sigma_j p_{ij}$ has maximum value $\sum_{i}\lambda_i \sigma_i$ when $p_{ij} = \delta_{ij}$. So we have $$\max_{A} \sum_{i,j}\lambda_i \sigma_j a_{ij} = \sum_{i}\lambda_i \sigma_i.$$ On the other hand, we have ${\rm Tr}(O^{\rm T}\Lambda O\Sigma) = \sum_{i}\lambda_i \sigma_i$ when $O = I$, so we can conclude that $$\max_{OO^{\rm T} = I}{\rm Tr}(O^{\rm T}\Lambda O\Sigma) = \sum_{i}\lambda_i \sigma_i.$$
Thanks a lot to user 1551 for showing me these exciting new knowledge!
(Rewritten from the last part of my answer to Trace minimization with constraints .)
Since the objective function is continuous and the set of all real orthogonal matrices is compact, the maximum trace is continuous in the entries of $\Sigma$. Therefore, by passing to limit, we may assume without loss of generality that $\Sigma$ has distinct diagonal entries.
The matrix exponential of every skew-symmetric matrix is a real orthogonal matrix of determinant one. If $O$ is a global maximiser in your problem, its objective function value must be greater than or equal to the objective function value evaluated at $Oe^{tK}$ (which is real orthogonal) for any real number $t$ and any skew-symmetric $K$. So, if we define $f(t)=\operatorname{tr}(e^{-tK}O^T\Lambda Oe^{tK}\Sigma)$, we must have $f'(0)=0$. Using the cyclic property $\operatorname{tr}(XY)=\operatorname{tr}(YX)$, we get $$ 0=f'(0)=\operatorname{tr}\left((\Sigma O^T\Lambda O-O^T\Lambda O\Sigma)\, K\right). $$ Put $K^T=\Sigma O^T\Lambda O-O^T\Lambda O\Sigma$ (which is indeed skew-symmetric), we get $\operatorname{tr}(K^TK)=0$. Hence $K=0$, meaning that $O^T\Lambda O$ commutes with $\Sigma$. Any matrix that commutes with a diagonal matrix with distinct diagonal entries must have all off-diagonal entries equal to zero. Therefore $O^T\Lambda O$ is a diagonal matrix.
Hence $\Lambda_1:=O^T\Lambda O$ is a diagonal permutation of $\Lambda$, because $\Lambda_1$ and $\Lambda$ are diagonal matrices with the same eigenvalues. Now the problem boils down to finding the diagonal permutation $\Lambda_1$ of $\Lambda$ that maximises $\operatorname{tr}(\Lambda_1\Sigma)$. Obviously, maximum occurs when $\Lambda_1=\Lambda$ or when $O=I$.