Trace minimization in a Rayleigh-quotient-like problem


Given an $n\times n$ real diagonal matrix $D$ and an $m\times m$ real diagonal matrix $W$ (where $n\geq m$) with $\text{tr}(W^2)=1$, consider the following optimization problem in $X \in \mathbb{R}^{n \times m}$

$$\begin{array}{ll} \text{minimize} & \text{tr}\!\left((XW)^TD\,(XW)\right)\\ \text{subject to} & X^T X = I_m\end{array}$$

In the "equal-weighted" case $W=(I_m/m)^{1/2}$, this reduces to a standard Rayleigh quotient minimization problem. But in the more general case, setting up and solving the Lagrangian is causing some difficulties.
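To make the setup concrete, here is a minimal numerical sketch of the problem (all names and dimensions are illustrative, not part of the question): it builds a diagonal $D$ and a diagonal $W$ with $\operatorname{tr}(W^2)=1$, and evaluates the objective at the feasible point given by the first $m$ columns of $I_n$.

```python
import numpy as np

# Illustrative small instance of the problem (names are assumptions).
n, m = 5, 3
rng = np.random.default_rng(0)
D = np.diag(rng.normal(size=n))      # n x n real diagonal matrix
w = rng.random(m)
w /= np.sqrt(np.sum(w**2))           # normalise so that tr(W^2) = 1
W = np.diag(w)                       # m x m real diagonal matrix

def objective(X):
    """tr((XW)^T D (XW)) for an n x m matrix X."""
    XW = X @ W
    return np.trace(XW.T @ D @ XW)

# A feasible point: the first m columns of I_n, which satisfies X^T X = I_m.
X0 = np.eye(n)[:, :m]
assert np.allclose(X0.T @ X0, np.eye(m))

# At this point the objective collapses to sum_i d_i w_i^2 over the first m indices.
print(objective(X0))
```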

My current approach: Rewrite the constraint as $W^TX^TXW = W^2$, then set up the Lagrangian

$$\mathcal{L}(X,\Lambda) = \text{tr}\!\left((XW)^TD\,(XW)\right) + \text{tr}\!\left(\Lambda(W^2-W^TX^TXW)\right)$$

The first term contributes $2DXW^2$ to the first-order condition, but I'm having trouble differentiating the second term with respect to $X$, since this formulation doesn't appear in any of the matrix-calculus "cookbooks."
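As a sanity check on the first term, a central finite-difference comparison (a minimal numpy sketch; the instance is illustrative) confirms that the unconstrained gradient of $\operatorname{tr}\!\left((XW)^TD(XW)\right)$ with respect to $X$ is $2DXW^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
D = np.diag(rng.normal(size=n))
w = rng.random(m)
w /= np.sqrt(np.sum(w**2))
W = np.diag(w)
X = rng.normal(size=(n, m))  # arbitrary point; the gradient formula ignores the constraint

def f(X):
    XW = X @ W
    return np.trace(XW.T @ D @ XW)

grad = 2 * D @ X @ W @ W     # claimed gradient: 2 D X W^2

# Central finite differences, entry by entry.
eps = 1e-6
num = np.zeros_like(X)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.max(np.abs(num - grad)))  # should be tiny, on the order of 1e-9 or less
```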

I have a hunch that given the orthonormality constraint $X^TX = I_m$, the solution doesn't depend on the matrix $W$ (so that one solution is given by $X$ equal to the first $m$ columns of $I_n$, since these can be taken as the eigenvectors for the diagonal matrix $D$), but am not positive. Any help appreciated.


Best answer:

The usual arguments apply. By padding $X$ and $W$ with zeros, you may assume that $X$, $D$, $W$ are square matrices of the same size. Then
$$
\operatorname{tr}((XW)^TD(XW))=\operatorname{tr}(X^TDXW^2)=\sum_{i,j}d_iw_j^2x_{ij}^2
$$
is linear in the entries of the orthostochastic matrix $X\circ X$. By Birkhoff's theorem, the doubly stochastic matrix $X\circ X$ is a convex combination of permutation matrices, so the trace in question is minimised when $X\circ X$ is a permutation matrix, i.e. when $x_{i\sigma(i)}=\pm1$ for some permutation $\sigma\in S_n$. The minimisation problem thus boils down to
$$
\min_{\sigma\in S_n} \sum_i d_iw_{\sigma(i)}^2.
$$
Without loss of generality, we may assume that $d_1\ge d_2\ge\cdots\ge d_n$. Then, by the rearrangement inequality, a global minimiser is given by any $\sigma\in S_n$ that makes $w_{\sigma(1)}^2\le w_{\sigma(2)}^2\le\cdots\le w_{\sigma(n)}^2$. Now, by extracting the first $m$ columns of $X$, one obtains a minimiser for the original minimisation problem.
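The final reduction can be checked numerically on a small instance: brute-forcing the minimum over all permutations and comparing it with the pairing of $w_{\sigma(i)}^2$ in ascending order against $d_i$ in descending order (a sketch with an illustrative random instance):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
n = 5
d = np.sort(rng.normal(size=n))[::-1]  # d_1 >= d_2 >= ... >= d_n
w2 = rng.random(n)                     # squared weights (zeros would handle the padding if m < n)

# Brute force: min over all sigma in S_n of sum_i d_i * w2[sigma(i)].
brute = min(sum(d[i] * w2[s[i]] for i in range(n)) for s in permutations(range(n)))

# Claimed minimiser: pair w^2 in ascending order with d in descending order.
claimed = float(np.dot(d, np.sort(w2)))

print(np.isclose(brute, claimed))  # expected: True
```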