A question about the optimization problem $\max_{U\in\mathbb{R}^{n\times k}}\mathrm{trace}(U^TAU)$ subject to $U^TU=I_k$.


Let $A\in\mathbb{R}^{n\times n}$ be a symmetric positive semidefinite matrix. Consider the following optimization problem: $$ \max_{U\in\mathbb{R}^{n\times k}}\mathrm{trace}(U^TAU)\qquad\mathrm{subject\quad to}\qquad U^TU=I_k, $$ where $I_k$ is the identity matrix of size $k$ and $k\leq n$.

I would like to solve this problem. My guess is that an optimal solution is $U_{opt}=[u_1,u_2,\ldots,u_k]$, where $u_1,u_2,\ldots,u_k$ are orthonormal eigenvectors of $A$ corresponding to the $k$ largest eigenvalues of $A$. Is that true? How can I prove it?
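Before proving anything, the conjecture is easy to sanity-check numerically. Here is a minimal sketch assuming NumPy; the random instance and names such as `U_opt` are illustrative, not from the question. It compares $\mathrm{trace}(U^\top AU)$ at the eigenvector choice against random feasible $U$ obtained via QR:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3

# Random symmetric positive semidefinite matrix (illustrative instance)
B = rng.standard_normal((n, n))
A = B @ B.T

# np.linalg.eigh returns eigenvalues in ascending order;
# take the k largest and their eigenvectors
vals, vecs = np.linalg.eigh(A)
top_sum = vals[-k:].sum()
U_opt = vecs[:, -k:]                 # columns: eigenvectors of the k largest eigenvalues

obj = lambda U: np.trace(U.T @ A @ U)

# The conjectured optimum is attained at U_opt ...
assert np.isclose(obj(U_opt), top_sum)

# ... and random orthonormal U (QR of a Gaussian matrix) never exceeds it
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert obj(Q) <= top_sum + 1e-9
```

This is evidence, not a proof; the answers below give two different arguments.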

Thanks in advance!


2 Answers

BEST ANSWER

You can write $A=V^\top DV$ for some orthogonal matrix $V$ and some diagonal matrix $D=\text{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)$ with $\lambda_1\geq\lambda_2\geq \ldots\geq \lambda_n\geq 0$. If $X:=VU$, then $X^\top X=I_k$. Therefore, $$S:=\text{trace}\left(U^\top A U\right)=\text{trace}\left(X^\top D X\right)=\text{trace}\left(D XX^\top\right)=\sum_{i=1}^n\,\lambda_i\,\sum_{j=1}^k\,x_{i,j}^2\,,\tag{*}$$ where $X=\left[x_{i,j}\right]_{i\in\{1,2,\ldots,n\},j\in\{1,2,\ldots,k\}}$. Set $\lambda_{n+1}:=0$. Summation by parts (Abel summation) rewrites (*) as $$S=\sum_{i=1}^n\left(\lambda_i-\lambda_{i+1}\right)\,\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2\,.\tag{#}$$

Note that you can find an $n$-by-$(n-k)$ matrix $Y=\left[y_{i,j}\right]_{i\in\{1,2,\ldots,n\},j\in\{1,2,\ldots,n-k\}}$ such that $Z:=[X\,|\,Y]$ is an $n$-by-$n$ orthogonal matrix (by extending the column vectors of $X$ to a basis of $\mathbb{R}^n$ and then applying the Gram-Schmidt algorithm). Therefore, for $l=1,2,\ldots,n$, $$\sum_{j=1}^k\,x_{l,j}^2\leq \sum_{j=1}^k\,x_{l,j}^2+\sum_{j=1}^{n-k}\,y_{l,j}^2,$$ and the right-hand side is the $(l,l)$-entry of $ZZ^\top=Z^\top Z=I_n$. Thus, $$\sum_{j=1}^k\,x_{l,j}^2\leq 1\text{ for any }l=1,2,\ldots,n\,.\tag{1}$$ Furthermore, $$\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2\leq \sum_{l=1}^n\,\sum_{j=1}^k\,x_{l,j}^2=\text{trace}(XX^\top)=\text{trace}(X^\top X)=\text{trace}(I_k)=k\,.\tag{2}$$
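The completion step and the two bounds (1) and (2) can also be verified numerically. In this sketch (NumPy assumed) the orthogonal complement is obtained from a full SVD rather than Gram-Schmidt, which yields the same kind of completion:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # X^T X = I_k

# Y: orthonormal basis of the orthogonal complement of col(X),
# taken from the last n-k left singular vectors of X
Uf, _, _ = np.linalg.svd(X, full_matrices=True)
Y = Uf[:, k:]
Z = np.hstack([X, Y])

assert np.allclose(Z.T @ Z, np.eye(n))             # Z is orthogonal

# (1): each row of X has squared norm at most 1
assert np.all((X ** 2).sum(axis=1) <= 1 + 1e-12)

# (2): the total squared mass of X equals k
assert np.isclose((X ** 2).sum(), k)
```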

Splitting (#) at $i=k$ gives $$S= \sum_{i=1}^k\,\left(\lambda_i-\lambda_{i+1}\right)\,\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2+\sum_{i=k+1}^n\,\left(\lambda_i-\lambda_{i+1}\right)\,\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2\,.$$ Since $\lambda_i-\lambda_{i+1}\geq 0$ for every $i$, applying (1) to the first sum and (2) to the second yields $$S\leq \sum_{i=1}^k\,\left(\lambda_i-\lambda_{i+1}\right)\,i+\sum_{i=k+1}^n\,\left(\lambda_{i}-\lambda_{i+1}\right)\,k=\sum_{i=1}^k\,\lambda_i\,.$$

Note that the equality is achievable when the only nonzero entries of $X$ are $x_{i,i}=1$ for $i=1,2,\ldots,k$. That is, $S=\sum\limits_{i=1}^k\,\lambda_i$ if $U=\left[u_1\,|\,u_2\,|\,\ldots\,|\,u_k\right]$, where $u_i$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_i$ for $i=1,2,\ldots,k$, and $\{u_1,u_2,\ldots,u_k\}$ is an orthonormal subset of $\mathbb{R}^n$.
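The equality case can be checked directly in the diagonalized coordinates: taking $X$ to be the first $k$ standard basis vectors gives exactly $\sum_{i=1}^k\lambda_i$. A minimal sketch (NumPy assumed; the eigenvalues are illustrative):

```python
import numpy as np

n, k = 5, 3
lam = np.array([4.0, 3.0, 2.0, 1.0, 0.5])   # lambda_1 >= ... >= lambda_n
D = np.diag(lam)

# X whose only nonzero entries are x_{i,i} = 1 for i = 1..k
X = np.zeros((n, k))
X[:k, :k] = np.eye(k)

S = np.trace(X.T @ D @ X)
assert np.isclose(S, lam[:k].sum())          # S = lambda_1 + ... + lambda_k
```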


You are mostly correct: the optimal value is the sum of the $k$ largest eigenvalues of $A$. The optimal $U$ is not unique, however; any $U$ whose columns form an orthonormal basis of the span of the eigenvectors corresponding to these eigenvalues achieves the optimum.

A not very clever way to show this follows.

Since $A$ is symmetric, it is orthogonally diagonalizable. Also, multiplying an isometry on the left by an orthogonal matrix gives another isometry, so we may as well assume $A$ is diagonal.

Write $$ A=\begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} $$ where $\Lambda_1$ is a $k \times k$ diagonal matrix which has the $k$ largest eigenvalues of $A$ on its diagonal, and $\Lambda_2$ is a diagonal $(n-k) \times (n-k)$ matrix which has the remaining eigenvalues of $A$ as its diagonal entries. Since every remaining eigenvalue is at most $\lambda_k$, the $k$th largest eigenvalue of $A$, we have $$ A \preceq \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \lambda_k I_{n-k} \end{pmatrix}. $$ It follows that $$ Tr(U^T A U) \leq Tr \left(U^T \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \lambda_k I_{n-k} \end{pmatrix} U \right)=:(*)$$ for any isometry $U$.

Write $U^T=(U_1^T \ U_2^T)$ conformally with the block decomposition of $A$, and notice that $U_2^T U_2= I_k- U_1^T U_1$. Then the right-hand side of the above inequality simplifies as follows: $$ \begin{array}{rcl} (*)&=&Tr(U_1^T \Lambda_1 U_1)+\lambda_k\,Tr(I_k-U_1^T U_1)=Tr\big(U_1^T (\Lambda_1-\lambda_k I)U_1\big)+Tr(\lambda_k I_k)\\ &\leq& Tr(\Lambda_1-\lambda_k I)+Tr(\lambda_k I_k)=Tr(\Lambda_1). \end{array} $$ In the inequality I have used that $\Lambda_1-\lambda_k I$ is PSD and that $U_1$ is a contraction: $Tr\big(U_1^T(\Lambda_1-\lambda_k I)U_1\big)=Tr\big((\Lambda_1-\lambda_k I)\,U_1U_1^T\big)\leq Tr(\Lambda_1-\lambda_k I)$, since $U_1U_1^T\preceq I_k$.
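The matrix ordering $A \preceq \operatorname{diag}(\Lambda_1, \lambda_k I)$ and the resulting trace chain can be verified on a random diagonal instance. A sketch assuming NumPy (tolerances and the instance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
lam = np.sort(rng.random(n) + 0.1)[::-1]    # descending eigenvalues of a diagonal A
A = np.diag(lam)
Lam1, lam_k = np.diag(lam[:k]), lam[k - 1]

# Bounding matrix diag(Lambda_1, lambda_k I); A <= B in the PSD order
B = np.diag(np.concatenate([lam[:k], np.full(n - k, lam_k)]))
assert np.all(np.linalg.eigvalsh(B - A) >= -1e-12)

U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # random isometry
U1 = U[:k, :]                                      # top k-by-k block

lhs = np.trace(U.T @ A @ U)
mid = np.trace(U.T @ B @ U)                        # this is (*)
rhs = np.trace(Lam1)                               # = lambda_1 + ... + lambda_k

# Tr(U^T A U) <= (*) <= Tr(Lambda_1)
assert lhs <= mid + 1e-9
assert mid <= rhs + 1e-9
```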

This shows that the sum of the $k$ largest eigenvalues is an upper bound for the optimal value. The proof is completed by noting that this value is achieved if you take $U_1$ to be orthogonal and $U_2=0$.