Let $A\in\mathbb{R}^{n\times n}$ be a symmetric positive semidefinite matrix. Consider the following optimization problem: $$ \max_{U\in\mathbb{R}^{n\times k}}\mathrm{trace}(U^TAU)\qquad\text{subject to}\qquad U^TU=I_k, $$ where $I_k$ is the identity matrix of size $k$ and $k\leq n$.
I am trying to solve this problem. My guess is that an optimal solution is $U_{opt}=[u_1,u_2,\ldots,u_k]$, where $u_1,u_2,\ldots,u_k$ are orthonormal eigenvectors of $A$ corresponding to the $k$ largest eigenvalues of $A$. Is that true? How can I prove it?
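A quick numerical sanity check (a minimal sketch assuming NumPy; the matrix and sizes below are made up for illustration) suggests the guess is correct, but I would like a proof:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Random symmetric positive semidefinite A.
B = rng.standard_normal((n, n))
A = B @ B.T

# Candidate: U whose columns are eigenvectors for the k largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
U_opt = eigvecs[:, -k:]
best = np.trace(U_opt.T @ A @ U_opt)      # equals the sum of the k largest eigenvalues
print(best, eigvals[-k:].sum())

# Compare against many random U with orthonormal columns (via reduced QR).
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert np.trace(Q.T @ A @ Q) <= best + 1e-9
print("no random feasible U exceeded the eigenvector candidate")
```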
Thanks in advance!
You can write $A=V^\top DV$ for some orthogonal matrix $V$ and some diagonal matrix $D=\text{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)$ with $\lambda_1\geq\lambda_2\geq \ldots\geq \lambda_n\geq 0$. If $X:=VU$, then $X^\top X=I_k$. Therefore, $$S:=\text{trace}\left(U^\top A U\right)=\text{trace}\left(X^\top D X\right)=\text{trace}\left(D XX^\top\right)=\sum_{i=1}^n\,\lambda_i\,\sum_{j=1}^k\,x_{i,j}^2\,,\tag{*}$$ where $X=\left[x_{i,j}\right]_{i\in\{1,2,\ldots,n\},j\in\{1,2,\ldots,k\}}$. Set $\lambda_{n+1}:=0$. By summation by parts, (*) can be rewritten as $$S=\sum_{i=1}^n\left(\lambda_i-\lambda_{i+1}\right)\,\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2\,.\tag{#}$$
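To spell out the step from (*) to (#): since $\lambda_{n+1}=0$, each eigenvalue telescopes as $\lambda_i=\sum_{m=i}^n\left(\lambda_m-\lambda_{m+1}\right)$, so $$S=\sum_{i=1}^n\,\sum_{j=1}^k\,x_{i,j}^2\,\sum_{m=i}^n\left(\lambda_m-\lambda_{m+1}\right)=\sum_{m=1}^n\left(\lambda_m-\lambda_{m+1}\right)\,\sum_{i=1}^m\,\sum_{j=1}^k\,x_{i,j}^2\,,$$ which is (#) after renaming the indices $m\mapsto i$ and $i\mapsto l$.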
Note that you can find an $n$-by-$(n-k)$ matrix $Y=\left[y_{i,j}\right]_{i\in\{1,2,\ldots,n\},j\in\{1,2,\ldots,n-k\}}$ such that $Z:=[X\,|\,Y]$ is an $n$-by-$n$ orthogonal matrix (by extending the column vectors of $X$ to a basis of $\mathbb{R}^n$ and then applying the Gram–Schmidt algorithm). For $l=1,2,\ldots,n$, $$\sum_{j=1}^k\,x_{l,j}^2\leq \sum_{j=1}^k\,x_{l,j}^2+\sum_{j=1}^{n-k}\,y_{l,j}^2\,,$$ and the right-hand side is the $(l,l)$-entry of $ZZ^\top$, which equals $1$ because $ZZ^\top=Z^\top Z=I_n$. Thus, $$\sum_{j=1}^k\,x_{l,j}^2\leq 1\text{ for any }l=1,2,\ldots,n\,.\tag{1}$$ Furthermore, for any $i$, $$\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2\leq \sum_{l=1}^n\,\sum_{j=1}^k\,x_{l,j}^2=\text{trace}(XX^\top)=\text{trace}(X^\top X)=\text{trace}(I_k)=k\,.\tag{2}$$
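As a quick illustration (a minimal sketch assuming NumPy; the sizes are arbitrary), the bounds (1) and (2) can be checked numerically for a random $X$ with orthonormal columns:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3

# Random X with orthonormal columns (reduced QR of a random matrix).
X, _ = np.linalg.qr(rng.standard_normal((n, k)))

row_sums = np.sum(X**2, axis=1)        # sum_j x_{l,j}^2 for each row l
print(np.all(row_sums <= 1 + 1e-12))   # bound (1): each row sum is at most 1
print(np.allclose(row_sums.sum(), k))  # total equals trace(X^T X) = k, so every
                                       # partial sum in (2) is at most k
```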
From (#), splitting the sum at $i=k$, we obtain $$S= \sum_{i=1}^k\,\left(\lambda_i-\lambda_{i+1}\right)\,\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2+\sum_{i=k+1}^n\,\left(\lambda_i-\lambda_{i+1}\right)\,\sum_{l=1}^i\,\sum_{j=1}^k\,x_{l,j}^2\,.$$ Since $\lambda_i-\lambda_{i+1}\geq 0$ for every $i$, applying (1) to the first sum and (2) to the second yields $$S\leq \sum_{i=1}^k\,\left(\lambda_i-\lambda_{i+1}\right)\,i+\sum_{i=k+1}^n\,\left(\lambda_{i}-\lambda_{i+1}\right)\,k=\sum_{i=1}^k\,\lambda_i\,.$$
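The last equality is a telescoping computation: $$\sum_{i=1}^k\,i\left(\lambda_i-\lambda_{i+1}\right)=\sum_{i=1}^k\,i\,\lambda_i-\sum_{i=2}^{k+1}\left(i-1\right)\lambda_i=\sum_{i=1}^k\,\lambda_i-k\,\lambda_{k+1}\,,$$ while $$k\sum_{i=k+1}^n\left(\lambda_i-\lambda_{i+1}\right)=k\left(\lambda_{k+1}-\lambda_{n+1}\right)=k\,\lambda_{k+1}\,,$$ and adding the two gives $\sum_{i=1}^k\lambda_i$.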
Note that equality is achieved when the only nonzero entries of $X$ are $x_{i,i}=1$ for $i=1,2,\ldots,k$, i.e., when the columns of $X$ are the first $k$ standard basis vectors of $\mathbb{R}^n$. That is, $S=\sum\limits_{i=1}^k\,\lambda_i$ if $U=\left[u_1\,|\,u_2\,|\,\ldots\,|\,u_k\right]$, where $u_i$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_i$ for $i=1,2,\ldots,k$, and $\{u_1,u_2,\ldots,u_k\}$ is an orthonormal subset of $\mathbb{R}^n$. This confirms your guess.
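To connect the two statements: if $X=[e_1\,|\,e_2\,|\,\ldots\,|\,e_k]$ (the first $k$ standard basis vectors), then $U=V^\top X$ consists of the first $k$ columns of $V^\top$, and each such column $u_i=V^\top e_i$ satisfies $$A\,u_i=V^\top D V\,V^\top e_i=V^\top D\,e_i=\lambda_i\,V^\top e_i=\lambda_i\,u_i\,,$$ so $u_1,\ldots,u_k$ are indeed orthonormal eigenvectors of $A$ for the $k$ largest eigenvalues.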