Prove that if $M \in \mathbb R^{n \times n}$ is symmetric and $d \leq n$ then:
$$\max_{\substack{U \in \mathbb R^{n \times d} \\ U^T U = I_{d \times d}}} \operatorname{Tr}(U^T MU)= \sum_{k=1}^{d} \lambda_k^{(+)}(M)$$
where $\lambda_k^{(+)}$ is the kth largest eigenvalue of $M$ (i.e., $\lambda_1^{(+)} \geq \lambda_2^{(+)} \geq \dots$).
How can I solve this question?
$M$ being symmetric with real entries, there is an orthogonal matrix $P$ such that $N:=PMP^T$ is a diagonal matrix.
Note that both sides of the demanded equality assume the same value when we replace $M$ by $N$ (using the change of variables $U=PV$ for the left-hand side, and using the fact that $M$ and $N$ have the same spectrum for the right-hand side) therefore it is enough to prove the equality for diagonal matrices. Also note that with the help of a permutation matrix you can also assume that the diagonal entries are sorted from the highest to the lowest.
Now if $M$ is diagonal, with entries decreasing from the upper left ($\lambda_1$) to the lower right ($\lambda_n$), then the product $MU$ is the matrix $U$ where the $i$-th row has been multiplied by $\lambda_i$, and so for every $i$.
When you consider the $i$-th diagonal entry of $U^TMU$, it appears to be an affine combination of the eigenvalues of $M$, with weights given by the squares of the entries of the $i$-th column of $U$. These squares do add up to $1$ by assumption ($U^TU=\Bbb 1$), which allows us to speak about an affine combination.
This explains why individually, no diagonal entry of $U^TMU$ can be bigger than $\lambda_1$, and incidentally solves the case $d=1$. Now when $d>1$, there is a little trick to make use of the fact that the columns of $U$ are orthogonal. Let $U=(u_{ij})$. Let us rewrite $$\operatorname{Tr}(U^TMU)=\sum_{i=1}^n\lambda_i\sum_{j=1}^du_{ij}^2$$ as $$\operatorname{Tr}(U^TMU)=\sum_{i=1}^n\lambda_{d+1}\sum_{j=1}^du_{ij}^2+\sum_{i=1}^n(\lambda_i-\lambda_{d+1})\sum_{j=1}^du_{ij}^2$$ The first term is equal to $$\begin{array}[rcl] .\sum_{i=1}^n\lambda_{d+1}\sum_{j=1}^du_{ij}^2&=&\sum_{i=1}^n\lambda_{d+1}\sum_{j=1}^du_{ij}^2\\&=&\lambda_{d+1}\sum_{j=1}^d\sum_{i=1}^nu_{ij}^2\\&=&\lambda_{d+1}\sum_{j=1}^d1\\ &=&d\lambda_{d+1}\end{array}$$ The second term is equal to $$\begin{array}[rcl] .\sum_{i=1}^n(\lambda_i-\lambda_{d+1})\sum_{j=1}^du_{ij}^2&=&\left(\sum_{i=1}^d(\lambda_i-\lambda_{d+1})\sum_{j=1}^du_{ij}^2\right)+\left(\underset{\text{ since $\lambda_i\leq\lambda_{d+1}$ when $i\geq d+1$}}{\text{something non-positive}}\right)\\&\leq &\sum_{i=1}^d(\lambda_i-\lambda_{d+1})\sum_{j=1}^du_{ij}^2\end{array}$$
Now the trick is that since the columns of $U$ are unit and pairwise orthogonal, $U$ can be completed into an orthogonal matrix, and being orthogonal works both ways: the rows of the completed $U$ will be unit vectors. Therefore, for all $i$: $$\sum_{j=1}^du_{ij}^2\leq 1$$ and it follows that our second term is less than or equal to: $$\sum_{i=1}^d(\lambda_i-\lambda_{d+1})=\left(\sum_{i=1}^d \lambda_i\right)-d\lambda_{d+1}$$
Combining both terms:
$$\operatorname{Tr}(U^TMU)\leq \sum_{i=1}^d \lambda_i$$ as required.
The fact that the maximum value over all $U$ is exactly $\sum_{i=1}^d \lambda_i$ is clear, since all you need to reach it is to choose $U$ to be the $d\times d$ identity matrix stacked on top of rows of $0$'s.