I'm learning PCA and I found the following optimization problem in pages 9 and 13 of Afonso Bandeira's lecture notes.
$$\begin{array}{ll} \underset{V \in \mathbb{R}^{n \times d}}{\text{maximize}} & \mbox{tr} \left(V^T \Sigma V \right)\\ \text{subject to} & V^T V = I_{d}\end{array}$$
where $\Sigma$ is the covariance matrix.
The solution is the first $d$ eigenvectors of $\Sigma$, but I don't know how to get this solution. I tried using Lagrange multipliers but failed to get the solution here.
Write $\Sigma=XDX^t$ where $X$ is orthonormal and $D$ is diagonal with non-negative entries.
We want to maximize $tr(V^tXDX^tV)$. Consider the transformation $W=X^tV$ and ovserve that $W^tW=V^tXX^tV=V^tV=I$. Since $X^t$ is an invertible matrix, this defines an invertible transofmration on the space of allowable $V$s, so the original optimization problem is equivalent to
$max Tr(W^tDW), W^tW=I_d$
On the other hand, $Tr(W^tDW)=Tr(DWW^T)=\sum_i d_i (WW^T)_{ii}$.
Lemma
$0\leq (WW^T)_{ii}\leq 1$.
Proof of lemma
The first inequality is clear, because $(WW^T)_{ii}$ is the squared norm of the $i$th row of $W$. To establish the second, observe that for any matrix $M$, the norm of any column of $M$ is bounded by the largest singular value of $M$. This follows immediately from the characterization $\sigma_1(M)=\sup_{|v|=1} |Mv|$, and noting that the $i$th column is given by $Me_i$, where $e_i$ is a standard basis vector. Furthermore, it is a general fact that the singular values of $M$ are the square roots of the eigenvalues of $MM^T$. In particular, since $W^tW=I$, we conclude that all singular values of $W^t$ are equal to 1, and consequently the norm of each column of $W^t$ is bounded by 1.
(end proof of lemma)
Given the constraints on $(WW^T)_{ii}$ it is clear that $\sum_i d_i (WW^T)_{ii}$ is maximized when $(WW^T)_{ii}=1$ if if $i\leq k$ and $0$ if not (we assume WLOG that the entires of $D$ are ordered from largest to smallest). This can be attained by setting the $i$th column of $W$ to be $e_i$ if $i\leq k$ and $0$ if $i>k$. Finally, remembering that $W=X^tV$ where $X$ is the matrix of eigenvalues of $\Sigma$, we see that $V$ consists precisely of the top $k$ eigenvectors of $\Sigma$.