I am having some difficulty understanding an argument in a book. The authors claim that the following Theorem is a direct consequence of the preceding lemma, but fail to give details. Either it is completely trivial and I am not seeing it, or there are some details missing.
Lemma:
Let $A$ be an $n\times n$ symmetric positive definite matrix over $\mathbb{R}$ with eigenvalues $\lambda_1>\lambda_2>\ldots>\lambda_n$ and associated unit eigenvectors $v_1,v_2,\ldots, v_n$. Then we have $$ \max\{x^TAx: ||x||=1, \langle x,v_j\rangle=0 \,\,\text{for}\,\, 1\leq j\leq i-1\}=\lambda_i, $$
where the maximum is attained precisely at the points $v_i$ and $-v_i$.
Theorem: Let $p\leq n$. Consider the following optimization problem \begin{align} \max\sum_{k=1}^p &\langle Au_k,u_k\rangle\\ s.t:\,\,(u_1,\ldots u_p)&\,\,\,\text{is an orthonormal system} \end{align}
The claim is that the optimal value is $\sum_{k=1}^p \lambda_k$ with optimal solution $(v_1,v_2,\ldots,v_p)$, and that the solution is unique up to sign and permutation.
It seems to me that the optimization is carried out by successively maximizing each summand. I fail to understand why this is legitimate. What am I missing here?
Thanks
Note that the sum $\sum_{k=1}^p\langle Au_k,u_k\rangle$ depends only on the subspace $S$ spanned by the $u_k$, not on the orthonormal system $(u_1,\ldots,u_p)$ itself. In fact, extend the orthonormal system to an orthonormal basis $(u_1,\ldots,u_n)$ of $\mathbb R^n$, let $U$ be the orthogonal matrix whose columns are $u_1,\ldots,u_n$, and denote by $P_S$ the orthogonal projector onto $S$. Since $P_Su_k=u_k$ for $k\le p$ and $P_Su_k=0$ for $k>p$, we get $$ \sum_{k=1}^p\langle Au_k,u_k\rangle =\sum_{k=1}^n\langle AP_Su_k,P_Su_k\rangle =\operatorname{tr}(U^TP_S^TAP_SU) =\operatorname{tr}(AP_S), $$ where the last equality uses the cyclic property of the trace together with $UU^T=I$, $P_S^T=P_S$ and $P_S^2=P_S$. The last expression depends only on $S$ rather than on any basis of $S$.
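Therefore, as pointed out by a user in a comment, the uniqueness claim in the theorem is wrong as stated: when $p\ge2$, every orthonormal basis of an optimal subspace attains the same value, so an optimal system is not unique up to sign and permutation. At best, the optimal subspace $S$ is unique.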
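The basis-independence noted above can be checked numerically. The following is only a sketch: the matrix `A`, the sizes `n` and `p`, and the helper `objective` are illustrative assumptions, not anything taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3  # illustrative sizes (assumption)

# A random symmetric positive definite matrix (illustrative choice).
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

# Columns of Q form an orthonormal system (u_1, ..., u_p).
Q, _r = np.linalg.qr(rng.standard_normal((n, p)))

# Mix the u_k by a random p x p orthogonal matrix: a different
# orthonormal basis of the same subspace S.
R, _r = np.linalg.qr(rng.standard_normal((p, p)))
Q2 = Q @ R

def objective(U):
    """Sum of <A u_k, u_k> over the columns u_k of U."""
    return np.trace(U.T @ A @ U)

P_S = Q @ Q.T  # orthogonal projector onto S

val_original = objective(Q)
val_rotated = objective(Q2)
val_projector = np.trace(A @ P_S)

print(val_original, val_rotated, val_projector)
```

All three printed numbers should agree up to rounding error, even though `Q2` is in general not obtained from `Q` by sign changes and permutations of columns.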
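That said, the theorem does follow from the lemma more or less directly. It clearly follows from the lemma if $p=1$. When $p>1$, let $S$ be a maximiser of $\operatorname{tr}(AP_S)$ over $p$-dimensional subspaces. If $v_1\not\in S$, then $S$ must contain a $(p-1)$-dimensional subspace $S'$ that is orthogonal to $v_1$ (if $v_1\perp S$, simply pick any $(p-1)$-dimensional subspace of $S$; otherwise, let $w\ne0$ be the orthogonal projection of $v_1$ onto $S$ and take $S'$ as the orthogonal complement of $\operatorname{span}(w)$ in $S$). Let $u$ be a unit vector spanning the orthogonal complement of $S'$ in $S$, so that $\operatorname{tr}(AP_S)=u^TAu+\operatorname{tr}(AP_{S'})$. Since $u\in S$ and $v_1\notin S$, we have $u\ne\pm v_1$, and the lemma (with $i=1$) gives $u^TAu<\lambda_1=v_1^TAv_1$. But then $\operatorname{span}(v_1)+S'$, which is $p$-dimensional because $v_1\perp S'$, would be a strictly better solution than $S$, which is a contradiction.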
Thus $v_1$ must lie inside $S$ and $\operatorname{tr}(AP_S)=v_1^TAv_1+\operatorname{tr}(AP_{S'})$, where $S'$ now denotes the orthogonal complement of $\operatorname{span}(v_1)$ in $S$. The dimension of the problem is now reduced by one, and by the lemma, $u^TAu\le v_2^TAv_2$ for every unit vector $u\perp v_1$, and in particular for every unit vector $u\in S'$. By an argument similar to the one above, we conclude that $v_2$ must lie inside the optimal $S'$. Proceeding recursively, we see that the optimal $S$ is the span of $v_1,\ldots,v_p$, with optimal value $\sum_{k=1}^p\lambda_k$.
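As a sanity check of the conclusion (not a proof), one can compare the value at the top $p$ eigenvectors with the value at many random orthonormal systems. This is a minimal sketch; the test matrix, the sizes, the number of trials and the helper `objective` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3  # illustrative sizes (assumption)

# A random symmetric positive definite matrix (illustrative choice).
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

# eigh returns eigenvalues in ascending order, so reverse to get
# lambda_1 >= ... >= lambda_n with matching eigenvector columns.
eigvals, eigvecs = np.linalg.eigh(A)
top_vals = eigvals[::-1][:p]
V = eigvecs[:, ::-1][:, :p]  # columns v_1, ..., v_p

def objective(U):
    """Sum of <A u_k, u_k> over the columns u_k of U."""
    return np.trace(U.T @ A @ U)

optimum = top_vals.sum()    # lambda_1 + ... + lambda_p
at_eigvecs = objective(V)   # value at (v_1, ..., v_p)

# Evaluate the objective at random orthonormal systems.
random_vals = []
for _trial in range(1000):
    Q, _r = np.linalg.qr(rng.standard_normal((n, p)))
    random_vals.append(objective(Q))
best_random = max(random_vals)

print(optimum, at_eigvecs, best_random)
```

The value at the eigenvectors should match $\sum_{k=1}^p\lambda_k$, and no random orthonormal system should exceed it.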