Let $A \in \mathbb{R}^{n \times n}$ be symmetric with eigenvalues $\lambda_1 \geq \dots \geq \lambda_n$. Then, by the Courant-Fischer theorem, $\lambda_1=\max_{x\neq 0} r(x)$ and $\lambda_n=\min_{x \neq 0} r(x)$, where $r(x)=\frac{x^TAx}{x^Tx}$ is the Rayleigh quotient.
Now consider the range $R(Q_k)$ of $Q_k=[q_1,...,q_k]\in \mathbb{R}^{n\times k}$ and define $M_k := \max_{x \in R(Q_k)\setminus \{ 0 \} } r(x)$. Assume that $q_1,...,q_k$ are already chosen and that $u_k \in Span\{q_1,...,q_k\}$ satisfies $M_k=r(u_k)$. We already know that the gradient is given by
grad$(r(x))=\frac{2}{x^Tx}(Ax-r(x)x)$.
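
As a quick sanity check of this gradient formula, here is a minimal sketch that compares it with central finite differences. The random symmetric test matrix, the step size `h` and the function names are assumptions made only for this illustration.

```python
import numpy as np

def rayleigh(A, x):
    """Rayleigh quotient r(x) = x^T A x / x^T x."""
    return (x @ A @ x) / (x @ x)

def rayleigh_grad(A, x):
    """Closed-form gradient 2/(x^T x) * (A x - r(x) x)."""
    return 2.0 / (x @ x) * (A @ x - rayleigh(A, x) * x)

def numerical_grad(A, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (rayleigh(A, x + e) - rayleigh(A, x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # symmetric test matrix (assumed example)
x = rng.standard_normal(5)

grad_err = np.max(np.abs(rayleigh_grad(A, x) - numerical_grad(A, x)))
print("max deviation from finite differences:", grad_err)
print("x^T grad r(x):", x @ rayleigh_grad(A, x))
```

The second printed value should be close to zero, since the formula gives $x^T$grad$(r(x))=\frac{2}{x^Tx}(x^TAx-r(x)x^Tx)=0$.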
My first question is: why does $M_{k+1}>M_k$ hold if grad$(r(u_k))\neq 0$ and
grad$(r(u_k)) \in Span\{q_1,...,q_{k+1}\}$?
It is easy to see that grad$(r(x)) \in Span\{x,Ax\}$. My second question is: why does the condition
grad$(r(u_k)) \in Span\{q_1,...,q_{k+1}\}$ hold if $Span\{q_1,...,q_{k+1}\}=Span\{q_1,Aq_1,A^2q_1,...,A^kq_1\}$?
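
Both statements can be checked numerically on a small example before looking for a proof. The sketch below is an illustration, not an argument. It assumes that the $q_j$ are orthonormal (not stated above), so that $M_k$ is the largest eigenvalue of $Q_k^TAQ_k$. It also uses a random symmetric matrix and a random $q_1$, and the helper names `krylov_basis` and `max_on_range` are hypothetical.

```python
import numpy as np

def rayleigh(A, x):
    """Rayleigh quotient r(x) = x^T A x / x^T x."""
    return (x @ A @ x) / (x @ x)

def rayleigh_grad(A, x):
    """Gradient 2/(x^T x) * (A x - r(x) x)."""
    return 2.0 / (x @ x) * (A @ x - rayleigh(A, x) * x)

def krylov_basis(A, q1, m):
    """Orthonormal basis of Span{q1, A q1, ..., A^(m-1) q1} via QR.

    The first j columns span the first j Krylov vectors, so the bases are
    nested. This is fine for small m but is not a numerically robust method.
    """
    cols = [q1 / np.linalg.norm(q1)]
    for _ in range(m - 1):
        v = A @ cols[-1]
        cols.append(v / np.linalg.norm(v))   # rescaling keeps the span
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def max_on_range(A, Q):
    """Maximum of r over nonzero x in R(Q), and a maximizer, for orthonormal Q."""
    w, Y = np.linalg.eigh(Q.T @ A @ Q)       # eigenvalues in ascending order
    return w[-1], Q @ Y[:, -1]

rng = np.random.default_rng(1)
n, k = 8, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                            # symmetric test matrix (assumed example)
Qk1 = krylov_basis(A, rng.standard_normal(n), k + 1)
Qk = Qk1[:, :k]

M_k, u_k = max_on_range(A, Qk)
M_k1, _ = max_on_range(A, Qk1)
g = rayleigh_grad(A, u_k)

# Component of the gradient outside Span{q_1, ..., q_{k+1}}.
outside = np.linalg.norm(g - Qk1 @ (Qk1.T @ g))
grad_norm = np.linalg.norm(g)
print("M_k =", M_k, " M_{k+1} =", M_k1)
print("norm of gradient:", grad_norm, " part outside the span:", outside)
```

In this experiment the gradient at $u_k$ should have a negligible component outside $Span\{q_1,...,q_{k+1}\}$, and $M_{k+1}$ should come out strictly larger than $M_k$ whenever the gradient is nonzero.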