The basic idea behind the derivation of the steepest direction is to find $\textbf{d}$ such that $f(\textbf{x}^{(k)} + \lambda \textbf{d})$ decreases the most for small values of $\lambda$
In other words, finding $\textbf{d} \in \mathbb{S}^{n-1}$ that minimizes $$ \frac{d}{d\lambda}f(\textbf{x}^{(k)} + \lambda \textbf{d})|_{\lambda = 0} = \langle \nabla f(\textbf{x}^{(k)}, \textbf{d})\rangle \qquad (1) $$ Now using the Cauchy-Schwarz Inequality, we get $$ |\langle\nabla f(\textbf{x}^{(k)},\textbf{d} \rangle|^2 \leq \|\nabla f(\textbf{x}^{(k)})\|^2 \qquad (2) $$ with equality in case $\textbf{d}$ lies on the line spanned by $\nabla f(\textbf{x})^{(k)}$, so that $$ \textbf{d} = -\frac{1}{\|\nabla f(\textbf{x}^{(k)})\|}\nabla f(\textbf{x}^{(k)}) \qquad (3) $$ delivers the steepest direction
Here are my questions:
The Cauchy-Schwarz Inequality states that $$ \boxed{| \textbf{u}^\top \textbf{v} | \leq \| \textbf{u}\| \|\textbf{v}\|} $$ How can I get (2) with it?
Now tackling the way we get to (3) we can start from (2) \begin{align*} |\langle\nabla f(\textbf{x}^{(k)},\textbf{d} \rangle|^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2 \\ ((\nabla f(\textbf{x}^{(k)})^\top\textbf{d})^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \end{align*}
But then again I'm stuck
EDIT So by setting $\textbf{u} = \nabla f(x^{(k)}$ and $\textbf{v} = d$ with $\|\textbf{d}\| = 1$ as it is $\in \mathbb{S}^{n-1}$, we can plug those 2 values into the CS inequality which gives
\begin{align*} \langle \nabla f(\textbf{x}^{(k)}), \textbf{d} \rangle &\leq \|\nabla f(\textbf{x}^{(k)})\| \|\textbf{d}\|\\ \langle \nabla f(\textbf{x}^{(k)}), \textbf{d} \rangle &\leq \|\nabla f(\textbf{x}^{(k)})\| (1)\\ |\langle \nabla f(\textbf{x}^{(k)}), \textbf{d} \rangle|^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \end{align*} now setting $$ \textbf{d} = -\frac{\nabla f(\textbf{x}^{(k)}) }{\|\nabla f(\textbf{x}^{(k)})\|} $$ We get \begin{align*} |\langle \nabla f(\textbf{x}^{(k)}), -\frac{\nabla f(\textbf{x}^{(k)}) }{\|\nabla f(\textbf{x}^{(k)})\|} \rangle|^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \frac{\|\nabla f(\textbf{x}^{(k)})\|^4}{\|\nabla f(\textbf{x}^{(k)})\|^2} &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \|\nabla f(\textbf{x}^{(k)})\|^2 &= \|\nabla f(\textbf{x}^{(k)})\|^2\\ \end{align*}
For your first question, take $\textbf{u}=\nabla f(\textbf{x}^{(k)})$ and $\textbf{v}=\textbf{d}.$ Then, just apply Cauchy-Schwarz and use that $\textbf{d}\in\mathbb{S}^{n-1}$ (so, it has unit norm). To get the next equality, just plug that specific $\textbf{d}$ into the left-hand side of (2). This maximizes the absolute value of the inner product, and making it negative, in turn, minimizes it (which is why we have a minus sign on $\textbf{d}$).