Derivation of the steepest direction

356 Views Asked by At

The basic idea behind the derivation of the steepest direction is to find $\textbf{d}$ such that $f(\textbf{x}^{(k)} + \lambda \textbf{d})$ decreases the most for small values of $\lambda$

In other words, finding $\textbf{d} \in \mathbb{S}^{n-1}$ that minimizes $$ \frac{d}{d\lambda}f(\textbf{x}^{(k)} + \lambda \textbf{d})|_{\lambda = 0} = \langle \nabla f(\textbf{x}^{(k)}, \textbf{d})\rangle \qquad (1) $$ Now using the Cauchy-Schwarz Inequality, we get $$ |\langle\nabla f(\textbf{x}^{(k)},\textbf{d} \rangle|^2 \leq \|\nabla f(\textbf{x}^{(k)})\|^2 \qquad (2) $$ with equality in case $\textbf{d}$ lies on the line spanned by $\nabla f(\textbf{x})^{(k)}$, so that $$ \textbf{d} = -\frac{1}{\|\nabla f(\textbf{x}^{(k)})\|}\nabla f(\textbf{x}^{(k)}) \qquad (3) $$ delivers the steepest direction

Here are my questions:

  • The Cauchy-Schwarz Inequality states that $$ \boxed{| \textbf{u}^\top \textbf{v} | \leq \| \textbf{u}\| \|\textbf{v}\|} $$ How can I get (2) with it?

  • Now tackling the way we get to (3) we can start from (2) \begin{align*} |\langle\nabla f(\textbf{x}^{(k)},\textbf{d} \rangle|^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2 \\ ((\nabla f(\textbf{x}^{(k)})^\top\textbf{d})^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \end{align*}

But then again I'm stuck

EDIT So by setting $\textbf{u} = \nabla f(x^{(k)}$ and $\textbf{v} = d$ with $\|\textbf{d}\| = 1$ as it is $\in \mathbb{S}^{n-1}$, we can plug those 2 values into the CS inequality which gives

\begin{align*} \langle \nabla f(\textbf{x}^{(k)}), \textbf{d} \rangle &\leq \|\nabla f(\textbf{x}^{(k)})\| \|\textbf{d}\|\\ \langle \nabla f(\textbf{x}^{(k)}), \textbf{d} \rangle &\leq \|\nabla f(\textbf{x}^{(k)})\| (1)\\ |\langle \nabla f(\textbf{x}^{(k)}), \textbf{d} \rangle|^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \end{align*} now setting $$ \textbf{d} = -\frac{\nabla f(\textbf{x}^{(k)}) }{\|\nabla f(\textbf{x}^{(k)})\|} $$ We get \begin{align*} |\langle \nabla f(\textbf{x}^{(k)}), -\frac{\nabla f(\textbf{x}^{(k)}) }{\|\nabla f(\textbf{x}^{(k)})\|} \rangle|^2 &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \frac{\|\nabla f(\textbf{x}^{(k)})\|^4}{\|\nabla f(\textbf{x}^{(k)})\|^2} &\leq \|\nabla f(\textbf{x}^{(k)})\|^2\\ \|\nabla f(\textbf{x}^{(k)})\|^2 &= \|\nabla f(\textbf{x}^{(k)})\|^2\\ \end{align*}

2

There are 2 best solutions below

4
On BEST ANSWER

For your first question, take $\textbf{u}=\nabla f(\textbf{x}^{(k)})$ and $\textbf{v}=\textbf{d}.$ Then, just apply Cauchy-Schwarz and use that $\textbf{d}\in\mathbb{S}^{n-1}$ (so, it has unit norm). To get the next equality, just plug that specific $\textbf{d}$ into the left-hand side of (2). This maximizes the absolute value of the inner product, and making it negative, in turn, minimizes it (which is why we have a minus sign on $\textbf{d}$).

2
On

Teh derivation of (2) starts with the assumption (which the steps you presented did not state explicitly) that $\mathbf{d} = 1$. With this added information, let $ u = \nabla f(\mathbf{x}^{(k)})$ and $v = \mathbf{d}$. Then CS says $$ |u \cdot v|^2 \leq |v!^2 = |u|^2 (1) = ||\nabla f(\mathbf{x}^{(k)})||^2 $$

From (2) you know that whatever direction is chosen for $\mathbf{d}$, the LHS is no greater than $||\nabla f(\mathbf{x}^{(k)})|||^2$. Therefore if you can find a direction (a value of $\mathbf{d}$) such that equality holds, this is a direction (perhaps there could be others) maximizing the LHS.

$\nabla\mathbf{x}^{(k)}$ is one such value of $\mathbf{d}$ that give equality in (2).

Since the LHS gives the absolute value of the slope of the function along $\mathbf{d}$, to minimize the slope you need to choose $\mathbf{d}$ proportional to $\nabla\mathbf{x}^{(k)}$ but such that

  • The magnitude of $\mathbf{d}$ is $1$, and
  • $\nabla f(\mathbf{x}^{(k)})$ is negative.

Expression (3) is the unique expression having those properties.