Descent Direction

86 Views Asked by At

I am somewhat confused, for a gradient descent line-minimizer algorithm (backtracking), for which I need to compute the descent direction $p$.

Now, I already computed the gradient $\nabla f$ for my function $f$, and was wondering in what way differs $p$ from $-\nabla f$? If they differ, how can I compute $p$ in terms of $\nabla f$?

Thank you

1

There are 1 best solutions below

0
On BEST ANSWER

The gradient $\nabla_{f(x)}$ of a real-valued function $f$ at a point $x$ contains information about more than just the direction of maximum ascent/descent, it contains all information about the first-order approximation of $f$ at $x$.

Suppose I start at $x$, and I want to figure out (according to my first-order approximation) how much I should expect $f$ to increase/decrease if I travel in direction $p$ for a distance of $\|p\|_2$. Then I simply compute,

$$p^T\nabla_{f(x)}.$$

In other words,

$$f(x+p)\approx f(x) + p^T\nabla_{f(x)}.$$

To put it concisely, we set $p=-\nabla_{f(x)}$ when we want to perform gradient descent with step-size equal to $\|\nabla_{f(x)}\|_2$,

$$f(x-\nabla_{f(x)})\approx f(x) - (\nabla_{f(x)})^T\nabla_{f(x)}=f(x) - \|\nabla_{f(x)}\|_2^2.$$

This tells us that we expect $f$ to decrease by an amount equal to the squared Euclidean norm of the gradient.

However it's not always the case that we want $p$ to be equal to the negative gradient, for instance when we have second-order information about $f$ as well.