I'm learning about nonlinear, unconstrained optimization.
My book says that a descent direction $p_k$ must satisfy $$\nabla f(x_k)^T p_k < 0.$$ This seems to mean that $p_k$ must make an obtuse angle with the gradient. Why is that?
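To see what the condition does in practice, here is a small numerical check. It is only a sketch: the function $f(x) = x_1^2 + 3x_2^2$, the point $x_k = (1, 1)$, the step length `alpha`, and the three test directions are all hypothetical choices made for illustration. For each direction it compares the sign of $\nabla f(x_k)^T p$ with the actual change in $f$ after a small step along $p$.

```python
# Hypothetical test function f(x) = x1^2 + 3*x2^2, chosen only for illustration.
def f(x):
    return x[0] ** 2 + 3 * x[1] ** 2

# Gradient of the hypothetical f above.
def grad_f(x):
    return [2 * x[0], 6 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x_k = [1.0, 1.0]
g = grad_f(x_k)   # gradient at x_k: [2.0, 6.0]
alpha = 1e-3      # hypothetical small step length

directions = {
    "steepest descent": [-g[0], -g[1]],  # p = -grad f, obtuse to the gradient
    "other obtuse": [1.0, -2.0],         # grad^T p = 2 - 12 < 0
    "acute": [1.0, 1.0],                 # grad^T p = 2 + 6 > 0
}

results = {}
for name, p in directions.items():
    slope = dot(g, p)  # directional derivative grad f(x_k)^T p
    x_new = [xi + alpha * pi for xi, pi in zip(x_k, p)]
    change = f(x_new) - f(x_k)  # actual change in f after a small step
    results[name] = (slope, change)
    print(f"{name}: grad^T p = {slope:+.1f}, change in f = {change:+.6f}")
```

Both directions with $\nabla f(x_k)^T p < 0$ lower $f$, while the acute direction raises it. For a small step, `change / alpha` is also close to `slope`, which is the first-order Taylor behaviour the Wikipedia quote refers to.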
I read on Wikipedia that "The motivation for such an approach is that small steps along $p_k$ guarantee that $f$ is reduced, by Taylor's theorem." But I still can't see why. Is there an easy answer or an intuitive way to see why this condition is needed, or does understanding it require a long derivation?
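For reference, the Taylor argument mentioned in that quote can be written out as below. This assumes $f$ is continuously differentiable near $x_k$ and introduces a step length $\alpha > 0$ and the angle $\theta$ between $\nabla f(x_k)$ and $p_k$; neither symbol comes from the book.

$$f(x_k + \alpha p_k) = f(x_k) + \alpha\, \nabla f(x_k)^T p_k + o(\alpha), \qquad \nabla f(x_k)^T p_k = \|\nabla f(x_k)\|\,\|p_k\|\cos\theta.$$

If $\nabla f(x_k)^T p_k < 0$, the linear term is negative and dominates the $o(\alpha)$ remainder for all sufficiently small $\alpha$, so $f(x_k + \alpha p_k) < f(x_k)$. For nonzero vectors the product is negative exactly when $\cos\theta < 0$, i.e. when $\theta > 90^\circ$, which is the "obtuse" reading.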