I am looking for an explanation of why $-\nabla f_k$ is the better direction to step down. I understand why $-\nabla f_k$ is the direction that minimizes $p^T\nabla f_k$, why minimizing $p^T\nabla f_k$ will give the most rapid decrease, and why Taylor's theorem explains this.
I thought maybe the second-order term in the Taylor expansion is necessarily positive. Is this a reason?
UPDATE:
I think I need to be clearer: if we take a $p$ such that $p^T\nabla f_k$ is minimized, how can we be sure that $f(x_k+\alpha p)$ is minimized? What if the second-order term $\frac{1}{2}\alpha^2 p^T\nabla^2f(x_k+tp)p$ grows when we move in this direction?
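To make the question concrete, this is the expansion I have in mind. It is a sketch under my own assumptions, which are not stated above: $f$ is twice continuously differentiable, $p$ is a unit-length direction, $\alpha > 0$ is the step length, and $t$ is some point in $(0,\alpha)$.

$$
f(x_k+\alpha p) = f(x_k) + \alpha\, p^T\nabla f_k + \frac{1}{2}\alpha^2\, p^T\nabla^2 f(x_k+tp)\,p, \qquad t\in(0,\alpha).
$$

The first-order term scales with $\alpha$ while the second-order term scales with $\alpha^2$, so my concern is about step lengths $\alpha$ for which the second-order term is large enough to outweigh the first-order decrease.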
