In the gradient descent algorithm, why do we use $r_k$ and $p_k$ for the same thing, namely the direction of steepest descent?


In the gradient descent algorithm, $A$ is symmetric and we want to minimize the function $\phi(x)={1\over2}x^TAx-x^Tb$ to obtain a solution of $Ax=b$.

$x_{k+1}=x_k+\alpha_k p_k$ where $p_k=-\nabla\phi(x_k)=-Ax_k+b=r_k$
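As a quick check (this step uses the symmetry of $A$):

$$\nabla\phi(x)=\tfrac{1}{2}(A+A^T)x-b=Ax-b,$$

so $-\nabla\phi(x_k)=b-Ax_k$ is exactly the residual $r_k$ of the linear system.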

And $\alpha_k$ has to be chosen such that $\phi(x_k+\alpha p_k)$ is minimal as a function of $\alpha$, i.e., an exact line search.
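Setting the derivative with respect to $\alpha$ to zero, and using $Ax_k-b=-r_k$:

$$\frac{d}{d\alpha}\,\phi(x_k+\alpha r_k)=r_k^TA(x_k+\alpha r_k)-r_k^Tb=\alpha\,r_k^TAr_k-r_k^Tr_k=0.$$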

We get $\alpha_k=\frac{\langle r_k,r_k\rangle}{\langle r_k,Ar_k\rangle}$

And then we define $r_{k+1}=r_k-\alpha_kAr_k$. Why did we ditch $p_k$ and use this newly defined $r_{k+1}$ as the direction of descent?
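For concreteness, here is a minimal NumPy sketch of the iteration as written above (the matrix $A$, right-hand side $b$, and iteration count are arbitrary choices for illustration); it checks at every step that the recursively updated residual $r_{k+1}=r_k-\alpha_kAr_k$ agrees with the freshly computed $b-Ax_{k+1}$:

```python
import numpy as np

# A small symmetric positive definite example, chosen arbitrarily for illustration.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)      # starting guess x_0
r = b - A @ x        # r_0 = b - A x_0 = -grad phi(x_0)

for k in range(25):
    if np.linalg.norm(r) < 1e-12:   # residual vanishes: x solves A x = b
        break
    Ar = A @ r
    alpha = (r @ r) / (r @ Ar)      # exact line search along p_k = r_k
    x = x + alpha * r               # x_{k+1} = x_k + alpha_k p_k
    r = r - alpha * Ar              # recurrence r_{k+1} = r_k - alpha_k A r_k
    # the recurrence reproduces the residual of the new iterate:
    assert np.allclose(r, b - A @ x)

print(x)                       # approaches the solution of A x = b
print(np.linalg.solve(A, b))   # reference solution for comparison
```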