In the gradient descent algorithm, $A$ is symmetric and we want to minimize the function $\phi(x)={1\over2}x^TAx-x^Tb$ to obtain a solution of $Ax=b$:
$x_{k+1}=x_k+\alpha_k p_k$, where $p_k=-\nabla\phi(x_k)=b-Ax_k=r_k$.
And $\alpha_k$ has to be chosen such that $\phi(x_k+\alpha_k p_k)$ is minimal as a function of $\alpha_k$ (exact line search).
We get $\alpha_k=\frac{\langle r_k,r_k\rangle}{\langle r_k,Ar_k\rangle}$
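(For completeness, this value of $\alpha_k$ follows from setting the directional derivative along $r_k$ to zero, using $\nabla\phi(x)=Ax-b$ and the symmetry of $A$:

$$\frac{d}{d\alpha}\,\phi(x_k+\alpha r_k)
= \langle \nabla\phi(x_k+\alpha r_k),\, r_k\rangle
= \langle A(x_k+\alpha r_k)-b,\, r_k\rangle
= \alpha\,\langle r_k, Ar_k\rangle - \langle r_k, r_k\rangle = 0.)$$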
And then we define $r_{k+1}=r_k-\alpha_k Ar_k$. Why did we drop $p_k$ and instead use the newly defined residual $r_{k+1}$ as the descent direction?
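For reference, here is a minimal sketch of the iteration (the small SPD test matrix and right-hand side are made-up example data). It also checks numerically at every step that the recursive update $r_{k+1}=r_k-\alpha_k Ar_k$ agrees with the directly computed residual $b-Ax_{k+1}$:

```python
import numpy as np

# Small symmetric positive definite test system (assumed example data)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)      # initial guess x_0
r = b - A @ x        # initial residual r_0 = b - A x_0

for k in range(25):
    Ar = A @ r
    alpha = (r @ r) / (r @ Ar)   # exact line search step alpha_k
    x = x + alpha * r            # x_{k+1} = x_k + alpha_k r_k
    r_rec = r - alpha * Ar       # recursive update r_{k+1} = r_k - alpha_k A r_k
    # The recursion agrees with the directly computed residual:
    assert np.allclose(r_rec, b - A @ x)
    r = r_rec

print(np.allclose(A @ x, b))  # iterates converge to the solution of Ax = b
```

The per-step assertion passing is just the algebraic identity $b-A(x_k+\alpha_k r_k)=r_k-\alpha_k Ar_k$; the recursive form saves one matrix-vector product per iteration.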