I have been searching for an intuitive explanation of the conjugate gradient method (as it relates to gradient descent) for at least two years without luck.
I even find articles like "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" hard to understand.
Intuitively, what does this method do (e.g. geometrically) and why does it outperform gradient descent?
Check the full version of Shewchuk (1994), An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.
This PDF is a 64-page document with more than 40 figures, full of geometric insight. The version you have is a 17-page edition of the full document, without the figures.
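To make the geometric picture a little more concrete, here is a minimal sketch in pure Python that minimizes the quadratic f(x) = ½ xᵀAx − bᵀx with both methods. Its minimizer solves Ax = b. Steepest descent always moves along the current residual (the negative gradient), so on an elongated bowl it zigzags and keeps partly undoing earlier progress. CG makes each new search direction A-conjugate to the previous ones, so a step never spoils the minimization already done along earlier directions. In exact arithmetic it therefore reaches the minimum of an n-dimensional quadratic in at most n steps. The matrix A, the vector b, the starting point and the tolerance are assumptions chosen only for illustration.

```python
# Sketch: steepest descent vs. conjugate gradient on f(x) = 1/2 x^T A x - b^T x,
# with A symmetric positive-definite. The minimizer solves A x = b.
# A, b, the starting point and tol are illustrative assumptions.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def steepest_descent(A, b, x, tol=1e-10, max_iter=10000):
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual = -gradient
    for k in range(max_iter):
        rr = dot(r, r)
        if rr ** 0.5 < tol:
            return x, k
        Ar = matvec(A, r)
        alpha = rr / dot(r, Ar)  # exact line search along the residual
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]
    return x, max_iter

def conjugate_gradient(A, b, x, tol=1e-10, max_iter=10000):
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    d = r[:]  # first direction is the steepest-descent direction
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)  # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        beta = rr_new / rr  # keeps the next direction A-conjugate to the previous ones
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x, max_iter

A = [[3.0, 2.0], [2.0, 6.0]]  # SPD, eigenvalues 2 and 7
b = [2.0, -8.0]               # exact solution is x = (2, -2)
x_sd, k_sd = steepest_descent(A, b, [-2.0, -2.0])
x_cg, k_cg = conjugate_gradient(A, b, [-2.0, -2.0])
print("steepest descent:", k_sd, "iterations")
print("conjugate gradient:", k_cg, "iterations")  # 2 here, since n = 2
```

On this 2×2 problem CG stops after two iterations, while steepest descent needs many more to reach the same tolerance. The gap widens as A becomes more ill-conditioned, which corresponds to a narrower, more elongated bowl.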