Confused on Properties of Residuals in the Conjugate Gradients Method


I'm reading through this explanatory paper on the conjugate gradient method.

I've been working through the 'Conjugate Directions' algorithm (PDF pages 27-36), which is like conjugate gradients except that instead of applying Gram-Schmidt conjugation to the residuals, you apply it to an arbitrary set of linearly independent vectors.

Now we are using the residuals at each step as the search directions. There are a few things I don't understand about what the author presents as the "benefits" of using the residuals.

He keeps mentioning that each residual is "orthogonal to all residuals before it". Is there an intuitive way to think about this? If it's true, why does it hold here and not before? In the Steepest Descent algorithm you also step in the direction of the residual, but successive steps are often in similar directions. Why, in this case, do the directions never overlap?
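To convince myself the claim is at least numerically true, I ran plain CG (my own sketch, not code from the paper) on a small random SPD system and checked that every residual is orthogonal to all the residuals before it:

```python
import numpy as np

# Build a small symmetric positive definite system Ax = b.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)   # SPD by construction
b = rng.standard_normal(5)

# Standard conjugate gradient iteration, keeping every residual.
x = np.zeros(5)
r = b - A @ x
d = r.copy()
residuals = [r.copy()]
for _ in range(5):
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)
    x = x + alpha * d
    r_new = r - alpha * Ad
    beta = (r_new @ r_new) / (r @ r)
    d = r_new + beta * d
    r = r_new
    residuals.append(r.copy())

# Each residual is (numerically) orthogonal to all earlier ones.
for i in range(len(residuals)):
    for j in range(i):
        assert abs(residuals[i] @ residuals[j]) < 1e-8
```

The pairwise dot products all come out at roundoff level, which matches the paper's claim, but I'd still like the intuition for *why*.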

My biggest confusion is about his conclusion on why using the residuals makes the Gram-Schmidt conjugation cheaper to compute. I don't understand the first paragraph on PDF page 37 (not page 37 as printed in the paper). Why does the fact that the span of the search directions is a Krylov subspace tell us that the residual r_i is A-orthogonal to the previous direction space D_(i-1)?
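Numerically the claim does seem to hold. Here is a quick check I wrote (my own sketch, with D_i denoting the span of the first i search directions d_0, ..., d_(i-1)): it verifies that r_i^T A d_j = 0 for all j <= i - 2, i.e. that r_i is A-orthogonal to D_(i-1).

```python
import numpy as np

# Small random SPD system Ax = b.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)   # SPD by construction
b = rng.standard_normal(6)

# Standard CG, storing every search direction d_j and residual r_i.
x = np.zeros(6)
r = b - A @ x
d = r.copy()
dirs, res = [], [r.copy()]
for _ in range(6):
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)
    x = x + alpha * d
    r_new = r - alpha * Ad
    beta = (r_new @ r_new) / (r @ r)
    dirs.append(d.copy())
    d = r_new + beta * d
    r = r_new
    res.append(r.copy())

# r_i is A-orthogonal to every direction except the most recent one:
# r_i^T A d_j == 0 for j <= i - 2 (but generally not for j = i - 1).
for i in range(1, len(res)):
    for j in range(i - 1):
        assert abs(res[i] @ A @ dirs[j]) < 1e-8
```

So the property checks out in practice; what I'm missing is how it follows from the Krylov-space structure.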


Also, I can't convince myself that the second sentence in that paragraph is right. "The fact that the next residual r_(i+1) is orthogonal to the space D_(i+1), from Equation 39" seems wrong. Doesn't Equation 39 tell us that r_(i+1) is orthogonal to the space D_i?

Thanks, A