Deriving CG algorithm from Krylov point of view


I'm working through Gilbert Strang's course on the CG algorithm for solving $Ax = b$, with $A \in \mathcal{M}_{n \times n}(\mathbb{R})$ and $x, b \in \mathbb{R}^n$: https://math.mit.edu/classes/18.086/2006/am64.pdf It follows exactly the videos from his MIT course.

In this extract from a book whose title I'd be glad to know, he proceeds in 4 steps (at first we do not assume $A$ to be symmetric or positive definite):

  1. Some reminders on Krylov subspaces, $\mathcal{K}_j = \text{span}(b, Ab, \dots, A^{j-1}b)$, and a discussion of the impact of the basis's condition number.
  2. Orthogonalizing the Krylov basis $(b, Ab, \dots, A^{j-1}b)$ into $(q_1, q_2, \dots, q_j)$ to get the optimal condition number $1$. A discussion of the error propagation of the Gram-Schmidt method in floating-point arithmetic, and of Arnoldi's method as an alternative, which factors $AQ$ (not $A$ itself) into $QH$, where $H$ is upper Hessenberg, giving us recurrence relations for the $q$'s.
  3. Noticing that if the matrix $A$ is symmetric, then $H$ is tridiagonal. Arnoldi's iteration becomes the Lanczos iteration, a short recurrence. A discussion of the symmetric eigenvalue problem and how it relates to the Lanczos iteration.
  4. Stating the rule for CG from the Krylov-subspace point of view: the best $x_k$ in $\mathcal{K}_k$ is the one whose residual $r_k = b - A x_k$ (a vector in $\mathcal{K}_{k+1}$) is orthogonal to $\mathcal{K}_k$. Deducing that the residuals are orthogonal to one another, and that the updates $x_k - x_{k-1}$ are $A$-orthogonal to one another: they are "conjugate directions".
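To make sure I understand steps 2–3, I wrote a minimal NumPy sketch of the Lanczos short recurrence (my own code, not taken from the notes): it builds the orthonormal $q$'s and the symmetric tridiagonal $H$ with $Q_k^T A Q_k = H_k$.

```python
import numpy as np

def lanczos(A, b, k):
    # Build an orthonormal basis q_1, ..., q_k of K_k = span(b, Ab, ..., A^{k-1} b)
    # for symmetric A, together with the tridiagonal H such that Q_k^T A Q_k = H_k.
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        if j > 0:
            H[j - 1, j] = H[j, j - 1]        # h_{j-1,j} = h_{j,j-1} by symmetry
            v -= H[j - 1, j] * Q[:, j - 1]   # short recurrence: only two projections
        H[j, j] = Q[:, j] @ v
        v -= H[j, j] * Q[:, j]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] == 0:                 # breakdown: K_j is already invariant
            break
        Q[:, j + 1] = v / H[j + 1, j]
    return Q[:, :k], H[:k, :k]
```

The point of the short recurrence is that each new $Aq_j$ only needs to be orthogonalized against $q_j$ and $q_{j-1}$, since $H$ is tridiagonal.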

Then Strang immediately derives the CG algorithm with the $\alpha$'s, the $\beta$'s, and the $d$'s... That's where I get confused, because I really don't see what the $\alpha$'s, the $\beta$'s, and the $d$'s are from the Krylov point of view. I understand what they are from the optimization point of view. But I'm frustrated: I really want to derive the algorithm from the Krylov ideas above, not from real analysis.

I'm not confused by the formulas for the $\alpha$'s and the $\beta$'s: knowing the conclusions of step 4, I can prove them. But I'd really like to know what they represent from the Krylov-subspace point of view.

Here's my argument so far:

  1. We've arrived at the point where we have chosen $x_{k-1} \in \mathcal{K}_{k-1}$ such that $r_{k-1} \in \mathcal{K}_{k}$ is orthogonal to $\mathcal{K}_{k-1}$. Thanks to the Lanczos iteration, we compute the vector $q_k$, and we now know $\mathcal{K}_{k}$. (I'm already confused here: shouldn't we know $\mathcal{K}_k$ to determine $r_{k-1}$? But the Krylov method chooses the best $x_{k-1}$ in $\mathcal{K}_{k-1}$ before we expand to $\mathcal{K}_{k}$, doesn't it?)
  2. We want to choose $x_k$ such that $r_k$ is orthogonal to $\mathcal{K}_k$. I want to express $x_k$ as a linear combination of the $q$'s, not of the $A^j b$'s (if I understood correctly, this is the point of the discussion on the condition number). So I want $x_k = \lambda_1 q_1 + \dots + \lambda_k q_k$ such that $r_k = b - \lambda_1 A q_1 - \dots - \lambda_k A q_k$ is orthogonal to $\mathcal{K}_k$.
  3. My first reflex is to subtract the projections $\langle r_k, q_i \rangle q_i$, $i \leq k$, from the vector $r_k$.
  4. I'm stuck here, because those projections don't give me good formulas:

$\langle b - \lambda_1 A q_1 - \dots - \lambda_k A q_k,\, q_i \rangle q_i = 0,$

i.e.

$\langle b, q_i\rangle q_i = \langle \lambda_1 A q_1 + \dots + \lambda_k A q_k,\, q_i \rangle q_i,$

i.e., from the Lanczos iteration,

$\langle b, q_i\rangle q_i = (\lambda_{i-1} h_{i,i-1} + \lambda_i h_{i,i} + \lambda_{i+1} h_{i,i+1})\, q_i.$
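At least the identity I'm using in that last line seems right: here is a small NumPy check (my own sketch, with a random symmetric $A$ as an assumption) that once the Krylov basis is orthonormalized, $\langle A q_j, q_i\rangle = h_{i,j}$ vanishes whenever $|i-j| > 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M + M.T                       # a random symmetric matrix
b = rng.standard_normal(6)

# Orthonormalize the Krylov basis (b, Ab, A^2 b, A^3 b) by QR
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(4)])
Q, _ = np.linalg.qr(K)

# H[i, j] = <A q_j, q_i>; only the tridiagonal entries survive
H = Q.T @ A @ Q
off_tridiagonal = [H[i, j] for i in range(4) for j in range(4) if abs(i - j) > 1]
```

So my problem is not with the tridiagonal structure itself, but with turning these projection equations into the $\alpha$'s and $\beta$'s.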

Can you please help me? Thanks in advance and good day

PS: Sorry if my English doesn't express enough courtesy; I tried my best, but it's not my mother tongue... thanks again!