The MINRES algorithm attempts to solve $A x = b$, where $A$ is symmetric/Hermitian. I am trying to derive the method on my own, but I keep getting nowhere.
Every time MINRES is mentioned in the literature, the same reference is brought up: C. C. Paige and M. A. Saunders, "Solution of Sparse Indefinite Systems of Linear Equations".
This paper, however, is of little help for understanding the method: the authors use confusing notation, go off on tangents, and never present the algorithm in a short and concise manner.
My progress so far is:
- Take $A$ and $b$ and compute matrices $V$ (orthogonal) and $T$ (tridiagonal) using the Lanczos algorithm, such that $$ V^\intercal A V = T \quad \implies \quad A V = V T$$
- Write the solution as a linear combination of the columns of $V$: $$x = Vy$$
- Rewrite the system as $$ Ax = AVy = VTy = b \quad \implies \quad T y = V^\intercal b = \lVert b \rVert e_1$$
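To convince myself the steps above are consistent, here is a minimal numerical sketch (hypothetical random data, plain full-reorthogonalized Lanczos, nothing from the paper itself) checking that $V^\intercal A V = T$ and $V^\intercal b = \lVert b \rVert e_1$, so that solving the small tridiagonal system reproduces the solution of $Ax = b$:

```python
import numpy as np

# Small symmetric test problem (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M + M.T                      # symmetric (possibly indefinite)
b = rng.standard_normal(6)

def lanczos(A, b, m):
    """Lanczos with full reorthogonalization (demo version).
    Returns V (n x m, orthonormal columns) and tridiagonal T (m x m)."""
    n = len(b)
    V = np.zeros((n, m))
    T = np.zeros((m, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(max(0, j - 1), j + 1):
            T[i, j] = V[:, i] @ w
            w -= T[i, j] * V[:, i]
        # full reorthogonalization for numerical safety in this demo
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        if j + 1 < m:
            T[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / T[j + 1, j]
    return V, T

V, T = lanczos(A, b, 6)

# Projected system: T y = ||b|| e1, then x = V y solves A x = b
y = np.linalg.solve(T, np.linalg.norm(b) * np.eye(6)[:, 0])
x = V @ y
```

(Solving with $T$ directly like this is the "direct" shortcut my question asks about; the point of the paper is to avoid it, as worked out in the edit below.)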
Now I'm lost, because the paper suggests finding some auxiliary vector $t$, as well as the inverse of some Cholesky factor $L$, and writing $x$ in terms of those instead, leading to $ x = V L^{-1} t $, because somehow $y = L^{-1} t $.
To add to my frustration, Wikipedia shows a completely different (and seemingly much simpler) algorithm that does not require the Lanczos step at all, presented without explanation or references!
Questions:
- How to find $t$ and $L$?
- Do we really need to compute $L^{-1}$? Why not invert $T$ to find $y$?
- How to go from matrix syntax to a recursive formulation that requires minimal memory?
- How is the Wikipedia algorithm related to the original article?
Thank you very much!
EDIT: I have since learned that the algorithm shown on the Wikipedia page is in fact the Conjugate Residual (CR) method, not the MINRES method.
I think I got it now.
As a direct method
Take $Ax = b$.
Employ the Lanczos algorithm to find $V$ and $T$ such that $$V^\intercal A V = T \quad \implies \quad A V = V T \,,$$ where $V$ is an orthogonal matrix whose first column is $v_1 = \frac{b}{\lVert b \rVert}$, and $T$ is tridiagonal ($T$ is both upper and lower Hessenberg because $A$ is symmetric).
Express the solution as a linear combination of the columns of $V$: $$x^* = Vy$$
Rewrite the system: $$Ax^* = A V y = V T y = b \quad \implies \quad T y = V^\intercal b = \lVert b \rVert e_1$$
Perform a series of Givens rotations on $T$ to obtain its QR decomposition ($U$ is upper triangular): $$T = QU$$
Rewrite again: $$T y = Q U y = \lVert b \rVert e_1 \quad \implies \quad Uy = \lVert b \rVert Q^\intercal e_1 $$
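The Givens sweep itself is easy to write down explicitly. Below is a minimal sketch (my own demo code, not the paper's): each rotation $G_k$ mixes rows $k$ and $k+1$ to zero the subdiagonal entry, and because $T$ is tridiagonal the fill-in in $U$ stops at the second superdiagonal:

```python
import numpy as np

def givens_qr_tridiag(T):
    """QR of a tridiagonal matrix by successive Givens rotations.
    Returns Q orthogonal and U upper triangular with only three
    nonzero diagonals (fill-in stops at the second superdiagonal)."""
    m = T.shape[0]
    U = T.astype(float).copy()
    Q = np.eye(m)
    for k in range(m - 1):
        a, b_ = U[k, k], U[k + 1, k]
        r = np.hypot(a, b_)
        c, s = a / r, b_ / r
        G = np.eye(m)
        G[[k, k + 1], [k, k + 1]] = c
        G[k, k + 1], G[k + 1, k] = s, -s
        U = G @ U            # zeros out U[k+1, k]
        Q = Q @ G.T          # accumulate so that T = Q U
    return Q, U
```

(In the actual algorithm one never forms the dense $G$ or $Q$; only the two scalars $c, s$ of the last two rotations are kept, as the remarks below make precise.)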
Denote $ \alpha = Uy = \lVert b \rVert Q^\intercal e_1 \quad \implies \quad y = U^{-1} \alpha$
Leading to $x^* = V y = V U^{-1} \alpha$.
Denote $P = V U^{-1}$, yielding
$$ x^* = P \alpha = \lVert b \rVert V U^{-1} Q^\intercal e_1$$
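Assembled as a dense, purely illustrative direct method (hypothetical random data; `np.linalg.qr` of the Krylov matrix stands in for Lanczos, and `np.linalg.qr` of $T$ stands in for the Givens sweep — fine for a tiny demo, not numerically sensible at scale):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M + M.T                      # symmetric test matrix
b = rng.standard_normal(n)

# Orthonormal Krylov basis via QR of [b, Ab, ..., A^{n-1} b]
K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
V, R = np.linalg.qr(K)
V = V * np.sign(np.diag(R))      # fix column signs so V[:, 0] = b / ||b||
T = V.T @ A @ V                  # tridiagonal

Q, U = np.linalg.qr(T)           # T = QU, U upper triangular
e1 = np.eye(n)[:, 0]
x = np.linalg.norm(b) * V @ np.linalg.solve(U, Q.T @ e1)   # x* = ||b|| V U^{-1} Q^T e1
```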
Critical Remarks
Note that $U$ is upper triangular but has only 3 (upper) diagonals. Thus, the information needed to generate a new column of $P$ is contained in the respective column of $V$ and the two previous columns of $P$. This becomes apparent if we write $PU = V$. Therefore, rather than computing $U^{-1}$ and then $V U^{-1}$, we can generate the columns of $P$ sequentially by a short three-term recurrence.
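Reading $PU = V$ column by column, $v_j = \varepsilon_j p_{j-2} + \delta_j p_{j-1} + \gamma_j p_j$ (my notation for the three diagonals of $U$), which gives this sketch:

```python
import numpy as np

def p_columns(V, U):
    """Build P = V U^{-1} column by column using only the three nonzero
    diagonals of U (read off P U = V), without forming U^{-1}."""
    n, m = V.shape
    P = np.zeros((n, m))
    for j in range(m):
        p = V[:, j].copy()
        if j >= 1:
            p -= U[j - 1, j] * P[:, j - 1]   # delta_j * p_{j-1}
        if j >= 2:
            p -= U[j - 2, j] * P[:, j - 2]   # eps_j * p_{j-2}
        P[:, j] = p / U[j, j]                # divide by gamma_j
    return P
```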
Note that, while the vector $y$ cannot be found until the whole procedure is completed (all of its entries depend on the very last one), the vector $\alpha$ can be built up sequentially, with the guarantee that previously determined entries never change (the earlier columns of $Q$ are not touched by later rotations).
Finally, note that $x^* = Vy = P \alpha$, so rather than taking orthogonal steps along the columns of $V$, we can take oblique (but still linearly independent) steps given by the columns of $P$. Steps taken along the columns of $P$ do not need to be revisited (coefficients $\alpha$ do not change).
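Putting the three remarks together gives the iterative method with minimal memory: one Lanczos step, the two most recent Givens rotations applied to the new column of $T$, the three-term recurrence for the new column of $P$, and one step along it. Here is my own sketch of that loop (assumptions: no preconditioning, no breakdown handling, variable names $\varepsilon, \delta, \gamma$ for the diagonals of $U$ and $\bar\varphi$ for the rotated right-hand side are mine):

```python
import numpy as np

def minres(A, b, maxiter=None, tol=1e-10):
    """MINRES as coupled short recurrences (demo sketch)."""
    n = len(b)
    maxiter = n if maxiter is None else maxiter
    beta0 = np.linalg.norm(b)
    x = np.zeros(n)
    v_prev, v = np.zeros(n), b / beta0
    beta = 0.0                      # superdiagonal entry entering column k
    phi_bar = beta0                 # |phi_bar| = current residual norm
    c_prev2 = c_prev1 = 1.0         # the two most recent rotations
    s_prev2 = s_prev1 = 0.0
    p_prev2, p_prev1 = np.zeros(n), np.zeros(n)
    for _ in range(maxiter):
        # Lanczos step: next column of V and of the tridiagonal T
        w = A @ v
        alpha = v @ w
        w = w - alpha * v - beta * v_prev
        beta_next = np.linalg.norm(w)
        # apply the two previous rotations to the new column of T
        eps = s_prev2 * beta                                  # 2nd superdiag of U
        delta = c_prev1 * (c_prev2 * beta) + s_prev1 * alpha  # 1st superdiag of U
        gamma_bar = -s_prev1 * (c_prev2 * beta) + c_prev1 * alpha
        # new rotation annihilating the subdiagonal beta_next
        gamma = np.hypot(gamma_bar, beta_next)                # diagonal of U
        c, s = gamma_bar / gamma, beta_next / gamma
        # new column of P and the step along it (new entry of alpha)
        p = (v - delta * p_prev1 - eps * p_prev2) / gamma
        x = x + (c * phi_bar) * p
        phi_bar = -s * phi_bar
        if abs(phi_bar) <= tol * beta0 or beta_next < 1e-14:
            break
        # shift the recurrences: only O(n) storage is ever kept
        v_prev, v, beta = v, w / beta_next, beta_next
        c_prev2, s_prev2, c_prev1, s_prev1 = c_prev1, s_prev1, c, s
        p_prev2, p_prev1 = p_prev1, p
    return x
```

Only six vectors of length $n$ and a handful of scalars survive between iterations, which answers my own memory question.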