Why does MINRES converge in 3 iterations on matrices of specific form?


The MINRES algorithm for solving $Ax = b$ for symmetric $A$ can be described as follows: the $k$-th iterate of the algorithm is $$x_k = \arg\min_{x \in K_k(A,b)} \lVert Ax-b \rVert_2$$ where $K_k(A,b)=\operatorname{span}\{A^ib\mid 0 \le i < k\}$ is the $k$-th Krylov subspace of $A$ and $b$.

By this definition, it is clear that the algorithm converges to the exact solution in at most $n$ iterations if $A$ is an $n\times n$ matrix. Using this solver on matrices with a very specific structure, I noticed that in that case it converges in just 3 iterations. The matrices are of the form $$ A = \begin{bmatrix} I_{n-1} & v\\ v^T & 0 \end{bmatrix} $$ where $v$ is a column vector and $I_{n-1}$ is the $(n-1) \times (n-1)$ identity matrix. How can this early convergence be explained?

My thoughts

The matrix $A$ can in this case be seen as the identity matrix plus two rank-one corrections: $$A = I_n + \begin{bmatrix} 0 & v\\ 0 & -\frac{1}{2} \end{bmatrix} + \begin{bmatrix} 0 & 0\\ v^T & -\frac{1}{2} \end{bmatrix}$$

This means that we can invert the matrix exactly by applying the Sherman-Morrison formula twice. I currently avoid doing this explicitly as it leads to instabilities, while 3 MINRES iterations do not. Maybe the MINRES algorithm implicitly exploits the fact that $A$ is just a rank-two correction away from the identity?
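For reference, here is a minimal sketch of that double Sherman-Morrison inversion (variable names are my own): writing $c = \begin{bmatrix}v \\ -1/2\end{bmatrix}$ and $e_n$ for the last standard basis vector, the two corrections are exactly the rank-one matrices $c\,e_n^T$ and $e_n c^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
v = rng.standard_normal(n - 1)

# A = I_n + c e_n^T + e_n c^T  with  c = [v; -1/2],  e_n the last basis vector
c = np.append(v, -0.5)
e = np.zeros(n)
e[-1] = 1.0
A = np.eye(n) + np.outer(c, e) + np.outer(e, c)

def sherman_morrison(Binv, u, w):
    """Return (B + u w^T)^{-1} given B^{-1}, via the Sherman-Morrison formula."""
    Bu = Binv @ u
    wB = w @ Binv
    return Binv - np.outer(Bu, wB) / (1.0 + w @ Bu)

Ainv = sherman_morrison(np.eye(n), c, e)  # invert I + c e^T
Ainv = sherman_morrison(Ainv, e, c)       # then account for + e c^T
print(np.allclose(Ainv @ A, np.eye(n)))   # True
```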

You can verify this behaviour with this example Python snippet.

Best Answer

The reason is that there is a large eigenspace corresponding to the eigenvalue $1$. If you solve the eigenvalue problem

$$ \begin{bmatrix} I_{n-1} & v \\ v^T & 0\end{bmatrix} \cdot \begin{bmatrix} x \\ \alpha \end{bmatrix} = \lambda \begin{bmatrix} x \\ \alpha \end{bmatrix} $$

you'll find that there are $n-2$ eigenvectors with $\alpha=0$ and $x\perp v$ corresponding to the eigenvalue $1$, and 2 eigenvectors with $\alpha\neq 0$ and $x\parallel v$. More specifically, in the latter case we have the eigenvectors $x_\pm=v$ and $\alpha_\pm = \frac{1}{2}\left(-1\pm\sqrt{1+4 v^T v}\right)$ corresponding to the eigenvalues $\lambda_\pm = 1+\alpha_\pm = \frac{1\pm\sqrt{1+4v^T v}}{2}$. Note that since $A$ is symmetric, these eigenvectors are orthogonal.
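A quick numerical sanity check of these eigenpairs (a sketch with my own variable names): the first block row gives $(1+\alpha_\pm)v = \lambda_\pm v$, so $\lambda_\pm = 1+\alpha_\pm$, and the last row $v^Tv = \lambda_\pm\alpha_\pm$ is equivalent to $\alpha_\pm^2 + \alpha_\pm - v^Tv = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
v = rng.standard_normal((n - 1, 1))
A = np.block([[np.eye(n - 1), v], [v.T, np.zeros((1, 1))]])

s = v.ravel() @ v.ravel()                      # s = v^T v
alpha = ((-1 + np.sqrt(1 + 4 * s)) / 2,        # alpha_+
         (-1 - np.sqrt(1 + 4 * s)) / 2)        # alpha_-
lam = (1 + alpha[0], 1 + alpha[1])             # lambda_± = 1 + alpha_±

# check A z = lambda z for both candidate eigenpairs z = [v; alpha]
for a, l in zip(alpha, lam):
    z = np.append(v.ravel(), a)
    print(np.allclose(A @ z, l * z))           # True both times
```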

Consequently, given a target vector $y=\begin{bmatrix}b \\ \beta\end{bmatrix}$ we can split it up as a linear combination

$$ y = \underbrace{\mu_1\begin{bmatrix}v \\ \alpha_+\end{bmatrix}}_{=: z_1} + \underbrace{\mu_2\begin{bmatrix}v \\ \alpha_-\end{bmatrix}}_{=: z_2} + \underbrace{\mu_3\begin{bmatrix}v^\perp \\ 0\end{bmatrix}}_{=: r} $$

Consequently, $ A y = \lambda_+ z_1 + \lambda_- z_2 + r $, $A^2y =\lambda_+^2 z_1 + \lambda_-^2 z_2 + r$, etc. Now it should become clear why we only need 3 iterations: every Krylov vector lies in the 3-dimensional subspace spanned by $z_1$, $z_2$ and $r$, so only 3 coefficients need to be determined.
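The stagnation of the Krylov subspace can be seen directly (a small sketch, not part of the original answer): the rank of the matrix $\begin{bmatrix}y & Ay & A^2y & \dots\end{bmatrix}$ stops growing at 3, generically, since all powers $A^ky$ stay inside $\operatorname{span}\{z_1, z_2, r\}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
v = rng.standard_normal((n - 1, 1))
A = np.block([[np.eye(n - 1), v], [v.T, np.zeros((1, 1))]])
y = rng.standard_normal(n)

# Krylov matrix [y, Ay, A^2 y, A^3 y, A^4 y]; its rank saturates at 3
K = np.column_stack([np.linalg.matrix_power(A, i) @ y for i in range(5)])
print(np.linalg.matrix_rank(K))   # 3 (generically, i.e. when all mu_i != 0)
```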

Lemma: assume $A$ is orthogonally similar to the $n\times n$ block matrix $A' = \begin{bmatrix}I_r & 0 \\0 & D\end{bmatrix}$, where $D$ is an $(n-r)\times(n-r)$ diagonal matrix whose diagonal entries are pairwise distinct and all different from $1$. Then MINRES terminates in exactly $n-r+1$ iterations.

Proof: Given an orthogonal $U$ with $A' = UAU^T$, and writing $x' = Ux$, $y' = Uy$, we have $$ \|Ax - y\|_2 = \|A U^T Ux - y\|_2 = \|UA U^T U x - U y\|_2 = \|A' x' - y'\|_2 $$ So both optimization problems are equivalent, in the sense that if $x_k = \arg\min_{\xi\in K_k(A,y)}\|A\xi-y\|$ then $x_k = U^T x_k'$ where $x_k' = \arg\min_{\xi\in K_k(A',y')}\|A'\xi-y'\|$. The early termination in the diagonal case follows by induction over the size of $D$.

Corollary: If $A$ is orthogonally diagonalizable and has exactly $r$ distinct eigenvalues, then MINRES terminates in exactly $r$ iterations. (In practice this is not guaranteed due to numerical error.)

import numpy as np
from scipy.sparse.linalg import minres
from scipy.stats import ortho_group

# create a random matrix of the specific form
N = 100
v = np.random.randn(N-1,1)
b = np.random.random(N)
A = np.block([[np.eye(N-1), v], [v.T, 0]])

# run MINRES; it converges after 3 iterations
callback = lambda x: print(np.linalg.norm(A.dot(x) - b))
x, info = minres(A, b, rtol=1e-15, callback=callback)  # older SciPy versions use `tol` instead of `rtol`
print("MINRES returned:", info)
print("The return value 0 means it has converged.")

# orthogonal similarity transform: A' = W A W^T and b' = W b with W = U.T
U = ortho_group.rvs(N)
A_dash = U.T @ A @ U
b_dash = U.T @ b
callback = lambda x: print(np.linalg.norm(A_dash.dot(x) - b_dash))
x, info = minres(A_dash, b_dash, rtol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The return value 0 means it has converged.")

# 4 distinct eigenvalues => convergence in 4 iterations
U = ortho_group.rvs(N)
A = np.diag(np.concatenate([2*np.ones(N//4), 3*np.ones(N//4), -1*np.ones(N//4), 10*np.ones(N//4)]))
A_dash = U.T @ A @ U
b_dash = U.T @ b
callback = lambda x: print(np.linalg.norm(A_dash.dot(x) - b_dash))
x, info = minres(A_dash, b_dash, rtol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The return value 0 means it has converged.")