Why are ill-conditioned systems of equations hard to solve iteratively?

Is there some intuition as to why ill-conditioned systems of equations are hard to solve iteratively (i.e., why the convergence is slow)? I've read convergence proofs of several methods, but still don't have any real intuition.
Assume for simplicity that you want to solve a system $Ax=b$ where $A$ is Hermitian and positive definite (HPD), so there is a unitary $V$ and a real diagonal $D$ (with the positive eigenvalues $\lambda_1\geq\ldots\geq\lambda_n>0$ on the diagonal) such that $A=VDV^*$.
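As a quick sanity check, here is a minimal NumPy sketch (all values hypothetical) that builds such an HPD matrix from a prescribed spectrum and verifies that its spectral condition number is $\lambda_1/\lambda_n$:

```python
import numpy as np

rng = np.random.default_rng(0)
eigvals = np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])  # lambda_1 >= ... >= lambda_n > 0
n = eigvals.size

# Random unitary V via the QR factorization of a complex Gaussian matrix
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
V, _ = np.linalg.qr(Z)

A = V @ np.diag(eigvals) @ V.conj().T  # A = V D V*, Hermitian positive definite

print(np.allclose(A, A.conj().T))                                  # Hermitian: True
print(np.isclose(np.linalg.cond(A, 2), eigvals[0] / eigvals[-1]))  # kappa = lambda_1/lambda_n: True
```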
Many methods, including Richardson's method, the conjugate gradient method, GMRES, etc., start from a given initial guess $x_0$ and seek the approximation $x_k$ at step $k$ in the shifted Krylov subspace $$x_0+\mathcal{K}_k(A,r_0):=x_0+\mathrm{span}\{r_0,Ar_0,\ldots,A^{k-1}r_0\},$$ where $r_0:=b-Ax_0$ is the residual vector of $x_0$. Consequently, $x_k$ can be written in the form $$ x_k=x_0+q(A)r_0, $$ where $q$ is a polynomial of degree at most $k-1$. Let's have a look at the error $e_k:=x-x_k$, where $x$ is the exact solution. Using $r_0=b-Ax_0=A(x-x_0)=Ae_0$, we have $$\tag{1} e_k=x-x_k=x-x_0-q(A)r_0=e_0-q(A)Ae_0=[I-q(A)A]e_0=p(A)e_0, $$ where the polynomial $p$ of degree at most $k$ satisfies $p(0)=1$. This characterizes all the so-called polynomial methods: the error at step $k$ is given by a polynomial in $A$ of degree at most $k$, normalized at the origin (having unit constant coefficient), applied to the initial error. For example, in Richardson's method $x_k=x_{k-1}+\omega(b-Ax_{k-1})$, the polynomial $p$ is given by $p(t)=(1-\omega t)^k$.
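To make (1) concrete, here is a small NumPy check (hypothetical test data) that the error of Richardson's iteration after $k$ steps really is $p(A)e_0$ with $p(t)=(1-\omega t)^k$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, omega = 5, 4, 0.1
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # a symmetric positive definite test matrix
b = rng.standard_normal(n)
x = np.linalg.solve(A, b)        # exact solution, only used to compute the error

x0 = np.zeros(n)
xk = x0.copy()
for _ in range(k):               # Richardson: x_k = x_{k-1} + omega*(b - A x_{k-1})
    xk = xk + omega * (b - A @ xk)

pA = np.linalg.matrix_power(np.eye(n) - omega * A, k)  # p(A) = (I - omega*A)^k
print(np.allclose(x - xk, pA @ (x - x0)))              # e_k = p(A) e_0: True
```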
The convergence analysis of polynomial methods is usually based on analysing the worst-case convergence rate, which is obtained by eliminating the effect of the right-hand side (or, equivalently, of the initial error). Say we are interested in the energy norm of the error, $\|e_k\|_A:=(e_k^*Ae_k)^{1/2}$. Since $A^{1/2}$ commutes with $p(A)$, we have $$ \|e_k\|_A=\|p(A)e_0\|_A=\|A^{1/2}p(A)e_0\|_2=\|p(A)A^{1/2}e_0\|_2\leq\|p(A)\|_2\|A^{1/2}e_0\|_2=\|p(A)\|_2\|e_0\|_A. $$ So at step $k$, the relative error can be bounded from above by $$\tag{2} \frac{\|e_k\|_A}{\|e_0\|_A}\leq \|p(A)\|_2=\|p(D)\|_2=\max_{1\leq i\leq n}|p(\lambda_i)|, $$ using the unitary invariance of the spectral norm.
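A numerical sanity check of (2), again in NumPy with hypothetical data, using the Richardson polynomial from above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, omega = 8, 6, 0.05
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite test matrix
e0 = rng.standard_normal(n)

anorm = lambda v: np.sqrt(v @ (A @ v))                 # energy norm ||v||_A
pA = np.linalg.matrix_power(np.eye(n) - omega * A, k)  # p(A) for p(t) = (1 - omega*t)^k

# ||e_k||_A <= ||p(A)||_2 * ||e_0||_A
print(anorm(pA @ e0) <= np.linalg.norm(pA, 2) * anorm(e0))   # True

# ||p(A)||_2 = max_i |p(lambda_i)|
lams = np.linalg.eigvalsh(A)
print(np.isclose(np.linalg.norm(pA, 2),
                 np.max(np.abs((1 - omega * lams) ** k))))   # True
```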
Now comes the issue with ill-conditioned problems: to converge fast, the method needs low-degree polynomials $p$ with $p(0)=1$ that are small on the whole spectrum of $A$, and when the spectrum stretches over many orders of magnitude this is hard to achieve.
This is certainly not the case for the Richardson method. If the parameter $\omega$ is chosen optimally, that is, $\omega=2/(\lambda_1+\lambda_n)$, the polynomial $p$ is given by $$ p(t)=\left(1-\frac{2t}{\lambda_1+\lambda_n}\right)^k. $$ You can easily verify that $p$ is small in the middle part of the spectrum but fairly large at its endpoints; in fact, $$\tag{3} |p(\lambda_1)|=|p(\lambda_n)|=\left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^k =\left(\frac{\kappa-1}{\kappa+1}\right)^k, $$ where $\kappa:=\lambda_1/\lambda_n$ is the spectral condition number of $A$. That is, the Richardson method quickly eliminates the components of $e_0$ lying in the middle of the spectrum, but only slowly those combined from the eigenvectors corresponding to the smallest and largest eigenvalues (try, e.g., $\lambda_1=10$, $\lambda_n=0.1$). The bound on the convergence rate of Richardson's method following from (2) and (3) is of course a worst-case bound. If, by luck, your initial error were combined from eigenvectors lying in the middle part of the spectrum, the actual convergence rate would be much better.
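This endpoint-vs.-middle behaviour of the Richardson polynomial is easy to see numerically; a sketch with a hypothetical diagonal spectrum:

```python
import numpy as np

lams = np.array([10.0, 5.0, 1.0, 0.5, 0.1])   # hypothetical spectrum, kappa = 100
omega = 2.0 / (lams[0] + lams[-1])            # optimal Richardson parameter
k = 50

p = (1.0 - omega * lams) ** k                 # p(lambda_i): per-component damping factors
print(np.abs(p))   # largest (~0.37) at lambda_1 and lambda_n, much smaller in between

kappa = lams[0] / lams[-1]
rho = ((kappa - 1) / (kappa + 1)) ** k        # the worst-case factor from (3)
print(np.isclose(np.abs(p[0]), rho), np.isclose(np.abs(p[-1]), rho))  # True True
```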
Richardson's method is far from optimal. The best polynomial method for HPD problems, in the sense of minimizing the $A$-norm of the error at each step, is the conjugate gradient method (CG). If we denote by $\Pi_k$ the set of polynomials of degree at most $k$ normalized at the origin, the error $e_k$ of CG satisfies $$ \|e_k\|_A=\min_{p\in\Pi_k}\|p(A)e_0\|_A. $$ Again, we can obtain the worst-case convergence bound $$ \frac{\|e_k\|_A}{\|e_0\|_A}\leq\min_{p\in\Pi_k}\|p(A)\|_2=\min_{p\in\Pi_k}\max_{1\leq i\leq n}|p(\lambda_i)|. $$ Consequently, CG faces a genuine polynomial approximation problem: find $p\in\Pi_k$ that is as small as possible at all the eigenvalues of $A$ simultaneously. With $p(0)=1$ fixed, this is hard when the eigenvalues spread over a wide range, i.e., when $\kappa$ is large.
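To see the effect of $\kappa$ on CG itself, here is a minimal textbook CG sketch in NumPy (hypothetical spectra), tracking the $A$-norm of the error:

```python
import numpy as np

def cg_anorm_errors(A, b, x_true, iters):
    """Plain conjugate gradients from x0 = 0; returns ||x_true - x_k||_A for k = 0..iters."""
    x = np.zeros_like(b)
    r = b.copy()          # residual r_0 = b - A x_0 = b
    p = r.copy()
    rs = r @ r
    errs = []
    for _ in range(iters):
        e = x_true - x
        errs.append(np.sqrt(e @ (A @ e)))
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    e = x_true - x
    errs.append(np.sqrt(e @ (A @ e)))
    return np.array(errs)

rng = np.random.default_rng(3)
n = 100
for kappa in (10.0, 1e4):                  # well- vs. ill-conditioned
    lams = np.geomspace(1.0, kappa, n)     # hypothetical spectrum on [1, kappa]
    A = np.diag(lams)
    x_true = rng.standard_normal(n)
    errs = cg_anorm_errors(A, A @ x_true, x_true, 30)
    print(f"kappa = {kappa:g}: ||e_30||_A / ||e_0||_A = {errs[-1] / errs[0]:.2e}")
```

With these (hypothetical) spectra, the well-conditioned run reaches a tiny relative error within 30 steps while the ill-conditioned one lags far behind, matching the worst-case bound's dependence on $\kappa$.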
This need not (at least in theory) be a problem, e.g., when the spectrum is somewhat clustered. Imagine an "ideal" case where $A$ has only a few (say, $m\ll n$) distinct eigenvalues, so that the minimal polynomial of $A$ has degree $m$. Since we can normalize it so that it attains the value $1$ at the origin (zero is not an eigenvalue of $A$), CG converges in at most $m$ steps, as $p_m(A)=0$ and hence $e_m=p_m(A)e_0=0$. A useful, but sometimes misleading, idea is hence that if the spectrum of $A$ consists of a small number of tight clusters of eigenvalues, then CG converges fast. The worst convergence, on the other hand, can be expected when the spectrum is spread out evenly (equidistant eigenvalues).
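A quick check of the few-distinct-eigenvalues claim (hypothetical data, reusing `cg_anorm_errors` from the sketch above): with only $m$ distinct eigenvalues, CG converges in at most $m$ steps, up to rounding:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 3
lams = np.tile([0.5, 3.0, 7.0], 20)   # n = 60, but only m = 3 distinct eigenvalues
A = np.diag(lams)
x_true = rng.standard_normal(lams.size)

errs = cg_anorm_errors(A, A @ x_true, x_true, m)   # helper defined in the CG sketch above
print(errs / errs[0])   # relative A-norm errors; ~machine precision by step m = 3
```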
I hope this helps somewhat with what you were asking :-) I would suggest having a look at this book for an interesting overview of the topic.