I’m dealing with highly ill-conditioned normal equations of the form,
$$A^TA x = A^T b$$
where $A\in \mathbb{R}^{n\times m}$ and usually $n \gg m$.
Typical condition numbers of the matrix $A^TA$ are of the order $10^{16}$, where the condition number is defined as
$$\text{cond}(A^TA)= \frac{|\lambda_{\text{max}}|}{|\lambda_{\text{min}}|}.$$
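For concreteness, here is a small NumPy sketch of the kind of conditioning I mean. The Vandermonde design matrix is a hypothetical stand-in for my actual data, chosen because its normal equations are notoriously near-singular:

```python
import numpy as np

# Hypothetical ill-conditioned A (n >> m): a monomial Vandermonde matrix,
# a classic example whose normal equations are close to the Hilbert matrix.
n, m = 1000, 12
t = np.linspace(0.0, 1.0, n)
A = np.vander(t, m, increasing=True)

AtA = A.T @ A
eigs = np.linalg.eigvalsh(AtA)                 # A^T A is symmetric
cond = np.abs(eigs).max() / np.abs(eigs).min()
print(f"cond(A^T A) ~ {cond:.2e}")             # forming A^T A squares cond(A)
```

The point of the sketch is that forming $A^TA$ explicitly squares the conditioning of $A$, which is how the $10^{16}$ figures arise.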
I've read that in these situations it's important to precondition the normal equations to make their solution more stable, and that this can be implemented efficiently with the preconditioned conjugate gradient (PCG) method. The preconditioned system is defined as
$$B^{-1}A^TA x = B^{-1}A^Tb$$
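In code, $B^{-1}$ never needs to be formed explicitly: SciPy's `cg` accepts the preconditioner as an operator that applies $B^{-1}$ to a vector. The Jacobi (diagonal) preconditioner below is only a placeholder, since my actual $B$ isn't shown, and the matrix `A` is synthetic:

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
n, m = 500, 10
# Hypothetical ill-conditioned A: random columns scaled over 3 orders of magnitude.
A = rng.standard_normal((n, m)) * np.logspace(0, -3, m)
b = rng.standard_normal(n)
AtA, Atb = A.T @ A, A.T @ b

# Jacobi (diagonal) preconditioner: symmetric positive definite,
# applied implicitly as the action of B^{-1} on a vector.
d = np.diag(AtA)
B_inv = LinearOperator((m, m), matvec=lambda v: v / d)

x, info = cg(AtA, Atb, M=B_inv, maxiter=5 * m)
print("converged:", info == 0)
```

Note that `cg` requires both the system matrix and the preconditioner to be symmetric positive definite; the preconditioned operator it works with implicitly is kept symmetric, rather than the unsymmetric product $B^{-1}A^TA$ written above.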
I’ve found matrices $B$ such that left-multiplying the normal equations by $B^{-1}$ significantly reduces the condition number of $B^{-1}A^TA$, down to roughly 1–2. However, the eigenvalue spectrum of this auxiliary matrix ranges from $-10^{4}$ to $10^{4}$, with most values clustered around 0.
Even though this preconditioner produces a condition number close to 1, how important is the distribution of eigenvalues of the left-multiplied matrix, and does a wide eigenvalue distribution that is symmetric about 0 affect the convergence behavior of conjugate-gradient-type solvers?