Time complexity of conjugate gradients applied to normal equations


Consider a least squares problem $\min_x \|Ax-b\|^2$ with a rectangular matrix $A$. One way to solve it is to apply the conjugate gradient (CG) method to the normal equations $A^\top A x = A^\top b$, as implemented by MATLAB's `lsqr`. My question is: what is the time complexity of solving the problem, say up to a tolerance $\epsilon$?
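For concreteness, here is a minimal sketch of this approach in Python using `scipy.sparse.linalg.lsqr`, SciPy's analogue of MATLAB's `lsqr`; the matrix dimensions and sparsity are just illustrative assumptions on my part:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
A = sparse_random(100, 20, density=0.3, format="csr", random_state=rng)
b = rng.standard_normal(100)

# lsqr is mathematically equivalent to CG on the normal equations
# A^T A x = A^T b, but it never forms A^T A explicitly.
x = lsqr(A, b, atol=1e-10, btol=1e-10)[0]

# at the least-squares solution the normal-equation residual vanishes
print(np.linalg.norm(A.T @ (A @ x - b)))
```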

I know that if $A$ is a symmetric positive definite matrix and CG is applied directly to $Ax=b$, the time complexity is $\mathcal O(m\sqrt\kappa)$, i.e. $\mathcal O(\sqrt\kappa\log(1/\epsilon))$ iterations at $\mathcal O(m)$ cost each, where $m$ is the number of nonzero entries of $A$ and $\kappa$ is its condition number. But since in our case the method is applied to the normal equations, the relevant matrix is $A^\top A$ rather than $A$. Its condition number is $\kappa^2$, and worse still, $A^\top A$ may be dense even when $A$ is sparse (the bound $m^2$ on its number of nonzeros can be very pessimistic). Forming $A^\top A$ explicitly therefore gives the unsatisfactory complexity $\mathcal O(\tilde m\kappa)$, where $\tilde m = \min\{m^2, n^2\}$ and $n$ is the number of columns of $A$ (so that $A^\top A$ has $n^2$ entries).
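The squaring of the condition number is easy to verify numerically, since the singular values of $A^\top A$ are the squares of those of $A$. A small NumPy illustration (the random dense matrix is only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))

kappa = np.linalg.cond(A)              # sigma_max(A) / sigma_min(A)
kappa_normal = np.linalg.cond(A.T @ A)

# the normal equations square the condition number: cond(A^T A) = cond(A)^2
print(kappa**2, kappa_normal)
```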

However, in chapter 13 of this paper, the author mentions that $A^\top A$ is never formed explicitly, and that there are smart ways to exploit the sparsity of $A$. If I understand correctly, the complexity then improves to just $\mathcal O(m\kappa)$: each iteration only needs products of the form $A^\top A d$ for some vector $d$, which can be computed as $A^\top (Ad)$, i.e. two sparse matrix-vector products costing $\mathcal O(m)$ in total. Is this correct? If not, are there any guarantees better than the trivial one described above? Does anyone know of a relevant reference for these questions?
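To make the "never form $A^\top A$" point concrete, here is a minimal CGNR-style sketch (my own illustrative code, not taken from the paper): it runs plain CG on the normal equations, but the only operations involving the matrix are the two products $Ad$ and $A^\top(Ad)$, each $\mathcal O(m)$ for a sparse $A$.

```python
import numpy as np

def cgnr(A, b, tol=1e-10, maxiter=1000):
    """CG applied to the normal equations A^T A x = A^T b,
    using only products with A and A^T (A^T A is never formed)."""
    m, n = A.shape
    x = np.zeros(n)
    r = A.T @ b                 # normal-equation residual at x = 0
    d = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ad = A @ d              # O(nnz(A)) for a sparse A
        AtAd = A.T @ Ad         # second O(nnz(A)) product
        alpha = rs / (d @ AtAd)
        x += alpha * d
        r -= alpha * AtAd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
x = cgnr(A, b)
# should agree with the direct least-squares solution
print(np.linalg.norm(x - np.linalg.lstsq(A, b, rcond=None)[0]))
```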