Need some facts about Newton-Schulz iterative method and its application to sparse matrices

3.2k Views Asked by At

I am studying Newton-Schulz iterative method for obtaining an approximate inverse , which is given by

$V_{k+1}=V_{k}(2I-AV_{k})$,

wherein $I$ is the identity matrix and it converges, when the eigenvalues of $AV_0$ are less than one.

I want to know whether Newton's method work well for sparse matrices? Also, whether for the sparse matrix $A$, its inverse $V_k$ by Newton's method preserves sparsity? If not, what could be the way so that inverse matrix $V_k$ preserves sparsity for sparse $A$? Any reference related with this discussion will also be helpful to me.

Thank you very much for your help.

1

There are 1 best solutions below

3
On BEST ANSWER

It sometimes happens that someone re-invents the wheel. It's on page 1-3 of this paper:

Wrote it more than 10 years ago; cannot judge so well anymore if the text maybe helpful somehow. But anyway, here is a hopefully relevant summary of the first part of the paper.

Newton-Raphson algorithm

The Newton-Raphson method is a numerical algorithm for finding zeros $p$ of a function $f(x)$. The gist of the method is to draw successive tangent lines and determine where these lines intersect the x-axis (see figure below): $$ y - f(p_n) = f'(p_n).(x - p_n) \quad \mbox{where} \quad y = 0 \quad \mbox{and} \quad x = p_{n+1} \quad \Longrightarrow $$ $$ p_{n+1} = p_n - \frac{f(p_n)}{f'(p_n)} $$ Thereby assuming that the whole process will be convergent.
The method can be used for performing a division without actually performing a division. The equation to be searched for its roots, in this case, is given by: $$ \frac{1}{x} = a $$ Substitute $f(x) = 1/x - a$ in the above algorithm. Resulting in: $$ p_{n+1} = p_n - \frac{1/p_n-a}{-1.1/p_n^2} = p_n - (- p_n + a.p_n^2) \quad \Longrightarrow \quad p_{n+1} = 2.p_n - a.p_n^2 $$ It is well known that the Newton-Raphson method, if it converges, then it does so quadratically, meaning that the inverse $1/a$ can be found rather quickly. The algorithm has been used in the old days, on computers which had no floating point division instruction available.

enter image description here

Thus by employing the Newton-Raphson algorithm, the inverse of a number can be found with quadratic speed, by performing solely additions and multiplications. Armed with this knowledge, let's make the transition from numbers to matrices. Determining the inverse of a matrix, filled with many numbers, seems to be much more like a challenge anyway.
Let the iterative process for matrices be defined by: $$ P_0 = I \quad \mbox{and} \quad P_{n+1} = 2.P_n - A.P_n.P_n $$ Here $I$ is the unity matrix, $A$ is the matrix to be inverted and $P_n$ are the successive iterands, which should converge to the inverse matrix $A^{-1}$.
Theorem. $$ \mbox{Let} \quad M = (I - A) \quad \mbox{or} \quad A = (I - M) \quad \mbox{then:} \quad P_n = (I - M^{2^n}) . A^{-1} $$ Proof by induction: $$ \begin{array}{lll} P_0 &=& I = A . A^{-1} = (I - M) . A^{-1} \\ P_{n+1} &=& \left(I - M^{2^{n+1}} \right) . A^{-1} \\ &=& \left(I - M^{2^n.2} \right) . A^{-1} \\ &=& \left[I - (M^{2^n})^2 \right] . A^{-1} \\ &=& \left(I - M^{2^n} \right) . \left(I + M^{2^n} \right) . A^{-1} \\ &=& \mbox{because all matrices are mutually commutative:} \\ &=& \left(I - M^{2^n} \right) . A^{-1} . \left[ 2.I - A.\left( I - M^{2^n} \right).A^{-1} \right] \\ &=& P_n.(2.I - A.P_n) = 2.P_n - A.P_n.P_n \end{array} $$ Let $m = 2^n$, then: $$ P_n = (I - M^m) . (I-M)^{-1} $$ For numbers, this would be the sum of a Geometric series : $$ (I - M^m).(I - M)^{-1} = (I + M + M^2 + M^3 + M^4 + M^5 + M^6 + ... + M^{m-1}) $$ For matrices, this Geometric Series turns out to be equivalent to an iterative "incremental Jacobi" solution method, as will not be futher explained here.
But there also does exist a product of terms, called the Euler expansion (don't remember where the naming comes from; a reference would be quite welcome): $$ (I - M^m).(I - M)^{-1} = \\ (I + M^{m/2}).(I - M^{m/2}).(I - M)^{-1} = \\ (I + M^{m/2}).(I + M^{m/4}).(I - M^{m/4}).(I - M)^{-1} = \\ (I + M^{m/2}).(I + M^{m/4}). \, ... \, (I+M^2).(I+M).(I-M).(I-M)^{-1} = \\ (I + M^{m/2}).(I + M^{m/4}). \, ... \, (I + M^8).(I + M^4).(I + M^2).(I + M) $$ Let the system of equations to be solved be given by $A.w = b$. Using the above sequences, we can write: $$ P_n = (I - M^{2^n}) . A^{-1} \quad \Longrightarrow \quad A^{-1} b = (I - M^{2^n})^{-1} . P_n b $$ $$ = (I - M^{2^n})^{-1} . (I + M^{m/2}).(I + M^{m/4}). \, ... \, (I + M^8).(I + M^4).(I + M^2).(I + M) b $$ Another way of looking at this is the following: $$ (I-M)^{-1} = (I-M)^{-1}.(I+M)^{-1}.(I+M) = (I-M^2)^{-1}.(I+M) = \\ (I-M^2)^{-1}.(I+M^2)^{-1}.(I+M^2).(I+M) = (I-M^4)^{-1}.(I+M^2).(I+M) $$ With other words: $$ w = (I-M)^{-1}.b = (I-M)^{-1}.(I+M)^{-1}.(I+M).b = (I-M^2)^{-1}.(I+M).b $$ Define $b := (I + M).b$ and $M := M^2$. Then again: $$ w = (I-M)^{-1}.b = (I-M)^{-1}.(I+M)^{-1}.(I+M).b = (I-M^2)^{-1}.(I+M).b $$ And this process can be repeated until we find a way to determine $(I-M)^{-1}$, preferrably without continuing the iterations ad infinitum.

Update. Quite in general, the inverse of a sparse matrix is not sparse at all. And this, of course, will be reflected in the iterands of the Newton-Schulz method for obtaining an approximate inverse. The common remedy is simply not to calculate the inverse of a large sparse matrix and use e.g.
LU decomposition instead.
But, as argued in the abovementioned reference, the Newton-Schulz method may be regarded as an Ansatz to MultiGrid methods. And that is a quite different story, not to be elaborated here.