Division-free Krylov methods?

103 Views Asked by At

I have been using various Krylov subspace methods for a long time, especially conjugate gradient. I'm wondering if there is some way to modify the algorithms to postpone / avoid the divisions that occur so that one could build the whole algorithm using only multiplications and additions. Any reference or description of how to do it would be welcome.

1

There are 1 best solutions below

0
On BEST ANSWER

As noted by Wikipedia the primary algorithms for fast floating-point division algorithms based on existing floating-point multiply and add capabilities are Newton-Raphson iteration and the related Goldschmidt algorithm. The latter usually provides more parallelism, but is not self-correcting like NR-iterations , making error analysis more difficult. It is relatively easy to build implementations that deliver approximate results, but designing implementations that provide proper rounding as defined by the floating-point standard IEEE-754 is a task for experts.

Various hardware platforms have no native floating-point division capability and use sequences of multiplies and adds, or – more commonly these days – sequences of fused multiply-add operations (FMA). The AMD Athlon processor used Goldschmidt's algorithm, the Intel Itanium processor used NR iterations, and NVIDIA GPUs using CUDA use Halley iteration on the reciprocal (related to NR-iterations but with cubic convergence). This latter work is not documented in a paper, but one can easily examine the relevant code through disassembly of a CUDA binary containing floating-point division with cuobjdump --dump-sass.

Stuart F. Oberman, "Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7TM Microprocessor". In Proceedings 14th IEEE Symposium on Computer Arithmetic, IEEE 1999, pp. 106-115 (online)

Peter Markstein, "Software division and square root using Goldschmidt’s algorithms." Proceedings of the 6th Conference on Real Numbers and Computers (RNC’6), Vol. 123, Nov. 2004, pp. 146-157 (online)

Peter Markstein, IA-64 and Elementary Functions. Speed and Precision, Prentice Hall 2000 (see chapter 8)

Marius Cornea, John Harrison, and Ping Tak Peter Tang, Scientific Computing on Itanium®-based Systems. Intel Press 2002 (see chapter 8)

Marius Cornea, "Software implementations of division and square root operations for Intel® Itanium® processors." In Proceedings of the 2004 workshop on Computer architecture education, ACM 2004. Article No. 17 (online)


Integer division can be implemented in much the same way, see this question on Stackoverflow for a worked example of 64-bit integer division via Halley iteration.