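    template<class T, class U>
    T union_cast(U data) {
        // Reinterpret the bits of `data` as a T through a union.
        union { U a; T b; } t{data};
        return t.b;
    }

    float128_t quick_invsqrt_with_magic_num(int128_t mnum, float128_t X) {
        // Initial estimate only (no Newton-Raphson step): magic - (bits(X) >> 1).
        auto x = union_cast<float128_t>(mnum - (union_cast<int128_t>(X) >> 1));
        return x;
    }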
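Reading the inactive member of a union, as `union_cast` does, is undefined behaviour in standard C++, even though common compilers tolerate it. A minimal sketch of a portable alternative is to copy the bytes with `std::memcpy` (C++20 also offers `std::bit_cast`). The helper name `bit_cast_compat` is hypothetical, and the usage below assumes a 32-bit IEEE 754 `float` so that it runs without any 128-bit library.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of `from` into a To.
// Intended for trivially copyable types of identical size.
template <class To, class From>
To bit_cast_compat(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "source and target sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // well-defined byte copy, no union punning
    return to;
}
```

For example, `bit_cast_compat<std::uint32_t>(1.0f)` yields the bit pattern of `1.0f`, and the same template could be instantiated with a 128-bit integer and float type of equal size.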
I'm trying to figure out a fast invsqrt hack for float128 to complete my library, and based on the R0 value given in Chris Lomont's paper, I get that the magic number should probably be 0x2FFF6EC85E7DE30DAABC6027118577EF.
    int main() {
        cout << (190_128 >> 1);
        cout << int128_t(r0*(1_u128 << 23)+0.5);
        cout << endl;
        cout << (24574_128 >> 1);
        cout << int128_t(r0*(1_u128 << 112)+0.5);
        cout << endl; // theoretical optimum? (from "Fast Inverse Square Root" by Chris Lomont)
    }
But in my test program this theoretical magic number produces NaN for 14 of my 36 test inputs, which I have never seen with invsqrt for float64. I also tried generating magic numbers at random and checking their quality repeatedly; the program ran for a whole day on my PC, and the best result I got was 13 NaNs out of 36 tests.
    2FFF6EC85E7DE30DAABC6027118577EF : 2.53097e+237
    not good enough on 2 13 7.2e+73 1.14514e+196 980 0.142857 0.0588235 0.0138889 0.0136986 0.000499251 0.000497265 0.000495786 0.00049334 0.000502765
    bad rate 14/36 (38.8889%)
Is it theoretically impossible to derive fast invsqrt magic numbers for float128 and larger IEEE floating point numbers? If so, why? If not, is there any more accurate way to get fast invsqrt magic numbers for IEEE floats with arbitrary parameters?
A link to an online run of the code I used for testing here.
The paper's method can be extrapolated to give a way to calculate the magic numbers for any floating point type, but finding $r_0$ (which doesn't depend on floating point type) is complicated, so I just used the given constant.
This snippet of Haskell code calculates the magic number for a particular floating point type given the value of $r_0$ from the paper:
Then you need to encode it in a binary floating-point format; see below for inefficient (but hopefully correct) code. Printing the resulting integer in hex then gives these magic numbers for half/single/double/quadruple precision, the last of which is not the same as yours (how did you calculate it?):
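As an independent cross-check in C++ (not the Haskell code referred to above), the constants can also be assembled with integer arithmetic alone. This is a minimal sketch under several assumptions: it takes the low 112 bits of the constant quoted in the question as $r_0 \cdot 2^{112}$, truncates them for narrower formats, and puts $\lfloor 3 \cdot \mathrm{bias} / 2 \rfloor$ in the exponent field. The names `u128`, `kR0Scaled112` and `magic_number` are hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cstdint>

// 128-bit unsigned integer (assumption: GCC/Clang extension is available).
using u128 = unsigned __int128;

// Assumed value of r0 * 2^112: the 112 mantissa bits of the constant quoted
// in the question (0x6EC85E7DE30DAABC6027118577EF), split into two halves
// because there are no 128-bit integer literals.
constexpr u128 kR0Scaled112 =
    (static_cast<u128>(0x6EC85E7DE30DULL) << 64) | 0xAABC6027118577EFULL;

// Assemble a candidate magic number for a binary format with `exp_bits`
// exponent bits and `man_bits` explicit mantissa bits (man_bits <= 112).
constexpr u128 magic_number(int exp_bits, int man_bits) {
    const u128 bias = (static_cast<u128>(1) << (exp_bits - 1)) - 1;
    const u128 exponent_field = (3 * bias) / 2;  // floor(3 * bias / 2)
    // Keep the top man_bits bits of r0; truncation changes the constant by
    // at most one unit in its last place.
    const u128 mantissa_field = kR0Scaled112 >> (112 - man_bits);
    // The sign bit is zero, so the exponent field sits directly above the mantissa.
    return (exponent_field << man_bits) | mantissa_field;
}

// Compile-time checks against the hand-computed single and double values.
static_assert(magic_number(8, 23) == 0x5F37642FULL, "binary32");
static_assert(magic_number(11, 52) == 0x5FE6EC85E7DE30DAULL, "binary64");
```

Under these assumptions `magic_number(5, 10)` is 0x59BB, and for quadruple precision (`magic_number(15, 112)`) the top 16 bits come out as 0x5FFE: the zero sign bit followed by the full 15-bit exponent field 24574, with no right shift. The constant in the question starts with 0x2FFF, which is 24574 >> 1.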
Appendix 1 (binary format conversion):
Appendix 2 (test code in C):
This test code (compiled with gcc version 12.2.0 (Debian 12.2.0-14) x86_64) outputs
0.0198455, which shows the algorithm is working (relative error about 2%).
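The same kind of check can be sketched in C++ for the formats with native hardware support. This is not the C test program from Appendix 2 but a minimal stand-in: it assumes IEEE 754 `float` and `double`, uses the constants 0x5F37642F and 0x5FE6EC85E7DE30DA, applies no Newton-Raphson step, and measures the signed relative error against `1.0 / std::sqrt(x)`. All function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Raw estimate for binary32: magic - (bits(x) >> 1), no refinement step.
inline float quick_invsqrt_f32(float x) {
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);   // read the bit pattern of x
    i = 0x5F37642Fu - (i >> 1);      // assumed single-precision constant
    float y;
    std::memcpy(&y, &i, sizeof y);   // reinterpret the result as a float
    return y;
}

// Raw estimate for binary64, same scheme with a 64-bit constant.
inline double quick_invsqrt_f64(double x) {
    std::uint64_t i;
    std::memcpy(&i, &x, sizeof i);
    i = 0x5FE6EC85E7DE30DAull - (i >> 1);  // assumed double-precision constant
    double y;
    std::memcpy(&y, &i, sizeof y);
    return y;
}

// Signed relative error of the binary32 estimate for a positive normal x.
inline double relative_error_f32(float x) {
    const double exact = 1.0 / std::sqrt(static_cast<double>(x));
    return (static_cast<double>(quick_invsqrt_f32(x)) - exact) / exact;
}

// Signed relative error of the binary64 estimate for a positive normal x.
inline double relative_error_f64(double x) {
    const double exact = 1.0 / std::sqrt(x);
    return (quick_invsqrt_f64(x) - exact) / exact;
}
```

For positive normal inputs the raw estimate should stay within a few percent of the true value, the same order of magnitude as the roughly 2% quoted above; a constant that produces NaNs or errors far outside that range is most likely laid out incorrectly rather than merely suboptimal.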