Quality of prime seeking methods


I am working on prime numbers with an emphasis on prime-search heuristics, and I have found the probabilistic methods for seeking primes. I am looking for a review of those methods' quality in machine-learning terms (recall, precision, accuracy, etc.).

I am looking for algorithms to find the next prime and to check whether a number is prime.

Obviously the results will depend on the specific range tested, but that is fine for me.


It definitely depends on the range.

Opinion: The advice for the last 25 years or so has been to use BPSW. I think Pinch (1993) made a good case for it; most math software switched to it years ago; the essential feature of it is recommended by FIPS for cryptographic testing; programming languages that didn't use it have gradually been switching as people repeatedly pointed out the downsides of badly implemented Miller-Rabin testing; and Albrecht et al (2018) gave a scathing report about some of these problems (most of which had already been repeatedly pointed out for years).

For any 64-bit input, BPSW (in its many variants) is fast and deterministic. It is commonly used because it is quite fast, gives completely correct results for 64-bit inputs, and has no known counterexamples for larger inputs. We assume counterexamples exist, but after 38 years of daily use in many math packages, plus some intensive research into finding concrete examples, we still haven't found one. Admittedly the search space is ridiculously large, so that isn't as impressive as it sounds, but it still gives more confidence than similar methods that have had very little testing.

Going up to 3317044064679887385961981 (a bit larger than $2^{81}$), there is a deterministic set of Miller-Rabin bases, but it doesn't seem to be commonly used, partly because that range is easy for methods like BLS75, APR-CL, and ECPP.

  • Fermat used by PFGW, mostly because for the range it targets (typically many thousands of digits) it is reasonably accurate. Not a good idea for small inputs, as M-R is just as fast and has numerous advantages.
  • Solovay-Strassen / Euler pseudoprime test not used in practice, given Miller-Rabin.
  • Euler-Plumb an interesting micro-optimization for a base-2 compositeness test, stronger than the base-2 Fermat or Euler tests, but weaker than the M-R base-2 test. Still a handy addition to a toolbox of fast pre-tests.
  • Perrin, Catalan, etc. curiosities. Few pseudoprimes in the small range we have examined, but they are horrendously slow and there seems to be no practical advantage.
  • Miller-Rabin / strong pseudoprime test very common, with lots of uses. It is basically as fast as a Fermat test, has fewer pseudoprimes, and avoids Carmichael numbers. It has been extensively studied.
    • Used with a number of uniformly random bases between 2 and N-2, this gives a good probabilistic test. Note that if the bases are not randomly selected then you have issues (e.g. counterexamples can be found for GMP, libtommath, and some others). This was all well pointed out in Pinch (1993), by many people online in the last decade, and most recently by Albrecht et al. (2018).
    • For 64-bit inputs, deterministic base sets have been known for some time (e.g. Pomerance et al. (1980) and Jaeschke (1993)). Efficient ones have also been found in the last decade (Best known SPRP base sets). Hashed solutions are also known, making it even more efficient (quite useful for 32-bit, but debatable vs. BPSW for larger inputs).
    • The Miller-Rabin base-2 64-bit pseudoprimes have been completely enumerated, and this list has been used to verify results in other tests such as BPSW.
  • Lucas typically only used as part of BPSW or Frobenius. This is usually the Baillie-Wagstaff (1980) version. Sometimes the "extra strong" version or a modification of that are used. The Bruckman (1994) version is not very useful at all.
  • Frobenius There are a lot of variants, and it doesn't seem that common as advantages vs. BPSW are questionable. There isn't a standardized method for parameter selection which makes it a little difficult to compare. Both Underwood and Khashin have efficient non-randomized versions. No counterexamples to either of them are known, making it difficult to really compare to BPSW. Grantham's QFT is a particular algorithm that has better proven error bounds vs. Miller-Rabin, but I don't know of any actual uses of it or its variants.
  • BPSW a combination of a strong pseudoprime test to base 2 (i.e. a single Miller-Rabin test using base 2) and a strong Lucas test. Typically the latter using the Selfridge parameter selection, but there are some advantages to using the extra strong test (Grantham 2000, OEIS A217719, Pari/GP, Perl/ntheory). Correctly implemented and using any of the reasonable choices for the Lucas test, there are no counterexamples under 64-bit (this is fairly easily tested using the Feitsma/Galway list of 64-bit SPSP-2s). The computational cost is between 2.5 and 3 Miller-Rabin tests, noting that most composites will be identified in the first part, meaning only primes and SPSP-2s will require the full cost.
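To make the Miller-Rabin / strong pseudoprime machinery above concrete, here is a minimal Python sketch. Instead of random bases, it uses the first twelve primes as a fixed witness set, which is reported to be deterministic well past $2^{64}$; a production BPSW implementation would instead pair a single base-2 strong test with a strong Lucas test. The function name `is_prime` and the structure are illustrative, not any particular library's API.

```python
def is_prime(n: int) -> bool:
    """Strong pseudoprime (Miller-Rabin) test with a fixed base set.

    The first twelve primes are reported to form a deterministic
    witness set for all 64-bit inputs (the bound is well above 2^64).
    """
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    # cheap pre-test: trial division by the bases themselves
    for p in bases:
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)          # a^d mod n
        if x in (1, n - 1):
            continue              # a is not a witness for n
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False          # composite: a witnesses it
    return True
```

Note that the base-2 pass of this loop is exactly the "strong pseudoprime test to base 2" that forms the first half of BPSW; the remaining bases are what BPSW replaces with a strong Lucas test.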

Whether or not to run a proof afterwards is a different question. Forget AKS, it isn't practically useful. BLS75, APR-CL, and ECPP are the modern methods for general form inputs. There are various efficient tests for inputs of special form. Of course you don't need to do any of this for 64-bit inputs as you used BPSW or a deterministic Miller-Rabin method (you did, right?).

For next-prime, you can start with a simple increment-and-test method, which is fine if your primality test has decent pre-tests. Then you can optimize with a simple wheel (e.g. mod 30). If you're using big integers you can gain a little by using, e.g., a mod 23# (23 primorial) test, which lets you skip multiples of 2/3/5/7/11/13/17/19/23 using only native modulos instead of bigint ones. If the inputs are large enough, a partial sieve can help, but that depends on the input size (probably not until 120+ bits), your desire for optimization, and the software you're using. All of this just makes the pre-tests more efficient so you call your primality test (e.g. BPSW) less often.
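A minimal sketch of the increment-and-test approach with a mod-30 wheel filter, in Python. The `_is_prime` helper here is a stand-in (a fixed-base Miller-Rabin variant); in real code it would be a BPSW call, and the wheel check simply keeps multiples of 2, 3, and 5 from ever reaching it. All names are illustrative.

```python
def _is_prime(n: int) -> bool:
    """Stand-in primality test (fixed-base Miller-Rabin); in
    practice this would be BPSW or a similar library routine."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

# residues mod 30 coprime to 30: candidates not divisible by 2, 3, or 5
WHEEL_30 = {1, 7, 11, 13, 17, 19, 23, 29}

def next_prime(n: int) -> int:
    """Smallest prime strictly greater than n, using a mod-30
    wheel as a cheap pre-filter before the primality test."""
    if n < 7:
        return next(p for p in (2, 3, 5, 7) if p > n)
    c = n + 1
    while True:
        # only ~8/30 of candidates pass the wheel and get tested
        if c % 30 in WHEEL_30 and _is_prime(c):
            return c
        c += 1
```

The same idea extends to a mod 23# wheel or a partial sieve: the goal is only to reduce how often the (comparatively expensive) primality test runs.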