Pseudoprime test


Background

An Euler pseudoprime $n$ to a coprime base $a$ is an odd composite $n$ which satisfies $$a^{\frac{n-1}{2}} \equiv \pm 1\ (\text{mod } n).$$ Wikipedia link.
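For instance, the Carmichael number $561 = 3 \cdot 11 \cdot 17$ satisfies the congruence to base $2$ but is exposed by base $5$. A quick check (the helper name `euler_condition` is mine):

```python
def euler_condition(n, a):
    """Does a^((n-1)/2) come out to +1 or -1 modulo n?"""
    r = pow(a, (n - 1) // 2, n)
    return r == 1 or r == n - 1

# 561 = 3 * 11 * 17 is composite, yet slips through to base 2...
print(euler_condition(561, 2))   # True
# ...while base 5 catches it.
print(euler_condition(561, 5))   # False
```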

This is always the case for primes, so it is remarkable when composites slip through. I'll reproduce the argument here, starting from Fermat's little theorem:

  1. (FLT) if $p=2q+1$ is prime, and coprime to $a$, then $a^{p-1} \equiv 1\ (\text{mod }p)$. Therefore,
  2. $a^{(2q+1) -1} \equiv 1\ (\text{mod }p)$
  3. $a^{2q} -1 \equiv 0\ (\text{mod }p)$. Factoring gives
  4. $(a^q+1)(a^q-1)\equiv 0\ (\text{mod }p)$. Since $p$ is prime, it must divide one of the two factors. Conclusion:
  5. $a^{q} \equiv \pm 1\ (\text{mod } p)$
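A numeric sanity check of step 5 for a few small primes (the function name `step5_holds` is mine):

```python
def step5_holds(p, a):
    """For odd prime p = 2q + 1 and a coprime to p: is a^q ≡ ±1 (mod p)?"""
    q = (p - 1) // 2
    return pow(a, q, p) in (1, p - 1)

# e.g. p = 23, q = 11, a = 5:  5^11 ≡ 22 ≡ -1 (mod 23)
print(pow(5, 11, 23))  # 22
# It holds for every coprime base, as the argument above predicts.
print(all(step5_holds(p, a) for p in (23, 101, 1009) for a in range(2, 20)))  # True
```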

I spent some time working with and around this, and happened upon a fast primality check which fails only 12 times below 100 million.

The Algorithm

The algorithm runs as follows, given odd $n$:

  1. For the first $\lfloor\log_2 n\rfloor$ primes $p$, determine if $$p^{\frac{n-1}{2}} \equiv \pm 1\ (\text{mod}\ n)$$
  2. If this holds for all such $p$, output PROBABLY PRIME.
  3. Else, output COMPOSITE

...and that's it.

Question

Why does this work so well? I threw out a lot in stripping it down, most notably the requirement that $(p,n)=1$. Somehow it fails less often, and requires around $O(\log^3 n)$ time, rather than $O(\log^6 n)$ (from the most recent unconditional AKS bound). Can I expect it to fail more or less often as I extend the range?

I am actively researching, and wanted to know if anyone has seen or heard of something similar or good references to read.

The failure cases ($n \lt 10^8$), for the curious, are
1. 3,828,001 = 101 × 151 × 251
2. 17,098,369 = 113 × 337 × 449
3. 17,236,801 = 151 × 211 × 541
4. 23,382,529 = 97 × 193 × 1249
5. 37,964,809 = 109 × 379 × 919
6. 56,052,361 = 211 × 421 × 631
7. 62,756,641 = 109 × 241 × 2389
8. 68,154,001 = 151 × 601 × 751
9. 79,411,201 = 193 × 257 × 1601
10. 84,350,561 = 107 × 743 × 1061
11. 90,698,401 = 103 × 647 × 1361
12. 92,625,121 = 181 × 631 × 811
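Each entry both factors as shown and passes every base the test uses. Spot-checking three of them (helper names mine):

```python
import math

def first_primes(k):
    """The first k primes, found by trial division."""
    primes, c = [], 2
    while len(primes) < k:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

cases = {3828001: (101, 151, 251),
         17098369: (113, 337, 449),
         92625121: (181, 631, 811)}

for n, (p1, p2, p3) in cases.items():
    assert p1 * p2 * p3 == n  # really composite
    half = (n - 1) // 2
    bases = first_primes(math.floor(math.log2(n)))
    assert all(pow(b, half, n) in (1, n - 1) for b in bases)  # yet every base passes
print("listed counterexamples confirmed")
```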


Best Answer

One can go slightly further, as Euler did, and note that it isn't just $\pm 1$ but $(a|p)$ (the Jacobi symbol). Using this with various bases to make a probable prime test is the Solovay-Strassen test. As Peter points out, at most 1/2 of the bases can be liars for a composite, leading to the idea that using random bases we have a $2^{-k}$ probability of a composite passing $k$ tests. Using fixed bases (e.g. the first $k$ primes) allows adversaries to fool you every time unless you can show you use enough bases that it is not possible for any composite to pass (see Miller's test below).
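A sketch of Solovay-Strassen under those assumptions, using the standard binary Jacobi-symbol algorithm (function names mine):

```python
import random

def jacobi(a, n):
    """Jacobi symbol (a|n) for odd n > 0, via quadratic reciprocity."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:          # pull out factors of two
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # reciprocity: flip sign if both ≡ 3 (mod 4)
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def solovay_strassen(n, k=20):
    """Composite n passes each random-base round with probability ≤ 1/2."""
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    for _ in range(k):
        a = random.randrange(2, n - 1)
        j = jacobi(a, n) % n       # map -1 to n - 1
        if j == 0 or pow(a, (n - 1) // 2, n) != j:
            return False           # a is an Euler witness: n is composite
    return True                    # probably prime
```

Note that base 2 is an Euler liar for $561$ (both sides come out to $1$), which is why multiple rounds matter.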

This has been supplanted by the Miller-Rabin test, which takes essentially identical time and allows only 1/4 of the bases, so is in some sense twice as efficient. It is quite commonly used.
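A sketch of Miller-Rabin with the common fixed prime bases 2 through 37; per published computations (the deterministic base sets mentioned below), these reportedly suffice for all 64-bit inputs, but treat that bound as an assumption to verify before relying on it:

```python
def miller_rabin(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Strong probable-prime test; these fixed bases are reportedly
    deterministic for all n < 2**64 (assumption from published tables)."""
    if n < 2:
        return False
    for p in bases:                # dispose of tiny n and shared factors
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:              # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False           # a witnesses compositeness
    return True
```

For example, $3215031751$ is a strong pseudoprime to bases $2, 3, 5, 7$ simultaneously, but a later base in the list exposes it.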

There are deterministic sets of bases that can be used if your input is limited in size, and some clever hashing methods to further reduce the number of bases required for a deterministic and correct answer for a given input.

Probably the most popular method for serious contemporary use is the BPSW test, which runs in $O(\log^2 n)$ time, has been shown to have no counterexamples under $2^{64}$, and has no known larger counterexamples after 36 years of use and searching. We expect counterexamples exist, but that they are very rare (there are good reasons why). Some people are more pessimistic and add one or more M-R tests on top.

Note that your test is not $O(\log n)$: modular exponentiation is at least $O(\log^2 n)$, so your algorithm is $O(\log^3 n)$ overall. It is similar to Miller's test, which is $O(\log^4 n)$ because the lowest current bound on the number of bases needed is $2\log^2 n$. Miller's test has the advantage of much more rigorous math showing that it is absolutely correct, provided the generalized Riemann hypothesis is correct. Proving the Riemann hypothesis has been giving us a wee bit of bother for some time, however. In practice it turns out that even if one assumes GRH, we have faster implementations for most input sizes we care about anyway.

AKS is of course important in that it shows deterministic bounds of $O(\log^6 n)$ for all inputs. But in practice it's very slow, and not used. Constants matter, as does not ignoring all the caveats. For instance, APR-CL has a complexity of $O(\log^{c\log\log\log n} n)$, which is not polynomial -- asymptotically AKS is better. But $\log\log\log n$ grows so slowly that we can just replace it with the constant 3 for any number we'd consider computing, and now it's obvious that it will be faster than AKS for every input we'd ever use. Which, it turns out, is just what the empirical data shows. That ECPP internally uses randomness doesn't hinder us and we see approximately $O(\log^4 n)$ in practice, even though we don't have rigorous proofs of this.

Second Answer

A composite number has at most a $50$% chance of passing the given test for a random base $a$. So, if a number passes $20$ such tests, the probability is very high that it is prime.

Even better is the strong pseudoprime test based on Fermat's little theorem. It can be shown that at most $25$% of the bases coprime to the given number will let a composite number pass the test, so with enough tests, primality can be virtually guaranteed. If the number fails such a strong pseudoprime test, it must be composite.