An Alternative Formulation of Lucas-Lehmer Test


Is this alternative test faster than the original Lucas-Lehmer test?

Theorem (Lucas-Lehmer)

Let $M_p=2^p-1$, where $p \in \mathbb{P}$ and $p \ge 3$, and let $S_i=S_{i-1}^2-2$ with $S_0=4$. Then

$M_p$ is prime iff $S_{p-2} \equiv 0 \pmod {M_p}$.

Theorem (alternative formulation)

Let $M_p=2^p-1$, where $p \in \mathbb{P}$ and $p \ge 3$, and let $S_i=S_{i-1}^4-4S_{i-1}^2+2$ with $S_0=4$. Then

$M_p$ is prime iff $S_{(p-1)/2} \equiv -2 \pmod {M_p}$.
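
Note that the variant's recurrence is just two steps of the original one composed:

$(S^2-2)^2-2 = S^4-4S^2+4-2 = S^4-4S^2+2$,

so the variant's $S_i$ equals the original $S_{2i}$, and after $(p-1)/2$ doubled steps one has computed the original $S_{p-1}=S_{p-2}^2-2$, which is $\equiv 0^2-2 = -2 \pmod {M_p}$ when $S_{p-2} \equiv 0$; this is where the changed terminal condition comes from.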


My PARI/GP implementation of the alternative test is a little faster than my PARI/GP implementation of the original Lucas-Lehmer test, but I don't know whether this comparison is meaningful or not.

PARI/GP implementations:

MersennePrime1(p)=
{
  \\ standard Lucas-Lehmer test for M_p = 2^p - 1
  my(S = 4, M = 2^p - 1);
  for (i = 1, p - 2, S = Mod(S^2 - 2, M));
  S == 0
}

MersennePrime2(p)=
{
  \\ doubled-step variant: two squarings per iteration
  my(S = 4, M = 2^p - 1);
  for (i = 1, (p - 1)/2, S = Mod((S^2 - 2)^2 - 2, M));
  S == M - 2
}

TL;DR: The comparison isn't theoretically relevant. It may make enough of a difference in your setting to be practically relevant for you.

Since the only difference between your version and the standard algorithm is that you take the modulus only every second $S \mapsto S^2 - 2$ step - and therefore need to take one step more and have a slightly different condition at the end - there is no difference in algorithmic complexity. Both versions need $\Theta(p)$ squarings, subtractions and remainder operations, and the involved numbers have roughly the same size.

So the question is whether we get different constant factors making one or the other variant consistently faster.

Taking the modulus is typically a rather costly operation (although division can be implemented to have the same algorithmic complexity as multiplication, it is a much slower operation even then), so reducing the required number of such operations is a promising idea.

However, you pay for taking the modulus less often by dealing with larger numbers. In the standard algorithm, one never multiplies numbers with more than $p$ bits, while in your variant, one needs to multiply numbers with up to $2p$ bits, which takes longer. As a result, taking the modulus $S_i \bmod M_p$ will usually also take longer on average in your variant compared to the standard variant.

Which effect - fewer modulo operations vs. slower multiplication and modulo operations - wins in the end depends on low-level implementation details of the multiplication and modulo algorithms, probably also on CPU architecture, and possibly even the OS.

I've implemented both variants in C (using the GMP library), and in that implementation the standard algorithm is consistently faster for the primes $p$ I tried; some examples:

dafis@schwartz:~/Prime> ./gmp_mod_gcc 23209
2 ^ 23209 - 1 is a Mersenne prime.
lucas_lehmer took 3.173 seconds
2 ^ 23209 - 1 is a Mersenne prime.
ll_variant took 4.857 seconds
dafis@schwartz:~/Prime> ./gmp_mod_gcc 30011
2 ^ 30011 - 1 is not a Mersenne prime.
lucas_lehmer took 5.926 seconds
2 ^ 30011 - 1 is not a Mersenne prime.
ll_variant took 9.494 seconds
dafis@schwartz:~/Prime> ./gmp_mod_gcc 44497
2 ^ 44497 - 1 is a Mersenne prime.
lucas_lehmer took 16.421 seconds
2 ^ 44497 - 1 is a Mersenne prime.
ll_variant took 25.204 seconds
dafis@schwartz:~/Prime> ./gmp_mod_gcc 86243
2 ^ 86243 - 1 is a Mersenne prime.
lucas_lehmer took 83.717 seconds
2 ^ 86243 - 1 is a Mersenne prime.
ll_variant took 117.326 seconds

That's always a factor of approximately $1.5$. Since my C code looks quite different from the PARI/GP code, I've also implemented it in Haskell (your variant shown as an example):

ll :: Int -> Bool
ll p = go (p `quot` 2) 4
  where
    m = 2^p-1
    go 0 s = s + 2 == m
    go k s = go (k-1) (let t = s^2-2 in (t^2 - 2) `mod` m)

which looks a bit more like the PARI/GP code. The Haskell variants take approximately the same time as the corresponding C variants (unsurprisingly, since GHC uses GMP for Integer computations too). I don't have PARI/GP installed, so I can't check whether your MersennePrime2 would also be faster than MersennePrime1 on my box.

But our results together show that neither variant of the algorithm is unconditionally faster than the other.

However, so far only the generic mod was used, and for Mersenne numbers we can use a faster custom modulo computation (we know more about the problem than the library/compiler does). That's not quite as fast as computing the remainder modulo $2^p$, which would be a single bit mask, but we can still avoid all divisions and thus get a considerable speed-up. This custom reduction plays especially nicely with the standard algorithm, since there we always have $S < M_p^2$, and we know that we need at most one iteration of mask-and-shift. In C-ish pseudocode, that is

if (S > M_p) {
    S = (S & M_p) + (S >> p);
}
if (S >= M_p) {
    S = S - M_p;
}

where we need not do anything in the rare case that $S < 0$, since $S \geqslant -2$ and then $S^2$ is guaranteed to be very small in the next step. For the variant, up to four mask-and-shift operations may be needed; it's simplest then to replace the first if with a while. This reduces the running time quite a bit, showing that the modulo operations were responsible for the bulk of the time:

dafis@schwartz:~/Prime> ./hand_mod_gcc 23209
2 ^ 23209 - 1 is a Mersenne prime.
lucas_lehmer took 0.833 seconds
2 ^ 23209 - 1 is a Mersenne prime.
ll_variant took 1.413 seconds
dafis@schwartz:~/Prime> ./hand_mod_gcc 30011
2 ^ 30011 - 1 is not a Mersenne prime.
lucas_lehmer took 1.446 seconds
2 ^ 30011 - 1 is not a Mersenne prime.
ll_variant took 2.581 seconds
dafis@schwartz:~/Prime> ./hand_mod_gcc 44497
2 ^ 44497 - 1 is a Mersenne prime.
lucas_lehmer took 3.706 seconds
2 ^ 44497 - 1 is a Mersenne prime.
ll_variant took 6.612 seconds
dafis@schwartz:~/Prime> ./hand_mod_gcc 86243
2 ^ 86243 - 1 is a Mersenne prime.
lucas_lehmer took 17.874 seconds
2 ^ 86243 - 1 is a Mersenne prime.
ll_variant took 32.475 seconds

Now the factor between the standard algorithm and your variant is approximately $1.8$ (except for $p = 23209$, where it's $1.7$), so the standard algorithm benefited relatively more.

For the not-faint-of-heart, here's the C code:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

#include <gmp.h>

bool lucas_lehmer(unsigned long p);
bool ll_variant(unsigned long p);

int main(int argc, char *argv[]) {
    if (argc < 2) return 0;
    // I'm too lazy now to check that a valid argument is supplied,
    // bite the bullet if you don't.
    unsigned long p = strtoul(argv[1],NULL,0);
    clock_t start, stop;
    bool mersenne_prime;
    double duration;
    // time lucas_lehmer
    start = clock();
    mersenne_prime = lucas_lehmer(p);
    stop = clock();
    duration = (double)(stop - start)/CLOCKS_PER_SEC;
    printf("2 ^ %lu - 1 is%s a Mersenne prime.\n", p, mersenne_prime ? "" : " not");
    printf("lucas_lehmer took %.3f seconds\n", duration);
    // time the variant
    start = clock();
    mersenne_prime = ll_variant(p);
    stop = clock();
    duration = (double)(stop - start)/CLOCKS_PER_SEC;
    printf("2 ^ %lu - 1 is%s a Mersenne prime.\n", p, mersenne_prime ? "" : " not");
    printf("ll_variant took %.3f seconds\n", duration);
    return 0;
}

bool lucas_lehmer(unsigned long p) {
    mpz_t m, s;
    mpz_init2(m, p+66);
    mpz_init2(s, 2*p + 66);
#ifdef HAND_MOD
    mpz_t t;
    mpz_init2(t, p+66);
#endif
    mpz_setbit(m,p);
    mpz_sub_ui(m,m,1);
    mpz_set_ui(s,4);
    for(unsigned long u = p-2; u; --u) {
        mpz_mul(s,s,s);
        mpz_sub_ui(s,s,2);
#ifdef HAND_MOD
        if (mpz_cmp(s,m) > 0) {
            mpz_and(t,m,s);
            mpz_tdiv_q_2exp(s,s,p);
            mpz_add(s,s,t);
        }
        if (mpz_cmp(s,m) >= 0) {
            mpz_sub(s,s,m);
        }
#else
        mpz_mod(s,s,m);
#endif
    }
    bool res = mpz_sgn(s) == 0;
    mpz_clear(m);
    mpz_clear(s);
#ifdef HAND_MOD
    mpz_clear(t);
#endif
    return res;
}

bool ll_variant(unsigned long p) {
    mpz_t m, s;
    mpz_init2(m, p+66);
    mpz_init2(s, 4*p+66);
#ifdef HAND_MOD
    mpz_t t;
    mpz_init2(t, p+66);
#endif
    mpz_setbit(m,p);
    mpz_sub_ui(m,m,1);
    mpz_set_ui(s,4);
    for(unsigned long u = p >> 1; u; --u) {
        mpz_mul(s,s,s);
        mpz_sub_ui(s,s,2);
        mpz_mul(s,s,s);
        mpz_sub_ui(s,s,2);
#ifdef HAND_MOD
        while(mpz_cmp(s,m) > 0) {
            mpz_and(t,m,s);
            mpz_tdiv_q_2exp(s,s,p);
            mpz_add(s,s,t);
        }
        if (mpz_cmp(s,m) == 0) {
            mpz_set_ui(s,0);
        }
#else
        mpz_mod(s,s,m);
#endif
    }
    mpz_add_ui(s,s,2);
    if (mpz_sgn(s)) mpz_sub(s,s,m);
    bool res = mpz_sgn(s) == 0;
    mpz_clear(m);
    mpz_clear(s);
#ifdef HAND_MOD
    mpz_clear(t);
#endif
    return res;
}

I just saw this post and ran both in PARI/GP 2.9.1. My results varied a bit from DFischer's, at least for the few exponents I tried. For exp = 44497, I saw orig/variant times of 48.8s/48.5s. For exp = 86243, same order: 4min23s/4min20s. My times are much slower owing to my modest computing resources (a laptop). The last I tried was exp = 23209, yielding times of 9.94s/9.83s. Why this didn't show the factor of 1.5 seen in the other answer, I have no idea...