I have written some working code to generate RSA key pairs using the Miller-Rabin primality test. All of the generated keys pass `openssl rsa -check`, meaning that (among other things) OpenSSL agrees that $P$ and $Q$ are both prime. For a 2048-bit key, $P$ and $Q$ are 1024-bit primes.
The Miller-Rabin test is configured to run up to 64 rounds on a candidate $c$, stopping early if any round finds a witness that $c$ is composite. The first 14 rounds use fixed bases (the first 14 prime numbers), and the next 50 rounds use bases $a$ from a PRNG in the range $2 \le a \le 2^k - 2$, where $k$ is the number of bits in the key.
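For concreteness, here is a minimal Python sketch of that round structure (not my actual implementation; the 14-fixed-plus-50-random-base schedule mirrors the description above, and for simplicity the random bases are drawn from $[2, c-2]$ rather than $[2, 2^k-2]$):

```python
import random

# The first 14 primes, used as fixed bases.
SMALL_PRIME_BASES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]

def miller_rabin_round(n, a):
    """One Miller-Rabin round: True means 'possibly prime' for base a."""
    d, r = n - 1, 0
    while d % 2 == 0:          # write n-1 as d * 2^r with d odd
        d //= 2
        r += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(r - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False               # a is a witness: n is composite

def is_probable_prime(n, random_rounds=50):
    if n < 2:
        return False
    for a in SMALL_PRIME_BASES:              # 14 fixed-base rounds
        if n == a:
            return True
        if not miller_rabin_round(n, a):
            return False
    for _ in range(random_rounds):           # 50 random-base rounds
        a = random.randrange(2, n - 1)
        if not miller_rabin_round(n, a):
            return False
    return True
```

For example, $2047 = 23 \times 89$ passes the base-2 round (2 is a liar for it) but is caught immediately by base 3.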
When a round using some base identifies $c$ as possibly prime, but a subsequent round using a different base finds $c$ to be composite, the candidate is what's known as a strong pseudoprime to the first base. A base which incorrectly identifies a composite $c$ as possibly prime is called a liar, and this situation only happens when $c$ is a strong pseudoprime to that base.
In analyzing the operation of my implementation of the Miller-Rabin algorithm, I noticed that it has never come across any base that turned out to be a liar. I have run over 30,000 searches for 512- and 1024-bit primes with nary a liar to be found.
If a strong pseudoprime value is encountered, several rounds with different bases may be required to root it out, and the probability of a composite fooling $t$ rounds of Miller-Rabin is said to be at most $4^{-t}$. In the above example, with $t = 64$, that amounts to a probability of $4^{-64} = 2^{-128} \approx 10^{-38.5}$, which is so small as to be effectively zero.
But this left me wondering: why did the process never find a strong pseudoprime in over 30,000 runs, and is there any point in performing all those extra rounds if a strong pseudoprime is never going to pop up?
A common estimate of the number of primes less than some value $x$ is:
$$\Pi(x) = \frac{x}{\ln x}.$$
When working with RSA keys, $x$ is usually a large power of two less one, so we can approximate the fraction of those numbers less than $x$ which are prime. For $x=2^{1024}$, the fraction is $1/(\ln(2^{1024})) \approx 1/710$. When looking through random values to find a 1024-bit prime, it will take about 710 attempts on average. It's actually more than that because we're not accepting values less than $2^{1023}$, but this difference is not important for this discussion.
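A quick numeric check of the 1/710 figure:

```python
import math

k = 1024
# Prime density near x = 2^k is about 1/ln(x), and ln(2^k) = k * ln(2),
# so the expected number of random attempts is about k * ln(2).
expected_attempts = k * math.log(2)
print(round(expected_attempts))  # 710
```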
Since $x$ is such a large number, in computing probabilities it is computationally convenient to work with logarithms of these values.
$$\ln(x) = \ln(2^k - 1) \approx k \ln(2),$$
and
$$\ln \Pi(x) = \ln(x) - \ln( \ln(x) ).$$
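As a concrete check of these log formulas (a sketch in Python, converting to base 10 at the end):

```python
import math

k = 1024
ln_x = k * math.log(2)          # ln(2^k)
ln_pi = ln_x - math.log(ln_x)   # ln(Pi(x)) per the formula above
# About 10^305 primes below 2^1024.
print(round(ln_pi / math.log(10)))  # 305
```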
There's an article by Carl Pomerance, "On the Distribution of Pseudoprimes", which gives an upper bound on the number of pseudoprimes less than $x$, denoted by $P(x)$ in the article:
$$P(x) < x \cdot e^{-\ln(x) \ln(\ln(\ln x)) / (2 \ln(\ln x))}.$$
Taking the logarithm of that formula yields this:
$$\ln( P(x) ) < \ln(x) - \frac{\ln(x) \ln( \ln( \ln(x) ) )}{2 \ln( \ln(x) )}.$$
Since the implemented prime search algorithm never ran across a strong pseudoprime, these formulas can answer the question: what is the ratio of the count of true primes less than $x$ to the count of strong pseudoprimes less than $x$? Because $P(x)$ is an upper bound, the following is a lower bound on that ratio:
$$\ln \frac{\Pi(x)}{P(x)} = - \ln( \ln(x) ) + \frac{\ln(x) \ln( \ln( \ln(x) ) )}{2 \ln( \ln(x) )} $$
The value of this ratio, expressed as a power of 10 rather than as a natural logarithm, is about $10^{21}$ for 512-bit numbers and $10^{41}$ for 1024-bit numbers. Strong pseudoprimes are so outnumbered by primes that the odds of ever finding one are essentially zero for key lengths currently used in RSA encryption.
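These figures are straightforward to reproduce (a sketch; since $P(x)$ is an upper bound, the computed value is a lower bound on the ratio):

```python
import math

def log10_prime_to_pseudoprime_ratio(k):
    """Lower bound on log10(Pi(x)/P(x)) for x = 2^k, per the formula above."""
    ln_x = k * math.log(2)
    ln_ratio = (-math.log(ln_x)
                + ln_x * math.log(math.log(ln_x)) / (2 * math.log(ln_x)))
    return ln_ratio / math.log(10)

print(round(log10_prime_to_pseudoprime_ratio(512)))   # 21
print(round(log10_prime_to_pseudoprime_ratio(1024)))  # 41
```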
For values of $x=2^k$ where $128 \le k \le 4096$ this function is nearly linear in $k$, with a 10-fold increase in the ratio for every 26 bits added to the number.
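The near-linearity claim can be checked the same way; under the formula above, the ratio gains a factor of 10 about every 26 bits:

```python
import math

def log10_ratio(k):
    # log10 of the prime-to-pseudoprime ratio for x = 2^k, as derived above.
    ln_x = k * math.log(2)
    return (-math.log(ln_x)
            + ln_x * math.log(math.log(ln_x)) / (2 * math.log(ln_x))) / math.log(10)

# Average slope over 128..4096 bits, in decades (factors of 10) per bit.
slope = (log10_ratio(4096) - log10_ratio(128)) / (4096 - 128)
print(round(1 / slope))  # 26 bits per 10-fold increase in the ratio
```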
So, the first part of my question is: Is this analysis correct and accurate?
The second question is: what's the point in doing multiple rounds of Miller-Rabin when searching for primes of 1024 or more bits? The odds of ever picking a strong pseudoprime of this size are so small as to be effectively zero.
Now, on the other hand, when searching for 1024-bit primes, if 710 (or maybe 1420) random values will have to be tested on average before finding a prime, the extra cost of running another 50 rounds of Miller-Rabin on a candidate that passes the first round is not all that much, so why not do that and be sure?
Being a little tongue-in-cheek: someone should offer a prize for the first person to publish a strong pseudoprime greater than $2^{1023}$, or $2^{2047}$. That would be a pretty safe bet, wouldn't it? At least until some new technology like quantum computing enters the picture. Or put some code in OpenSSL to flag the case where it finds a strong pseudoprime, proclaiming in effect, "we have a winner!"