What does the word generate mean in terms of prime numbers?
For example, how would one find the two primes under 20 which generate the first four primes 2,3,5,7.
Or, how would one find the prime(s) generated by the alternate primes 2,5,7,11?
What does the word generate mean in terms of prime numbers?
For example, how would one find the two primes under 20 which generate the first four primes 2,3,5,7.
Or, how would one find the prime(s) generated by the alternate primes 2,5,7,11?
Copyright © 2021 JogjaFile Inc.
First, it helps to know why one would generate a prime. A common use is to generate a pair of large primes to use in an encryption algorithm, for example RSA. A bad way to do this is to guess a large number and then test to see if it is prime. A good way to do this is to start a search at some large number, throw out most of the composites quickly, then only test the numbers which remain for primality. (This is because primality testing is slow and sieving to discard "easy" composites is much, much faster.) A different need to generate primes is when one needs the list of primes in a certain range to compute some other thing. This can happen when picking primes for use in the Chinese Remainder Theorem ("CRT"), for example when secret sharing.
It is worth remembering that sieving methods that use a small list of primes to generate more primes also generate a lot of composites. A good analogy is that we wish we could vacuum form to include exactly the primes, but using only a finite starting set, we will include some composites. And as we go up through the survivors, the likelihood that the next generated number is composite increases.
One uses a pool of "small primes" to generate a pool of "larger, maybe primes". The easiest possible example of this is: $2$ is a small prime. If we eliminate all the multiples of $2$ bigger than $2$ from the natural nubmers, we eliminate all the "large" even numbers so that we cannot mistakenly pretend that one of them is prime. This actually works very well. The survivors larger than $2$ are $3, 5, 7, 9, \dots$, the first three of which are prime. (This answers your first question : $2$ is a prime that generates $2, 3, 5, 7$, so "$2$ and any other prime less than $20$" is a pair of primes that generates this list of four primes. Your second question, with the generating set $2, 5, 7, 11$ leaves multiples of $3$ as potential primes and removes multiples of $11$. It still "generates" all the primes but also leaves substantially more composites marked as "could be prime".)
We can extend this method quite a bit through wheel factorization. Let's do a small example first. Consider the six residues modulo $2 \cdot 3 = 6$. \begin{align*} 0 \pmod{6} &&\text{all divisible by $6$} \\ 1 \pmod{6} &&\text{no obvious divisor} \\ 2 \pmod{6} &&\text{all divisible by $2$} \\ 3 \pmod{6} &&\text{all divisible by $3$} \\ 4 \pmod{6} &&\text{all divisible by $2$} \\ 5 \pmod{6} &&\text{no obvious divisor} \end{align*} So if you wrap the natural numbers along a cylinder so that there are six number going around the circumference (so the residue classes modulo $6$ line up into long rows parallel to the axis of the cylinder), we know that the rows corresponding to $0$, $2$, $3$, and $4$ modulo $6$ contain no primes; all the primes are in the rows congruent to $1$ and $5$ modulo $6$. So the first few of these (stopping at the first composite) are $5, 7, 11, 13, 17, 19, 23, 25, \dots$.
There are two natural questions to ask now: Given a set of primes, what is the smallest composite not removed from the naturals? and how far can you go where "most" of the survivors are prime? Well, if we skip a prime in our generating set, then the square of that prime is a composite not removed from the naturals by our set of primes, so the square of the smallest prime not in our generating set is the smallest composite not removed from the naturals. The second question is harder.
If we put all the primes up to the $n^{\text{th}}$ prime, $p_n$, into our generating set, then the square of the next prime, $p_{n+1}^2$ is the first composite not removed from the list and the product of the next two primes, $p_{n+1}p_{n+2}$ is the second composite not removed from the list. These can be substantially larger than the largest prime in the generating set, so it can be quite a while before "most" of the numbers passing the sieve are composite. (One might wonder about bootstrapping this process -- where does our initial list of primes come from? From the empty generating set, we get that $2$ is prime (actually, that $2$ and $3$ are prime...) From that generating set, we get that $3$, $5$, and $7$ are prime and, since $9$ is the square of the first new prime, we stop. From these, we get all the primes up to $121$ and can continue as long as necessary to get large initial segments of the primes.)
Wheel sieving is not the only or the best way to use a short list of primes to generate more primes. The Atkin sieve starts by using the factorization $60 = 2^2 \cdot 3 \cdot 5$ to eliminate residue classes modulo $60$ that cannot contain primes, just as was done in the wheel sieve. Then some number theory is used to eliminate additional composites so that among the survivors, you do not mark off their multiples as you go, you mark off multiples of their squares. This requires vastly less marking, so runs much faster.
So how does this help us find large primes for RSA or a string of primes for the CRT. For RSA, decide how large a prime you want and make an array of numbers around this size. There are good estimates of how far apart prime numbers are so we know about how long to make this array. Then sieve out the multiples of $2$, $3$, $5$, $7$, and so on up to primes about the size of the length of the array. This leaves very few "maybe primes" to then run through a slow primality test. For the CRT task needing $N$ primes about the same size, make the array $N$ times larger, sieve out by small primes, then test the survivors for primality using a slow test.
(As a practical matter, it is frequently not necessary to have bona fide primes. It is sufficient to be "prime enough". This is definitely the case for th CRT application. You can instead, sieve out by small primes, then check the survivors for relative primality using Euclid's algorithm to compute the GCD. A collection that is pairwise relatively prime will work just as well as if they were prime -- and if you have sieved by all primes up to the length of the array, you know there cannot be a common prime factor among the survivors, so the GCDs can be skipped. This is less true for RSA -- RSA starts with the product of two large primes, a semiprime, that must satisfy additional tests. These additional tests comprise a second round of sieving, since we may need several primes to find ones that pass this second round, it makes sense to initially sieve a larger array, just to put off having to restart the initial sieve with an array representing the next block of natural numbers.)