Take a list of the first $n$ primes $P_n=\{2,3,5,7,11,\ldots\}$ and convert the sequence into a binary string
$$S_n = 101110111\ldots$$
Compress the string with your favorite compression algorithm (the details don't matter) and let the size of the string after compression be $C(S_n)$. Compare this to the average over permutations $\pi$ of the string, $\left < C(\pi S_n) \right >$, or to random strings of 0s and 1s of the same length, $C(R_N)$. Why does it seem to be true that
$$ C(S_n) < \left < C(\pi S_n) \right > \approx C(R_N) $$
implying that primes are compressible!
For example, with $N=2^{21}$, using zlib as the compression algorithm and averaging over 20 trials:
$$ \frac{C(S_n)}{\left < C(\pi S_n) \right >} \approx 0.761 $$
What is this "redundant data"?
It turns out that the storage of the primes can be compressed by an arbitrarily large factor, though the block length the scheme requires grows doubly exponentially in the compression factor.
Note: this has been corrected and made more precise.
Let $p_n$ be the $n$-th prime (with $p_1 = 2$), and let $P_n$ be the product of the first $n$ primes, so that $P_1 = 2, P_2 = 6, P_3 = 30, P_4 = 210$.
If we do a sieve of Eratosthenes, sieving out multiples of the first $n$ primes, all that are left are the first $n$ primes and the numbers relatively prime to $P_n$.
For example, for $n=2$, the numbers left are $2, 3$ and the forms $6m+1$ and $6m+5$. For $n=3$, the numbers left are $2, 3, 5$ and the forms $30m+q$ with $q \in \{1,7,11,13,17,19,23,29\}$.
For general $n$, the numbers remaining are of the form $P_nm+q_i$, where the $q_i$ are the numbers from $1$ to $P_n-1$ relatively prime to $P_n$.
To illustrate, I will use the case $n=3$.
Each block of $30$ numbers in the range $30m$ to $30m+29$ can contain at most $8$ primes (for $m \ge 1$), as shown above. Therefore, one bit per residue suffices to indicate whether or not that value is actually prime, so storage of the primes can be compressed to $\frac{8}{30} \approx 0.267$ of its original size.
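To make the bookkeeping concrete, here is a small Python sketch of this $n=3$ case, storing one bit per residue coprime to $30$; the helper names and the naive primality test are only for illustration, and the primes $2, 3, 5$ (and the first block, which contains $1$) would be handled separately:

```python
WHEEL = [1, 7, 11, 13, 17, 19, 23, 29]   # residues mod 30 coprime to 30

def is_prime(k):
    """Naive trial division; fine for a demonstration."""
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))

def encode_block(m):
    """One byte per block of 30: bit i is set iff 30*m + WHEEL[i] is prime."""
    return sum(is_prime(30 * m + q) << i for i, q in enumerate(WHEEL))

def decode_block(m, mask):
    """Recover the primes in [30*m, 30*m + 30) from the 8-bit mask (m >= 1)."""
    return [30 * m + q for i, q in enumerate(WHEEL) if mask >> i & 1]

# Block m = 1 covers 30..59; the only candidate that fails is 49 = 7*7.
mask = encode_block(1)
print(f"{mask:08b}", decode_block(1, mask))   # primes 31, 37, 41, 43, 47, 53, 59
```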
Here is what happens for general $n$.
The $P_n$ values in the range from $mP_n$ to $(m+1)P_n-1$ are compressed to $\phi(P_n)$ bits, where $\phi(m)$ is Euler's phi function (though he lets me use it), which counts the number of integers from $1$ to $m$ relatively prime to $m$.
Since $\phi(P_n) = \prod_{i=1}^n (p_i-1)$ (for example, $\phi(30) = 1 \cdot 2 \cdot 4 = 8$, matching the eight residues above), the compression ratio is $\dfrac{\phi(P_n)}{P_n} =\prod_{i=1}^n \left(1-\dfrac1{p_i}\right)$. Since this product goes to zero (because $\sum_{i=1}^n \dfrac1{p_i} \sim \ln \ln n$ diverges; this is Mertens' second theorem), the amount of compression can be arbitrarily large, though the block length $P_n$ required grows very rapidly with $n$, as made precise below.
To see this, note that a corollary of the prime number theorem ($\sum_{i=1}^n \ln p_i \sim p_n$) gives $P_n = e^{(1+o(1))\,p_n} = e^{(1+o(1))\,n \ln n}$, and $\dfrac{\phi(P_n)}{P_n} =\prod_{i=1}^n \left(1-\dfrac1{p_i}\right) \approx e^{-\ln \ln n} = \dfrac1{\ln n}$ (Mertens' third theorem gives the sharper $\sim \dfrac{e^{-\gamma}}{\ln p_n}$). Therefore, each block of $P_n$ values can be represented by about $\dfrac{P_n}{\ln n}$ bits.
Therefore, sieving with the first $n$ primes and storing one bit per residue relatively prime to $P_n$ compresses the primes by about a factor of $\ln n$.
Therefore the primes can be compressed by an arbitrarily large factor, though the block length $P_n$ needed grows like $e^{(1+o(1))\,n\ln n}$ (roughly $n^n$), which is faster than exponential in $n$ and doubly exponential in the compression factor.
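As a quick numeric sanity check, here is a short Python sketch (the helper names are mine) that computes $\phi(P_n)/P_n = \prod_{i=1}^{n}\left(1-\frac1{p_i}\right)$ for a few values of $n$ and compares it with the Mertens approximation $e^{-\gamma}/\ln p_n$:

```python
import math
from itertools import takewhile

def first_primes(n):
    """First n primes by trial division (adequate for small n)."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in takewhile(lambda p: p * p <= k, primes)):
            primes.append(k)
        k += 1
    return primes

GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

for n in (1, 2, 3, 4, 10, 100, 1000, 10000):
    ps = first_primes(n)
    ratio = math.prod(1 - 1 / p for p in ps)        # phi(P_n) / P_n
    approx = math.exp(-GAMMA) / math.log(ps[-1])    # Mertens' third theorem
    print(f"n = {n:5d}  p_n = {ps[-1]:6d}  ratio = {ratio:.4f}  e^-gamma/ln p_n = {approx:.4f}")
```

For $n = 3$ this reproduces the $\frac{8}{30} \approx 0.267$ ratio above, and the slow $1/\ln p_n$ decay is visible for larger $n$.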
A number of years ago, I used this idea with $n=4$, which gave a compression ratio of $\frac{1\cdot 2\cdot 4\cdot 6}{2\cdot 3\cdot 5\cdot 7} =\frac{8}{35} $.