In the answers, DanaJ demonstrates encoding natural numbers as a set of prime-factor positions, as described in the question below, taking about $1.2$ to $1.5$ times as many bits as a more straightforward single-binary-number encoding.
A natural number can be stored as its prime factors, for example:
$10 = 2 \times 5 = \text{product}(2, 5)\\12 = 2 \times 2 \times 3 = \text{product}(2, 2, 3)\\13 = 13 = \text{product}(13)$
And its prime factors, being prime numbers, can each be stored as the "position" of that prime in the sequence of primes.
$p(1) = 2\\p(2) = 3\\p(3) = 5\\p(4) = 7\\p(5) = 11\\p(6) = 13$
Therefore (let $C$ be the name of a new function):
$10 = 2 \times 5 = p(1) \times p(3) = \text{product}(p(1), p(3)) = C(1, 3)\\12 = 2 \times 2 \times 3 = p(1) \times p(1) \times p(2) = \text{product}(p(1), p(1), p(2)) = C(1, 1, 2)\\\text{etc}\ldots$
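This mapping is easy to sketch in code. Below is a minimal Python illustration (the helper names `primes_up_to`, `to_indices`, and `from_indices` are mine, not from the question):

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, in order."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def to_indices(n, primes):
    """Factor n and return the 1-based prime indices, e.g. 12 -> [1, 1, 2]."""
    out = []
    for idx, p in enumerate(primes, start=1):
        while n % p == 0:
            out.append(idx)
            n //= p
        if n == 1:
            break
    return out

def from_indices(indices, primes):
    """Inverse: multiply the primes at the given 1-based indices."""
    result = 1
    for i in indices:
        result *= primes[i - 1]
    return result

primes = primes_up_to(100)
print(to_indices(10, primes))        # [1, 3]    i.e. C(1, 3)
print(to_indices(12, primes))        # [1, 1, 2] i.e. C(1, 1, 2)
print(from_indices([1, 3], primes))  # 10
```

Because the factors are emitted in ascending order, the index list is always non-decreasing, which the gap encoding below relies on.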
The density of primes decreases with magnitude; adjacent larger primes are spaced further apart.
For example, the primes between $10$ and $20$ are $11, 13, 17$, and $19$, but $1000$ to $1010$ contains just one prime, $1009$.
However, larger numbers tend to have more factors.
Question:
Typically, roughly how many times more data should it take to store numbers as described than in the usual single-binary-number way?
Is this approach more efficient for larger numbers?
Following is an example with a 2-digit and a 4-digit number.
I've included in square brackets roughly how many binary bits this could require. This is probably an underestimate, as it doesn't count everything, such as storage for the lengths of the numbers or for the length of the list of numbers. I'll use $n$ to represent this additional cost.
I'll represent, for example, $C(2, 3, 3, 6)$ as $C(+1, +1, +0, +3)$, storing each index as its gap from the previous one (starting from $1$); this works because consecutive values never decrease.
$35\ [5\text{ bits}] = C(+2, +1)\ [2+1+n\text{ bits}]\\1822\ [11\text{ bits}] = C(+0, +155)\ [0+8+n\text{ bits}]$
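The gap form can be sketched as follows (a Python illustration; `to_deltas` and `from_deltas` are made-up names, not from the question):

```python
def to_deltas(indices):
    """Non-decreasing 1-based indices -> gaps from the previous index,
    with the first gap measured from 1: [2, 3, 3, 6] -> [1, 1, 0, 3],
    i.e. C(2, 3, 3, 6) becomes C(+1, +1, +0, +3)."""
    out, prev = [], 1
    for i in indices:
        out.append(i - prev)
        prev = i
    return out

def from_deltas(deltas):
    """Inverse: accumulate the gaps back into absolute indices."""
    out, prev = [], 1
    for d in deltas:
        prev += d
        out.append(prev)
    return out

print(to_deltas([3, 4]))          # 35 = p(3)*p(4)    -> [2, 1],   C(+2, +1)
print(to_deltas([1, 156]))        # 1822 = p(1)*p(156) -> [0, 155], C(+0, +155)
print(from_deltas([1, 1, 0, 3]))  # [2, 3, 3, 6]
```

Since the gaps are small for numbers with many small factors, they are cheaper to store with a variable-length code than the absolute indices would be.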
I wrote some Perl modules that relate to this, so I thought I'd try them out. Install them with `cpan ntheory Data::BitStream Data::BitStream::XS`. I use a terminator between values so we know when the factors for this number have stopped.

We can use `get/put_binword(<bits>, $n)` to store data in a fixed number of bits. There are lots of variable-length codes to choose from. I compare binary (simple, but it either relies on knowing an upper limit or wastes space), adaptive Rice encoding of the number, and Elias Delta encoding of the factor indices. Fibonacci and Boldi-Vigna-2 codes seem to be a bit more efficient (e.g. 322M bits vs. 339M bits for 1-10M). There are other tricks that can reduce the number of bits as well (e.g. encoding the number of '2' factors in unary).

In bits, for the first 100k numbers I get:
For 1-1,000,000:
For 1-10,000,000:
For 2^32-1M to 2^32-1:
For 2^36-1M to 2^36-1:
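For reference, the Elias codes mentioned above can be sketched in a few lines of Python (an illustration of the standard gamma and delta codes, not of the `Data::BitStream` implementation):

```python
def elias_gamma(n):
    """Elias gamma code of n >= 1: (bit-length - 1) zeros, then n in binary."""
    b = bin(n)[2:]
    return '0' * (len(b) - 1) + b

def elias_delta(n):
    """Elias delta code of n >= 1: gamma-code the bit length of n,
    then append n's binary digits without the leading 1."""
    b = bin(n)[2:]
    return elias_gamma(len(b)) + b[1:]

for n in (1, 2, 8, 155):
    code = elias_delta(n)
    print(n, code, len(code))  # e.g. 155 fits in 14 bits
```

Since the index gaps can be $0$ (the $+0$ in the example above), a real encoder would code gap$+1$, as these codes only represent integers $\ge 1$.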
I believe we could lower this to ~1.34x using Fibonacci codes. Perhaps a taboo or start/stop code could be made to lower it further. As mentioned, there are more complicated methods that can shave off even more bits.
Of course, a big limitation here is the factoring, prime counting, and nth-prime computation. Factoring 64-bit numbers is pretty easy, so this isn't taking much time. Exact prime counts aren't cheap, however -- past 2^36 or so they get pretty expensive (the prime count for 2^32 takes about 3 milliseconds; 2^36, 14 ms; 2^40, 75 ms). The nth prime has a similar cost. It's really not a viable method past 2^50 or so.
This storage compares well to rounded-up binary storage (e.g. using 64 bits to store numbers between 1 and 10M), but that's not a fair comparison. We should compare against an efficient encoding of the raw numbers, such as Gamma/Delta/Omega or adaptive Rice coding, or a tight bit bound.
How many times more data: somewhere between 1.2x and 1.5x a tight bit bound. It starts to look much better if you are rounding up to 32 or 64 bits per number, where a lot of those bits are wasted.
Is it more efficient for larger numbers? It looks like it might be, but not by much (clarification: more efficient than the same method on smaller numbers, not more efficient than tightly storing the original number). The limitation of storing the prime index means "large" can't be very large.
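A comparison in this spirit can be reproduced at small scale. The sketch below is my own simplified scheme, not the answer's exact one: instead of a terminator, it Elias-delta-codes the factor count, then each index gap plus one, and compares the total against the tight bit bound of the raw numbers.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, in order."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def delta_len(n):
    """Bit length of the Elias delta code for n >= 1:
    (bits(n) - 1) payload bits plus a gamma code of bits(n)."""
    nb = n.bit_length()
    return (nb - 1) + 2 * nb.bit_length() - 1

def encoded_bits(n, primes):
    """Bits to code n >= 2: delta-code the factor count,
    then each 1-based index gap + 1 (gaps can be 0)."""
    idxs, m = [], n
    for i, p in enumerate(primes, start=1):
        while m % p == 0:
            idxs.append(i)
            m //= p
        if m == 1:
            break
    bits, prev = delta_len(len(idxs)), 1
    for i in idxs:
        bits += delta_len(i - prev + 1)
        prev = i
    return bits

N = 10000
primes = primes_up_to(N)
enc = sum(encoded_bits(n, primes) for n in range(2, N + 1))
raw = sum(n.bit_length() for n in range(2, N + 1))
print(f"index encoding: {enc} bits, tight binary: {raw} bits, "
      f"ratio {enc / raw:.2f}")
```

The exact ratio depends on the chosen code and range; this simplified scheme lands in the same general region as the figures quoted above, but it is a sketch, not a reproduction of the answer's encoder.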