Counting numbers smaller than $N$ with exactly $k$ *distinct* prime factors

751 Views Asked by At

Using common notation, $\omega(n)$ is the number of distinct prime factors on $n$. Similiarly, $\Omega(n)$ is the number of prime factors of $n$, not necessarily distinct: $120=2^{3}\cdot 3 \cdot 5$ , therefore $\omega(120)=3$ , $\Omega(120) = 5$.

If we let $W_k(N)$ be the count of numbers not exceeding $N$ with exactly $k$ distinct prime factors {$i | i \leq N , \omega (i) = k$}, is there any way or formulation to calculate it? Or at least a relatively coincise algorithm for it, which does not include factoring each number and checking for $\omega$ individually?

For example, one can easily find that $W_3(100)=8$ , $W_3(1000000)=379720$ , $W_4(1000000)=208034$.

Please note here I am not looking for an asymptotic approximation but rather an exact count. And $N$ can be large, with order of magnitude of $10^{12}$ and over.

I have found that a solution exists for the $\Omega$ counterpart, known as k-Almost Primes, which admits closed-form formulas, but as for $\omega$, I have found non.

3

There are 3 best solutions below

2
On

I think Karl above has a good start to an algorithmic answer here. But I don't think you even look at divisibility or inclusion-exclusion. I think you actually find every value of $i$ but in a faster way, without checking every $i < N$--just enumerate the possible values of $\{i \mid \omega(i) = k\}$ via multiplication. For the sake of an algorithm, I think you treat the factorizations in their exponent array/vector form; for instance

$$1260 = 2^2\cdot3^2\cdot 5^1\cdot7^1 = [2,2,1,1] \\43472 = 2^4\cdot3^0\cdot 5^0\cdot7^0\cdot11^1\cdot13^1\cdot 17^0\cdot19^1 = [4,0,0,0,1,1,0,1] $$

I'm not an algorithm-optimization person. But here's what I'd do (for $k=4$), because I think it gets you to stop points in the shortest time.

Start with the initial array $[a_2, a_3, a_5, a_7] = [1,1,1,1] = 2\cdot3\cdot5\cdot7 = 210$. Determine the number of different lists of values where all of the elements are $1 \leq a_i \leq 2$ and at least one value is $a_i = 2$. These are: one $2$, two $2$s, three $2$s, and four $2$s.

For each of those possible lists, generate all the permutations, in reverse lexicographical order. This would get you (eliminating brackets and commas for space reasons):

$$2111, 1211, 1121, 1112; 2211, 2121, 2112, 1221, 1212, 1122; 2221, 2212, 2122, 1222; 2222$$

At each step, of course, generate the actual numbers:

$$420, 630, 1050, 1470;\\ 1260, 2100, 2940, 3150, 4410, 7350;\\ 6300, 8820, 14700, 22050, 44100$$

and add to a counter if the number is less than $N$. In each group you can stop when you hit a number that's too high.

Now do the same thing for $1 \leq a_i \leq 3$, with at least one value $a_i = 3$, and so on. You'll want some recursion, and also some recursion. For $n = 10^{16}$, for instance, at the very least $[46,1,1,1]$ and $[44,2,1,1]$ are possibilities.

But eventually you'll hit a point where none of the possibilities is less than $N$. I suggested reverse lexicographical order, as you should hit the smaller numbers earlier each time around. You'll never even try, for instance, $[10,10,10,10]$

Now you add another element to the array to represent $11$, and do the same thing, with the constraint that only four elements can be non-zero. Then add another element for $13$, etc. Eventually you should hit a point where nothing is under $N$, probably when your array looks like $[0,0,....,0,1,1,1,1]$ and the last four elements are the last four primes before the fourth root of $N$.

I may come back later and try to pseudocode this.

Edit: some code available here for lex order, there's code at the bottom for anti-lex order but only in Java, but I'm sure you can generate similar code in your personal favorite language.

7
On

The system asked if I really wanted to post another answer, but I really do feel like this belongs separately. Two things, actually. First, a simple (thought not algebraic) expression for $W_1(N)$, in terms of the prime counting function:

$$W_1(N) = \sum_{i = 1}^{\lfloor \log_2 N \rfloor} \pi \left(N^{\frac{1}{i}} \right)$$

$W_1(N)$ is just all the primes and their prime powers. Primes $\sqrt{N} \leq p$ will only have their first power on the list. Primes $\sqrt[3]{N} \leq p \leq \sqrt{N}$ will also have their second power on the list, and are counted twice, and so on to $i = \lfloor \log_2 N \rfloor$, for which $\pi(N^{\frac1i}) = 1$, that is, only $2$.


Second, a different take on algorithmic sieving. We start with:

$$|\{i | i \leq N , \omega (i) = k\}| = |\{i | i \leq N , \omega (i) \geq k\}| - |\{i | i \leq N , \omega (i) \geq k+1\}|$$

That is, for say $k=4$, if we find the set of integers where $\omega(i) \geq 4$, then remove everything that has $\omega(i) \geq 5$, what remains must have $\omega(i) = 4$. We do this because it means we can just use multiples without worrying about whether the multiples have specific factors. So for instance, $2\cdot3\cdot5\cdot7 = 210$ has four factors. Every multiple of it must have at least as many factors, in some cases many more.

I first thought "Hey we can just divide to find how many multiples there are," but that doesn't work--we end up counting the same thing multiple times. But then it struck me that the only numbers we'll count multiple times are the ones that have at least five distinct factors. So we only need to run this enumeration once, then remove all copies of any values that appear more than once in the list.

There are almost certainly optimizations to be done once you get to larger primes, see the above discussion. Right now my issue seems to be the time used to find and remove the multiples is dominating the other calculations. Weird. Will comment later with Sage code.

2
On
  • $k=0$

$$N_0(n) = \begin{cases} 0 & \text{if $n < 1$} \\ 1 & \text{if $n >= 1$} \end{cases}$$

Since $n=1$ is the only non-negative integer with $\omega(n) = 0$, this is simple. Any $n >= 1$ returns 1, otherwise return 0.

  • $k=1$

$$ \begin{align} N_1(n) & = \text{prime_power_count}(n) \\ & = \sum_p^n \sum_{x=1}^{\lfloor\log_{p}(n)\rfloor} 1\\ & = \sum_p^n \lfloor\log_{p}(n)\rfloor\\ & = \pi\left(\lfloor n^{1/2} \rfloor\right) + \sum_{p}^{\lfloor\sqrt n\rfloor} \lfloor\log_{p}(n)\rfloor - 1\\ & = \sum_{k=1}^{\lfloor\log_2(n)\rfloor} \pi\left(\lfloor n^{1/k}\rfloor\right) \end{align}$$

These are the prime powers. We can do a sum over the primes, then the prime powers, or just do the obvious thing and turn the prime power loop into a single integer log. The second to last equation is noting that after $\sqrt(n)$ all the logs are 1, so we can do a single prime count for all the $p^1$ cases, leaving us only having to examine the primes up to $\sqrt(n)$. We can continue this logic to its end and get the last equation which is a series of prime counts. This is what a number of software packages implement, and it is quite fast given a modern prime count method.

  • $k=2$

$$ \begin{align} N_2(n) & = \sum_{p}^{\lfloor\sqrt n\rfloor} \sum_{i=1}^{\lfloor\log_{p}(n)\rfloor} \sum_{q = p+1}^{\lfloor n/{p^i} \rfloor} \lfloor\log_{q}(\lfloor n/{p^i} \rfloor)\rfloor && \text{$p$ and $q$ over primes} \\ & = \sum_{p}^{\lfloor\sqrt n\rfloor} \sum_{i=1}^{\lfloor\log_{p}(n)\rfloor} \bigl( \pi\left(\lfloor n/{p^i} \rfloor\right) - \sum_q^p \lfloor\log_{q}(\lfloor n/{p^i} \rfloor)\rfloor \bigr) && \text{$p$ and $q$ over primes} \\ \end{align}$$

The first two sums are for summing over the prime powers. In an implementation it's still two loops, but a bit more efficient. We'd also calculate $n/{p^i}$ once rather than doing it over and over inside the innermost loop. The second version is much faster given a decent prime count method, because the final sum is over a small number of primes rather than many.

  • $k=3$

$$ \begin{align} N_3(n) & = \sum_{p}^{\lfloor n^{1/3} \rfloor} \sum_{i=1}^{\lfloor\log_{p}(n)\rfloor} \sum_{q=p+1}^{\lfloor n^{1/2} \rfloor} \sum_{j=1}^{\lfloor\log_{q}(n/{p^i})\rfloor} \bigl( \pi\left(\lfloor n/{(p^i p^j)} \rfloor\right) - \sum_r^q \lfloor\log_{q}(\lfloor n/{(p^i p^j} \rfloor)\rfloor \bigr) \\ \end{align}$$

Just showing the more efficient innermost loop here. $p$, $q$, and $r$ run over the primes. This looks more complicated than it really is, as again we have two sums indicating the prime powers. We are trying to count the $p^i q^j r^k <= n$. We are going to count in order, so $p < q < r$. So as we loop through the powers $p^i$ we then find $q^j r^k <= n/p^i$, and so on.

The solution for $k = 4$ should be clear at this point, as we just add another outer sum pair, and continue for higher $k$.

It is possible this can be written in a better style, especially turning the double sum for primes powers into a single one, though note we use the actual prime for the start of the next loop. It is also not unlikely this can be further simplified.


In C or Perl code, this turns into a sub-20 line recursive function that handles arbitrary k. Albeit this is with a library providing logint, rootint, prime_power_count, prime_count, and fast looping over primes.

As a timing example:

time say omega_prime_count(2,powint(10,9)):

5.9s  efficient ranged nfactor sieve
1.2s  first double summation method
0.03s second double summation method

time for n=10^10:  0.13 seconds.  (15s for the first, 61s for the sieve)

For $k=8$, I get omega_prime_count(8,10^8) = 719 in 66 microseconds, vs. a naive loop in Pari/GP taking 38 seconds. omega_prime_count(8,10^12) = 953640790 takes 2.8 seconds.


Some interesting papers: