Efficiently computing $\sum_{\sqrt{N} \lt p \in primes \leq N} \lfloor \frac{N}{p} \rfloor$

94 Views Asked by At

I'm interested in computing $f(N) = \sum_{\sqrt{N} \lt p \in primes \leq N} \lfloor \frac{N}{p} \rfloor$

$N$ is large enough that primes up to $\sqrt{N}$ are available, but not much beyond and certainly not to $N$.

There's some flexibility about the range, in that a $g(N)$ summing over $1 \lt p \in primes \leq N$ would be just as useful and give $f(N) = g(N)-g(\sqrt{N})$.

The most/only obvious improvement I can think of - compared with the completely infeasible prime search up to $N$ - is using the prime counting function $\pi(n)$: there are $\pi(N)-\pi(\frac{N}{2})$ primes contributing 1 to the sum, $\pi(\frac{N}{2})-\pi(\frac{N}{3})$ primes contributing 2 to the sum and so on. However while there seem to be practical implementations for computing $\pi(N)$ using only primes to $\sqrt{N}$ (e.g sympy's primepi), in practice the computational complexity means $\pi({N})$ still seems to be out of practical reach.

Is there any trick which will help compute this sum?

2

There are 2 best solutions below

0
On BEST ANSWER

As the comment thread under my question reveals, the fundamental issue here was that I just didn't understand how fast prime counting can be. Kudos to libprimecount which can do astonishing (to me) numbers like $10^{18}$ in a matter of seconds.

Using that, I was able to solve my problem easily.

Using the prime counting function $\pi(n)$, the sum in the question can be evaluated by

$\sum_{k=1}^{\sqrt{N}-1} k.(\pi( \lfloor {\frac{N}{k}} \rfloor ) - \pi( \lfloor \frac{N}{k+1} \rfloor ))$

or

$ - (\sqrt{N}-1).\pi( \lfloor \sqrt{N} \rfloor ) + \sum_{k=1}^{\sqrt{N}-1} \pi({ \lfloor \frac{N}{k}} \rfloor ) $

And as a check, some python code, just using sympy's prime tools, nothing fancy. bfS is the literal implementation of the question, and S and T implement the two forms above.

from sympy.ntheory.generate import primerange
from sympy.ntheory import primepi

def check(N):

    sqrtN=int(N**0.5)
    assert sqrtN**2<=N
    assert (sqrtN+1)**2>N

    # Bruteforce it
    bfS=sum(N//p for p in primerange(sqrtN+1,N+1))

    # Use primepi
    kpi={k:primepi(N//k) for k in range(1,sqrtN+1)}
    S=sum(k*(kpi[k]-kpi[k+1]) for k in range(1,sqrtN))
    assert S==bfS

    T=sum(kpi[k] for k in range(1,sqrtN)) - (sqrtN-1)*primepi(sqrtN)
    assert T==bfS

for e in range(1,7):
    check(10**e)
1
On

I suspect the fastest way to compute this sum exactly is to note that $$ \sum_{L<p\le N} \bigg\lfloor \frac Np \bigg\rfloor = \sum_{L<p\le N} \sum_{\substack{1\le k\le N \\ p\mid k}} 1 = \sum_{1\le k\le N} \sum_{\substack{L<p\le N \\ p\mid k}} 1. $$ (I use the parameter $L$ because the OP's title, where $L=1$, doesn't match the question asked in the body, where $L=\sqrt N$.)

Sums of factorization-related functions over all intervals in a range can be accomplished using the sieve of Eratosthenes with a little extra bookkeeping: instead of simply "crossing out" multiples $k$ of each prime $p$, add $1$ to a counter for each such multiple $k$ when $p>L$. One only has to use the primes up to $\sqrt N$, and then all the other prime factors can be noted and counted (there is at most $1$ for any integer in that range).

It seems that just using the standard sieve of Eratosthenes to compute all the primes up to $N$, and then evaluating the sum $\sum_{L<p\le N} \big\lfloor \frac Np \big\rfloor$, isn't that much slower. Both of them will take (space $N$ and) time that is something like $N(\log N)^C$.