So I was playing with the sieve of Eratosthenes when I noticed a pattern in the number of primes less than n. I start off with n numbers, then I remove the multiples of 2 (1/2 of all numbers), then the multiples of 3 (1/3 of the remaining numbers), and so on for every prime less than √n. I figured that if I just pretend to sieve n numbers, I get a fast way of approximating π(n).
Here is the pseudocode of my implementation.
def count_primes(n):
    for p in sieve(√n):
        n -= n / p
    return n
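Concretely, in Python (integer division, which reproduces my results; `sieve` is an ordinary sieve of Eratosthenes returning the primes up to its argument):

```python
import math

def sieve(m):
    """Sieve of Eratosthenes: return all primes <= m."""
    if m < 2:
        return []
    is_prime = [True] * (m + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(m) + 1):
        if is_prime[i]:
            # mark every multiple of i starting at i*i as composite
            for j in range(i * i, m + 1, i):
                is_prime[j] = False
    return [i for i, b in enumerate(is_prime) if b]

def count_primes(n):
    """Approximate pi(n) by 'pretend sieving': for each prime
    p <= sqrt(n), subtract the count of multiples of p that remain."""
    for p in sieve(math.isqrt(n)):
        n -= n // p
    return n

print(count_primes(1000))  # => 157
```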
I then tested counting up to 1000.
count_primes 1000
=> 157
The true number of primes < 1000 is 168, so I'm getting an underestimate, which makes sense: when I remove the multiples of 2, I also remove the prime 2 itself. This is further supported by the fact that I'm off by exactly 11, which is the number of primes < √1000. I thought this approximation would get better as n → ∞, because √n becomes insignificant compared to n.
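The off-by-11 observation is easy to verify with a brute-force count (trial division, just as a sanity check):

```python
import math

def is_prime(k):
    """Trial division: True iff k is prime."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))

pi_1000 = sum(is_prime(k) for k in range(1000))  # primes < 1000
pi_sqrt = sum(is_prime(k) for k in range(32))    # primes < sqrt(1000)
print(pi_1000, pi_sqrt, pi_1000 - 157)           # 168 11 11
```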
I then tested counting up to 10^16.
count_primes 10^16
=> 304,797,218,820,068
This is an overestimate of about 9%. How is this even possible? It is not rounding errors, since switching to floating point has a negligible effect on the result. After a bit more testing I found that the overestimation starts at n = 35005 and gradually gets worse from there, so it can't be the square root function either, since that has 15 digits of precision.
However I try to reason about it, I should be subtracting at least one more number than an actual sieve would remove. Am I missing something? Can anyone explain this to me?