Why do the odd numbers in a Collatz trajectory hit so many Primes?

219 Views Asked by At

I implemented the Collatz procedure in C++ and simulated all numbers up to n (I did it up to 100000). The program counted the total number of primes and the total number of composites hit (only out of the odd numbers). Additionally, it calculates the average ratio of primes hit during the procedure for every number up to n. It seemed strange to me, that even at the scale of 100000, nearly half of all odd numbers hit were primes. Is there any reason for this? Here is my code.

#include <bits/stdc++.h>
using namespace std;

vector<bool> is_prime;

void prime_sieve(uint64_t n)
{
    is_prime = vector<bool>(n + 1, 1);

    for (size_t i = 2; i <= n; i++)
    {
        if (is_prime[i])
        {
            for (size_t j = 2 * i; j <= n; j += i)
                is_prime[j] = 0;
        }
    }
}

int main()
{
    prime_sieve(1000000000);
    uint64_t n;
    cin >> n;

    uint64_t primes = 0, composites = 0;
    double prime_ratio = 0.0;

    for (uint64_t i = 1; i <= n; i++)
    {
        uint64_t x = i, curr_primes = 0, curr_composites = 0;

        while (x != 1)
        {
            if (x & 1)
            {
                if (is_prime[x])
                    curr_primes++;
                else
                    curr_composites++;
                x = 3 * x + 1;
            }
            else
            {
                x /= 2;
            }
        }

        if (curr_primes || curr_composites)
            prime_ratio +=
                (double)curr_primes / (double)(curr_primes + curr_composites);

        primes += curr_primes;
        composites += curr_composites;
    }

    cout << primes << ' ' << composites << '\n'
         << prime_ratio / (double)n << '\n';
}
```
1

There are 1 best solutions below

1
On

There is a small mistake in the calculation of primes_ratio / n: you ignore n = power of two. But that error is only 20 in the numbers up to a million. Another small problem is that you are limited to 64 bits, so you will have no numbers above 2^64, which would be least likely to be prime (and when you ignore overflow some multiples of 3 sneak in).

The numbers that you check are never divisible by 2 or 3, so they are 3 times more likely to be primes. But if you start with a number of 1,000,000, each step multiplies it by 3/4 on average, so ln x has a linear distribution among the numbers you check. A random number around 10^6 is prime with probability 1 / ln x ≈ 1/13.8. Your numbers are odd and not divisible by 3, therefore prime with probability about 1/4.6. For your numbers, on an exponential scale, they are prime with probability 1/2.3.

The first number you examine is less likely to be prime, and the last numbers on the trajectory are the same for many numbers so “probabilities” don’t really work, so the actual results may be off a little bit.