How to make sense of this plot $x-\sum[(\text{oddsteps})\mod 3]$

288 Views Asked by At

I am trying to make sense of this graph $y=x-\sum\limits_1^x(i\mod 3)$ where $i$ is the number of odd steps for $x$ to reach $1$ in a Collatz sequence. x from 1 to 10^6 (plot of $x$ from $1$ to $9\cdot 10^6$)

The graph is to be interpreted as follow: as $x$ increase, if $i\equiv 2\mod3$ than $y$ goes down, if $i\equiv 1\mod3$ than $y$ stays the same, and if $i\equiv 0\mod3$ than $y$ goes up. The whole things seems to give a kind of sinusoidal.

The tendency is to grow, stabilize, then decrease, then stabilize...with a remarkable regularity which is baffling to me

Is there a known structure in the number of odd steps that can explain this repeating structure?

Here is the code to produce the graph:

Pari/GP

nbrOddTo1(n)=j=0;while(n>1,if(n%2==1,j+=1;n=(3*n+1)/2,n=n/2));return(j);
b=3;a=0;for(x=1,1000000,a=a+(b-1)/2-nbrOddTo1(x)%b;write("out.csv",round(a)));

Anaconda python

import pandas as pd
pldata = pd.read_csv('out.csv')
pldata.plot(figsize=(40,20),grid=True)

Here are the result for $y=\frac{b-1}{2}\cdot x-\sum\limits_1^x(i\mod b)$ for $b=4,5,6,7$, but only up to $10^6$ this time mod 4 mod 5 mod 6 mod 7 EDIT (didn't had much time yet, but here is a comparison with Eric observation up to $9\cdot 10^6$):

In blue
a=0;for(x=1,10000000,a=a+1-nbrOddTo1(x)%3;write("out.csv",a));
In red
a=0;for(x=1,10000000,a=a+1-ceil(5*log(x)/log(4/3))%3;write("out.csv",a));

enter image description here

1

There are 1 best solutions below

4
On

Using a probabilistic heuristic, the ratio of successive odd integers has a mean of $\frac{3}{4}$ so starting with an odd integer $x$, then $\bar{n} = $ the mean number of odd integers in a sequence to reach $1$ is given by: $$x\left(\frac{3}{4}\right)^\bar{n} = 1$$ so $$\bar{n} = \frac{\ln(x)}{\ln(\frac{4}{3})}$$ So we should have something like $$n = \bar{n} + noise$$ $n$ will have the average behavior of $\bar{n}$. The separation between zeroes, the separation between peaks, and the height of the peaks should grow something like $\ln(x)$. As $x$ increases there are increasing gaps between successive integer values of $\textit{round}(\bar{n})$ where your sum will tend to change in the direction of $(\textit{round}(\bar{n}) \textit{ mod } 3) - 1$.

The function $y \textit{ % } 3 - 1$ should have a mean of $0$ for random $y$. Summing this function acts as a smoothing or low-pass filter. If you just had a scatter plot of $y$ vs $x$ it will look like noise but by summing, the underlying average structure becomes visible. I assumed the structure was due to the average expected value of $nbrOddTo1(x)$ but it doesn't match up empirically and the frequency of change of $\bar{n}$ is a factor of $5$ to low. Here is a plot of your function (just for odd starting numbers which should presumably match closer to $\bar{n}$) overlayed with $\bar{n}\textit{ % }3 - 1$. enter image description here Here's a plot overlayed with overlayed with $(5 \bar{n}) \textit{ % }3 - 1$. This seems like a better match but I don't understand why. I'll check my code but there's no obvious factor of 5 lying around. enter image description here

The Collatz function $3x+1$ is analogous to a linear congruential pseudo random number generator. These are know to have all sorts of structure and fail miserably at tests of randomness such as a spectral test. See Knuth, The Art of Computer Programming Vol 2, chapter on Random Numbers.