Does Collatz Rule $3x+5$ have a bias for certain loops, or are my results faulty?

698 Views Asked by At

Not too long ago, I studied a modified Collatz rule where

$$f(x)= \begin{cases} 3x+5, & \text{if $x$ is odd} \\ x/2, & \text{if $x$ is even} \end{cases}$$

by observing the trajectories of $n$ with some code I wrote. The code would calculate the trajectory of each seed or starting number $n$ beginning with $1$ until the trajectory reached a loop. The code will then dump the loop into a spreadsheet and then repeat the process for $n+1$ until some defined limit for $n$ was reached. The resulting spreadsheet contains every starting number and the loops each of those numbers ended up in. I did not record the original trajectories in the spreadsheet.

In this Google Document, I created pie charts for the sample sizes 100, 1,000, 10,000, 100,000, and 1,000,000.

The results were made by defining some sample size up to some number, sorting all of the numbers based on what loop their trajectories entered, and then creating ratios for those relationships.

Here is a link to the raw data my code generated:

https://drive.google.com/drive/folders/0BzfYa_--3heeNkVpd1NPd090aDA?usp=sharing

(note: Viewing the 10,000 sample size worked just fine for me, however you would need to download the sample sizes 100,000 and 1,000,000 to view them)

The results show the percentages vary quite a bit from sample to sample, however in the general scheme of things the data seems to be somewhat consistent. For example, my data shows the 19 loop is the end of roughly half the trajectories of the numbers in the samples. Only one percentage never changed from sample to sample; unsurprisingly, the 20-10-5 loop consisted of 1/5 of all tested values.

I am unsure if this “loop bias” I observed is a consequence for relying on a sample size to begin with, human/code error, or if there is a mathematical explanation for what makes certain loops more popular than others. I have a few ideas for why some bias occurs, however I am not confident in them, mostly because my ideas heavily rely on speculation I do not know how to prove formally.

EDIT: Here are the loops in order of appearance:

  • [1, 8, 4, 2, 1]
  • [19, 62, 31, 98, 49, 152, 76, 38, 19]
  • [5, 20, 10, 5]
  • [23, 74, 37, 116, 58, 29, 92, 46, 23]
  • [187, 566, 283, 854, 427, 1286, 643, 1934, 967, 2906, 1453, 4364, 2182, 1091, 3278, 1639, 4922, 2461, 7388, 3694, 1847, 5546, 2773, 8324, 4162, 2081, 6248, 3124, 1562, 781, 2348, 1174, 587, 1766, 883, 2654, 1327, 3986, 1993, 5984, 2992, 1496, 748, 374, 187]
  • [347, 1046, 523, 1574, 787, 2366, 1183, 3554, 1777, 5336, 2668, 1334, 667, 2006, 1003, 3014, 1507, 4526, 2263, 6794, 3397, 10196, 5098, 2549, 7652, 3826, 1913, 5744, 2872, 1436, 718, 359, 1082, 541, 1628, 814, 407, 1226, 613, 1844, 922, 461, 1388, 694, 347]

EDIT 2:

I agree that smaller numbers may be responsible for skewing the data. Therefore, I picked the sample size 100,000 to 1,000,000 to test this theory. I uploaded the results to the original Google Doc with the other pie charts.

I was surprised to find, well the same chart. The ratios were slightly different as usual, but aside from that I am unsure to conclude if this test debunks the hypothesis or iterates the small number problem. I could try different sample sizes, however I do not know if that is a good idea or not.

To provide some insight on what I think is going on, I will show you a digital version of some notes I sketched and explain where my speculations came from.

In May, I drew some sketches of trees and made some speculations about what I observed. I assumed if a loop had a branch or a tail coming from the original even numbers in the loop, then the loop would connect to more numbers. I also assumed smaller even multiples (if $n$ is odd, then an even multiple is $n*2^a$, where $a$ is any value) branching to multiples of three "restricted" the size of the loops.

Of course, none of these statements are objective, much less provable. I wanted to share them in case there were any interesting mathematical patterns occurring or if this information shed light on anything...

Here is a digital version of my sketches.

Note: the trees are built using the "reverse Collatz method" or "${(n-1)}/{3}$, or in this case, an adapted version of that method. To divide $n$ by 2, go one number left. To multiply $n$ by 3 and add 5, find the bottom of the "T", which points to the next even number.

Warning: I showed this to a friend and the tree sketch confused them. If you find this sketch confusing, let me know and I will re-draw the entire thing with arrows instead.

Key:

  • If an even number branches, It will have a "T" above it. The first odd number on the "T" is the resulting odd number after applying ${(n-5)}/{3}$. The following even numbers are the even multiples of the odd number. (ex. In the 19 loop, 38 will have a "T" over it. 11 is the resulting odd number, and the even numbers after 11 are $11*2^1$, $11*2^2$, $11*2^3$, ...
  • Blue numbers are members of a loop.
  • Red numbers followed by a "no" sign are multiples of 3.
  • Purple "T"s connect the loop.
  • Green "T"s emphasize the extra "tail" or branch.
  • Orange "T"s emphasize where a tail could have been, but the number branched to a multiple of 3 instead.
  • Arrows connect the separated ends of the loop.
  • "..." are used to convey numbers not shown.

I color-coded the sketch to draw attention to certain properties. I figured it would make it easier to understand.

2

There are 2 best solutions below

0
On

this is not yet an answer, just a comment for more illustration

Part 1 - Table of some properties of the $6$ known cycles.

update - a somewhat longer exposition and a longer table at my homepage

  • First column: $a_\min$ as the smallest element of a cycle
  • second column: relative frequency of the type of cycle as tail of a trajectory. Only odd numbers $a_1=1$ to $a_1=999 999$ were tested.
  • Third column: $N$ is length of the cycle (odd numbers $a_k$ only are counted)
  • Foruth column: $S$ is sum-of-exponents $A_k$ (see below) or "number-of-halving-steps"
  • Fifth column: let the transferfunction be defined between odd numbers $a \to b$. Then $b = (3a+5)/2^A$. The given vector is the vector of exponents $A_k$ of a trajectory of length $N$.
    For a certain length $N$ more than one vector can be possible - not only by rotation (which gives cyclically the members of one cycle) but also besides of the pure rotations by other combinations having the same $(N,S)$ which gives then true different cycles.

Table 3x+5:

a_min   rel freq% N  S  vector of exponents
----------------------------------------
  5     20.0000   1  2  [2]            "trivial cycle"
  - - - - - - - - - - - - - - - - - - - 
  1     14.0844   1  3  [3]           because 3*1+r=8=2^3 -> 1

 19     49.6780   3  5  [1, 1, 3]
 29      9.2606   3  5  [2, 1, 2]

187      3.2618  17 27  [1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 3, 2, 1, 1, 1, 5]
347      3.7152  17 27  [1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 4, 1, 2, 1, 2, 2]

Note that a very similar structure occurs for $3x+7$ and $3x+13$ and etc. problem. For instance for the $3x+13$ we get the following table

Table 3x+13:

  a min   relfreq%  N    S   vector                     
  --------------------------------------------------------------------- 
    13  7.692000    1    2  [2]               "trivial cycle"
  ---------------------------------------------------------- 
     1 47.550000    1    4  [4]               // 3*1 + r = 2^4 -> 1 
   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   - 
   211  3.334000    5    8  [1, 1, 1, 1, 4]   // 2^8 - 3^5 = 13 = r 
   259  3.934000    5    8  [1, 1, 1, 3, 2] 
   227  1.880000    5    8  [1, 1, 1, 2, 3] 
   287  4.380000    5    8  [1, 2, 1, 1, 3]  
   251  1.958000    5    8  [1, 1, 2, 1, 3] 
   283  2.506000    5    8  [1, 1, 2, 2, 2] 
   319  1.424000    5    8  [1, 2, 1, 2, 2] 

   131 25.342000   15   24  [1, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 5] 


Part 2 - approching the problem of different relative frequencies

I'm looking at the backwards-tree starting at the cycle-element $19$ vs. that starting at the cycle-element $29$ Here the key for the greater frequency of the occurence of the $19$-cycle seems to be that the backwards-transformations cover the smaller (odd) numbers compared with that of the $29$-cycle - which means, that the smaller numbers transform to the $19$-cycle compared with the $29$-cycle by the $(3x+5)/2^A$-map. I cannot really formalize this for the relative frequencies below some fixed upper bound $N$ at the moment, but it might give some good intuition...

The representation in a line is as follows:

a_parent   [vector A]

vector $A$ is here the (infinite) vector of all numbers $a_k$ going down to $a_\text{parent}$ by one transformation: $ a_\text{parent}=(3a_k+5)/2^B$ . Of course each element of $A$ (except that which are divisible by $3$) are parents of another vector $AA$. The first couple of this entries are documented then i the following line, indented by some more spaces.

I printed that recursive tree, which has also a cyclic subtree of recursion-cycle of $3$, to a depth to $5$ . To focus the aspect of containing many small numbers, I omitted parents larger than $1000$ and also their subtrees although this might not be perfectly correct, because they can have parents themselves which asre smaller than $1000$ - but I left this aside.

The tree based on cycle-element $19$ has many more small numbers than the tree bases on cycle-element $29$.


mytree(19,5)

 31 [19, 81, 329, 1321, 5289, 21161, "..."]
    19 [11, 49, 201, 809, 3241, 12969, "..."]
        11 [13, 57, 233, 937, 3753, 15017, "..."]
            13 [7, 33, 137, 553, 2217, 8873, "..."]
                7 [3, 17, 73, 297, 1193, 4777, "..."]
                    17 [21, 89, 361, 1449, 5801, 23209, "..."]
                    73 [47, 193, 777, 3113, 12457, 49833, "..."]
                    ... [ ...]
                137 [181, 729, 2921, 11689, 46761, 187049, "..."]
                    181 [119, 481, 1929, 7721, 30889, 123561, "..."]
                    ... [ ...]
                553 [367, 1473, 5897, 23593, 94377, 377513, "..."]
                    367 [243, 977, 3913, 15657, 62633, 250537, "..."]
                    ... [ ...]
                ... [ ...]
            233 [309, 1241, 4969, 19881, 79529, 318121, "..."]
                ... [ ...]
            937 [623, 2497, 9993, 39977, 159913, 639657, "..."]
                623 [829, 3321, 13289, 53161, 212649, 850601, "..."]
                    829 [551, 2209, 8841, 35369, 141481, 565929, "..."]
                    ... [ ...]
                ... [ ...]
            ... [ ...]
        49 [31, 129, 521, 2089, 8361, 33449, "..."]
            31 [19, 81, 329, 1321, 5289, 21161, "..."]
                19 [11, 49, 201, 809, 3241, 12969, "..."]
                    11 [13, 57, 233, 937, 3753, 15017, "..."]
                    49 [31, 129, 521, 2089, 8361, 33449, "..."]
                    809 [1077, 4313, 17257, 69033, 276137, 1104553, "..."]
                    ... [ ...]
                329 [437, 1753, 7017, 28073, 112297, 449193, "..."]
                    437 [581, 2329, 9321, 37289, 149161, 596649, "..."]
                    ... [ ...]
                ... [ ...]
            521 [693, 2777, 11113, 44457, 177833, 711337, "..."]
                ... [ ...]
            ... [ ...]
        809 [1077, 4313, 17257, 69033, 276137, 1104553, "..."]
            ... [ ...]
        ... [ ...]
    329 [437, 1753, 7017, 28073, 112297, 449193, "..."]
        437 [581, 2329, 9321, 37289, 149161, 596649, "..."]
            581 [773, 3097, 12393, 49577, 198313, 793257, "..."]
                773 [1029, 4121, 16489, 65961, 263849, 1055401, "..."]
                    ... [ ...]
                ... [ ...]
            ... [ ...]
        ... [ ...]
    ... [ ...]

mytree(29,5)

 23 [29, 121, 489, 1961, 7849, 31401, "..."]
    29 [37, 153, 617, 2473, 9897, 39593, "..."]
        37 [23, 97, 393, 1577, 6313, 25257, "..."]
            23 [29, 121, 489, 1961, 7849, 31401, "..."]
                29 [37, 153, 617, 2473, 9897, 39593, "..."]
                    37 [23, 97, 393, 1577, 6313, 25257, "..."]
                    617 [821, 3289, 13161, 52649, 210601, 842409, "..."]
                    ... [ ...]
                121 [79, 321, 1289, 5161, 20649, 82601, "..."]
                    79 [51, 209, 841, 3369, 13481, 53929, "..."]
                    ... [ ...]
                ... [ ...]
            97 [63, 257, 1033, 4137, 16553, 66217, "..."]
                257 [341, 1369, 5481, 21929, 87721, 350889, "..."]
                    341 [453, 1817, 7273, 29097, 116393, 465577, "..."]
                    ... [ ...]
                ... [ ...]
            ... [ ...]
        617 [821, 3289, 13161, 52649, 210601, 842409, "..."]
            821 [1093, 4377, 17513, 70057, 280233, 1120937, "..."]
                ... [ ...]
            ... [ ...]
        ... [ ...]
    121 [79, 321, 1289, 5161, 20649, 82601, "..."]
        79 [51, 209, 841, 3369, 13481, 53929, "..."]
            209 [277, 1113, 4457, 17833, 71337, 285353, "..."]
                277 [183, 737, 2953, 11817, 47273, 189097, "..."]
                    737 [981, 3929, 15721, 62889, 251561, 1006249, "..."]
                    ... [ ...]
                ... [ ...]
            841 [559, 2241, 8969, 35881, 143529, 574121, "..."]
                559 [371, 1489, 5961, 23849, 95401, 381609, "..."]
                    371 [493, 1977, 7913, 31657, 126633, 506537, "..."]
                    ... [ ...]
                ... [ ...]
            ... [ ...]
        ... [ ...]
    ... [ ...]


An even more intuitive tree is the same tree but values taken to $\log_2()$. The progression in the vectors in then nearly linear, and the values of the $19$-tree seem a bit more dense than that of the $29$-tree if we select a window of values with a fixed lower and upper bound. But I don't want to suggest that this impression is already objective and could already answer your question!

mytreelog(19,5)

 4.95 [4.25, 6.34, 8.36, 10.4, 12.4, 14.4]
    4.25 [3.46, 5.61, 7.65, 9.66, 11.7, 13.7]
        3.46 [3.70, 5.83, 7.86, 9.87, 11.9, 13.9]
            3.70 [2.81, 5.04, 7.10, 9.11, 11.1, 13.1]
                2.81 [1.58, 4.09, 6.19, 8.21, 10.2, 12.2]
                    4.09 [4.39, 6.48, 8.50, 10.5, 12.5, 14.5]
                    6.19 [5.55, 7.59, 9.60, 11.6, 13.6, 15.6]
                7.10 [7.50, 9.51, 11.5, 13.5, 15.5, 17.5]
                    7.50 [6.89, 8.91, 10.9, 12.9, 14.9, 16.9]
                9.11 [8.52, 10.5, 12.5, 14.5, 16.5, 18.5]
                    8.52 [7.92, 9.93, 11.9, 13.9, 15.9, 17.9]
            7.86 [8.27, 10.3, 12.3, 14.3, 16.3, 18.3]
            9.87 [9.28, 11.3, 13.3, 15.3, 17.3, 19.3]
                9.28 [9.70, 11.7, 13.7, 15.7, 17.7, 19.7]
                    9.70 [9.11, 11.1, 13.1, 15.1, 17.1, 19.1]
        5.61 [4.95, 7.01, 9.03, 11.0, 13.0, 15.0]
            4.95 [4.25, 6.34, 8.36, 10.4, 12.4, 14.4]
                4.25 [3.46, 5.61, 7.65, 9.66, 11.7, 13.7]
                    3.46 [3.70, 5.83, 7.86, 9.87, 11.9, 13.9]
                    5.61 [4.95, 7.01, 9.03, 11.0, 13.0, 15.0]
                    9.66 [10.1, 12.1, 14.1, 16.1, 18.1, 20.1]
                8.36 [8.77, 10.8, 12.8, 14.8, 16.8, 18.8]
                    8.77 [9.18, 11.2, 13.2, 15.2, 17.2, 19.2]
            9.03 [9.44, 11.4, 13.4, 15.4, 17.4, 19.4]
        9.66 [10.1, 12.1, 14.1, 16.1, 18.1, 20.1]
    8.36 [8.77, 10.8, 12.8, 14.8, 16.8, 18.8]
        8.77 [9.18, 11.2, 13.2, 15.2, 17.2, 19.2]
            9.18 [9.59, 11.6, 13.6, 15.6, 17.6, 19.6]
                9.59 [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

mytreelog(29,5)

 4.52 [4.86, 6.92, 8.93, 10.9, 12.9, 14.9]
    4.86 [5.21, 7.26, 9.27, 11.3, 13.3, 15.3]
        5.21 [4.52, 6.60, 8.62, 10.6, 12.6, 14.6]
            4.52 [4.86, 6.92, 8.93, 10.9, 12.9, 14.9]
                4.86 [5.21, 7.26, 9.27, 11.3, 13.3, 15.3]
                    5.21 [4.52, 6.60, 8.62, 10.6, 12.6, 14.6]
                    9.27 [9.68, 11.7, 13.7, 15.7, 17.7, 19.7]
                6.92 [6.30, 8.33, 10.3, 12.3, 14.3, 16.3]
                    6.30 [5.67, 7.71, 9.72, 11.7, 13.7, 15.7]
            6.60 [5.98, 8.01, 10.0, 12.0, 14.0, 16.0]
                8.01 [8.41, 10.4, 12.4, 14.4, 16.4, 18.4]
                    8.41 [8.82, 10.8, 12.8, 14.8, 16.8, 18.8]
        9.27 [9.68, 11.7, 13.7, 15.7, 17.7, 19.7]
            9.68 [10.1, 12.1, 14.1, 16.1, 18.1, 20.1]
    6.92 [6.30, 8.33, 10.3, 12.3, 14.3, 16.3]
        6.30 [5.67, 7.71, 9.72, 11.7, 13.7, 15.7]
            7.71 [8.11, 10.1, 12.1, 14.1, 16.1, 18.1]
                8.11 [7.52, 9.53, 11.5, 13.5, 15.5, 17.5]
                    9.53 [9.94, 11.9, 13.9, 15.9, 17.9, 19.9]
            9.72 [9.13, 11.1, 13.1, 15.1, 17.1, 19.1]
                9.13 [8.54, 10.5, 12.5, 14.5, 16.5, 18.5]
                    8.54 [8.95, 10.9, 13.0, 15.0, 17.0, 19.0]
8
On

The reason you see the same proportions converging to each cycle, irrespective of the portion of the natural numbers you sample is as follows:

Each cycle has a fixed number of tributaries or branches which is enumerated by the odd numbers in that loop. For example $1,8,4,2,1$ has one branch. Although each branch is a component of a loop, if followed in reverse it also extends upwards infinitely and receives infinitely many further incoming branches.

Every odd number $x$ can be viewed as the root of a branch extending upwards containing the numbers $x,2x,4x,8x,16x,32x\ldots$. It is the number of those branches extending upwards through the integers, and the frequency in which they in turn have further branches, which govern the proportion of integers in any given range which converge to the given cycle.

Each of those branches, depending on its value $\equiv\mod 3$, will receive incoming branches at certain points. I'm not familiar with the $3x+5$ structure but the rules are generally the same as the conventional conjecture, in which all the odd numbers which are $\equiv0\mod 3$ receive no such incoming branches, those $x\equiv 1\mod 3$ receive incoming branches to every even power of $2$ (i.e. $4x,16x,64x,\ldots$) and those $x\equiv 2\mod 3$ receive branches on every odd power of $2$.

The sub-branches of every branch are distributed in exactly equal proportions between 0,1,2 mod 3 and therefore the sub-branches of every branch within a loop increase in equal proportions as you ascend through the integers.

Once you rise to a level within the integers where you are above all of the cycles, you should therefore find that the proportion of branches converging to each cycle is determined by a) the order of that cycle measured by the number of odd numbers it contains and b) the proportions of the numbers converging to it, which receive branches (as per the example I have given above for the conventional CC).