Not too long ago, I studied a modified Collatz rule where
$$f(x)= \begin{cases} 3x+5, & \text{if $x$ is odd} \\ x/2, & \text{if $x$ is even} \end{cases}$$
by observing the trajectories of $n$ with some code I wrote. The code would calculate the trajectory of each seed (starting number) $n$, beginning with $1$, until the trajectory reached a loop. It would then dump the loop into a spreadsheet and repeat the process for $n+1$ until some defined limit for $n$ was reached. The resulting spreadsheet contains every starting number and the loop each of those numbers ended up in. I did not record the original trajectories in the spreadsheet.
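For readers who want to reproduce the procedure, here is a minimal Python sketch of it. This is not my original code; the names `f` and `find_loop` are made up for illustration, and the sketch assumes every trajectory eventually reaches a loop (otherwise it would never stop).

```python
def f(x: int) -> int:
    """One step of the modified rule: 3x+5 for odd x, x/2 for even x."""
    return 3 * x + 5 if x % 2 else x // 2


def find_loop(seed: int) -> list[int]:
    """Iterate f from `seed` until a value repeats, then return the loop.

    The loop is rotated to start at its smallest element, and that element
    is repeated at the end, e.g. [1, 8, 4, 2, 1].
    """
    seen = {}          # value -> position in the trajectory
    trajectory = []
    x = seed
    while x not in seen:
        seen[x] = len(trajectory)
        trajectory.append(x)
        x = f(x)
    loop = trajectory[seen[x]:]          # the part of the trajectory that repeats
    start = loop.index(min(loop))
    loop = loop[start:] + loop[:start]   # rotate so the minimum comes first
    return loop + [loop[0]]


if __name__ == "__main__":
    # Print the loop reached by each of the first few seeds.
    for n in range(1, 11):
        print(n, find_loop(n))
```

Rotating each loop to start at its smallest element is only a convenience so that the same loop always gets the same label, no matter where a trajectory enters it.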
In this Google Document, I created pie charts for the sample sizes 100, 1,000, 10,000, 100,000, and 1,000,000.
The results were made by fixing a sample size (all starting numbers from $1$ up to some limit), sorting the numbers by which loop their trajectories entered, and then computing the share of the sample that each loop received.
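The tallying step could look roughly like the sketch below. Again, this is an illustration rather than the code I actually ran: `loop_min` and `loop_shares` are hypothetical names, each loop is labelled by its smallest element, and a cache is used so that numbers already classified are not re-traced.

```python
from collections import Counter


def loop_min(n: int, cache: dict[int, int]) -> int:
    """Return the smallest element of the loop that n's trajectory enters."""
    path = []
    on_path = set()
    x = n
    while x not in cache and x not in on_path:
        path.append(x)
        on_path.add(x)
        x = 3 * x + 5 if x % 2 else x // 2
    if x in cache:
        label = cache[x]                  # joined an already classified trajectory
    else:
        label = min(path[path.index(x):]) # x closed a new loop inside `path`
    for y in path:
        cache[y] = label
    return label


def loop_shares(limit: int) -> dict[int, float]:
    """Share of the seeds 1..limit that end in each loop (keyed by loop minimum)."""
    cache: dict[int, int] = {}
    counts = Counter(loop_min(n, cache) for n in range(1, limit + 1))
    return {label: count / limit for label, count in sorted(counts.items())}


if __name__ == "__main__":
    for limit in (100, 1_000, 10_000):
        print(limit, loop_shares(limit))
```

Changing the range inside `loop_shares` (for example to start at 100,000) would give the shares for a window of seeds instead of an initial segment.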
Here is a link to the raw data my code generated:
https://drive.google.com/drive/folders/0BzfYa_--3heeNkVpd1NPd090aDA?usp=sharing
(Note: viewing the 10,000 sample size worked just fine for me; however, you would need to download the 100,000 and 1,000,000 sample sizes to view them.)
The results show the percentages vary quite a bit from sample to sample; however, in the general scheme of things the data seems to be somewhat consistent. For example, my data shows the 19 loop is the end of roughly half the trajectories of the numbers in the samples. Only one percentage never changed from sample to sample: unsurprisingly, the 20-10-5 loop accounted for 1/5 of all tested values.
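A short sketch of why the 1/5 is exact (this is an added remark, using only the rule $f$ defined at the top): write a multiple of $5$ as $x = 5y$. Since $5$ is odd, $x$ and $y$ have the same parity, so

$$f(5y)= \begin{cases} 3\cdot 5y+5 = 5\,(3y+1), & \text{if $y$ is odd} \\ 5y/2 = 5\,(y/2), & \text{if $y$ is even} \end{cases}$$

In other words, on multiples of $5$ the rule is just the ordinary $3y+1$ Collatz map scaled by $5$, and the ordinary loop $1 \to 4 \to 2 \to 1$ becomes $5 \to 20 \to 10 \to 5$. Conversely, if $5 \nmid x$ then $5 \nmid f(x)$, because $3x+5 \equiv 3x \pmod 5$ and halving cannot introduce a factor of $5$. So the seeds that reach the 20-10-5 loop are exactly the multiples of $5$, assuming $y$ reaches $1$ under the ordinary Collatz map, which has been verified by computer far beyond the ranges used here. Every sample size I used is a multiple of $5$, hence exactly $1/5$.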
I am unsure if this “loop bias” I observed is a consequence of relying on a sample size to begin with, human/code error, or if there is a mathematical explanation for what makes certain loops more popular than others. I have a few ideas for why some bias occurs; however, I am not confident in them, mostly because my ideas rely heavily on speculation I do not know how to prove formally.
EDIT: Here are the loops in order of appearance:
- [1, 8, 4, 2, 1]
- [19, 62, 31, 98, 49, 152, 76, 38, 19]
- [5, 20, 10, 5]
- [23, 74, 37, 116, 58, 29, 92, 46, 23]
- [187, 566, 283, 854, 427, 1286, 643, 1934, 967, 2906, 1453, 4364, 2182, 1091, 3278, 1639, 4922, 2461, 7388, 3694, 1847, 5546, 2773, 8324, 4162, 2081, 6248, 3124, 1562, 781, 2348, 1174, 587, 1766, 883, 2654, 1327, 3986, 1993, 5984, 2992, 1496, 748, 374, 187]
- [347, 1046, 523, 1574, 787, 2366, 1183, 3554, 1777, 5336, 2668, 1334, 667, 2006, 1003, 3014, 1507, 4526, 2263, 6794, 3397, 10196, 5098, 2549, 7652, 3826, 1913, 5744, 2872, 1436, 718, 359, 1082, 541, 1628, 814, 407, 1226, 613, 1844, 922, 461, 1388, 694, 347]
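As a sanity check on the lists above, a loop can be verified (or regenerated from any one of its members) with a few lines of Python. This is only a sketch; `is_loop` and `loop_from` are names made up for this post, and `max_steps` is an arbitrary safety cap.

```python
def f(x: int) -> int:
    """One step of the modified rule: 3x+5 for odd x, x/2 for even x."""
    return 3 * x + 5 if x % 2 else x // 2


def is_loop(values: list[int]) -> bool:
    """True if each entry maps to the next under f and the list closes on itself."""
    closes = values[0] == values[-1]
    return closes and all(f(a) == b for a, b in zip(values, values[1:]))


def loop_from(start: int, max_steps: int = 10_000) -> list[int]:
    """Regenerate a loop from one of its members, written like the lists above."""
    values = [start]
    x = f(start)
    while x != start:
        if len(values) > max_steps:
            raise ValueError(f"{start} does not appear to lie on a loop")
        values.append(x)
        x = f(x)
    return values + [start]


if __name__ == "__main__":
    # Smallest member of each of the six loops listed above.
    for start in (1, 19, 5, 23, 187, 347):
        loop = loop_from(start)
        print(start, len(loop) - 1, is_loop(loop))
```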
EDIT 2:
I agree that smaller numbers may be responsible for skewing the data. Therefore, I picked the sample size 100,000 to 1,000,000 to test this theory. I uploaded the results to the original Google Doc with the other pie charts.
I was surprised to find, well, the same chart. The ratios were slightly different as usual, but aside from that I am unsure whether this test debunks the hypothesis or just reiterates the small-number problem. I could try different sample sizes; however, I do not know if that is a good idea or not.
To provide some insight on what I think is going on, I will show you a digital version of some notes I sketched and explain where my speculations came from.
In May, I drew some sketches of trees and made some speculations about what I observed. I assumed if a loop had a branch or a tail coming from the original even numbers in the loop, then the loop would connect to more numbers. I also assumed smaller even multiples (if $n$ is odd, then an even multiple is $n*2^a$, where $a$ is any positive integer) branching to multiples of three "restricted" the size of the loops.
Of course, none of these statements are objective, much less provable. I wanted to share them in case there were any interesting mathematical patterns occurring or if this information shed light on anything...
Here is a digital version of my sketches.
Note: the trees are built using the "reverse Collatz method" (${(n-1)}/{3}$), or in this case an adapted version of that method using ${(n-5)}/{3}$. To divide $n$ by 2, go one number left. To multiply $n$ by 3 and add 5, find the bottom of the "T", which points to the next even number.
Warning: I showed this to a friend and the tree sketch confused them. If you find this sketch confusing, let me know and I will re-draw the entire thing with arrows instead.
Key:
- If an even number branches, it will have a "T" above it. The first odd number on the "T" is the resulting odd number after applying ${(n-5)}/{3}$. The following even numbers are the even multiples of the odd number. (Ex.: in the 19 loop, 38 will have a "T" over it. 11 is the resulting odd number, and the even numbers after 11 are $11*2^1$, $11*2^2$, $11*2^3$, ...)
- Blue numbers are members of a loop.
- Red numbers followed by a "no" sign are multiples of 3.
- Purple "T"s connect the loop.
- Green "T"s emphasize the extra "tail" or branch.
- Orange "T"s emphasize where a tail could have been, but the number branched to a multiple of 3 instead.
- Arrows connect the separated ends of the loop.
- "..." are used to convey numbers not shown.
I color-coded the sketch to draw attention to certain properties. I figured it would make it easier to understand.
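For anyone who finds the sketch hard to read, the reverse step behind it can be written as a small Python sketch. The function names `predecessors` and `tees` are hypothetical, chosen only for this illustration. It also shows why multiples of 3 get a "no" sign: an even multiple of a multiple of 3 is never of the form $3k+5$, so no "T" can sit above it.

```python
def predecessors(m: int) -> list[int]:
    """All n with f(n) == m, where f is 3n+5 for odd n and n/2 for even n."""
    result = [2 * m]                      # the even predecessor: (2m)/2 == m
    # An odd predecessor exists only if m = 3n+5 with n odd and positive.
    # m even makes m-5 odd, so (m-5)/3 is automatically odd when it is an integer.
    if m % 2 == 0 and m > 5 and (m - 5) % 3 == 0:
        result.append((m - 5) // 3)
    return result


def tees(odd: int, max_a: int) -> list[tuple[int, int]]:
    """Even multiples odd*2^a (1 <= a <= max_a) that carry a "T",
    paired with the odd number sitting on that "T"."""
    found = []
    for a in range(1, max_a + 1):
        even = odd * 2 ** a
        if even > 5 and (even - 5) % 3 == 0:
            found.append((even, (even - 5) // 3))
    return found


if __name__ == "__main__":
    print(predecessors(38))   # the example from the key: 38 branches to 11
    print(tees(19, 6))        # where the tail of 19 branches
    print(tees(3, 20))        # a multiple of 3 never branches
```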
Part 1 - Table of some properties of the $6$ known cycles.
update - a somewhat longer exposition and a longer table at my homepage
For a given length $N$, more than one vector can be possible: not only the rotations of one vector (which give, cyclically, the members of a single cycle) but also other combinations having the same $(N,S)$, which then give truly different cycles.
Table 3x+5:
Note that a very similar structure occurs for the $3x+7$, $3x+13$, etc. problems. For instance, for $3x+13$ we get the following table:
Table 3x+13:
Part 2 - approaching the problem of different relative frequencies
I'm looking at the backwards tree starting at the cycle element $19$ vs. the one starting at the cycle element $29$. Here the key to the greater frequency of occurrence of the $19$-cycle seems to be that its backwards transformations cover smaller (odd) numbers than those of the $29$-cycle; that is, under the $(3x+5)/2^A$ map, more of the small numbers transform to the $19$-cycle than to the $29$-cycle. I cannot really formalize this for the relative frequencies below some fixed upper bound $N$ at the moment, but it might give some good intuition...
The representation in a line is as follows:
Vector $A$ is here the (infinite) vector of all numbers $a_k$ that go down to $a_\text{parent}$ by one transformation: $a_\text{parent}=(3a_k+5)/2^B$. Of course, each element of $A$ (except those divisible by $3$) is in turn the parent of another vector $AA$. The first few of these entries are then documented in the following line, indented by some more spaces.
I printed that recursive tree, which also has a cyclic subtree with a recursion cycle of $3$, to a depth of $5$. To focus on the aspect of containing many small numbers, I omitted parents larger than $1000$ and also their subtrees, although this might not be perfectly correct, because they can themselves have parents which are smaller than $1000$; but I left this aside.
The tree based on cycle element $19$ has many more small numbers than the tree based on cycle element $29$.
An even more intuitive tree is the same tree but with the values replaced by their $\log_2()$. The progression in the vectors is then nearly linear, and the values of the $19$-tree seem a bit denser than those of the $29$-tree if we select a window of values with a fixed lower and upper bound. But I don't want to suggest that this impression is already objective and could already answer your question!