Why is 4 rare in the Van Eck sequence?


The Van Eck sequence is defined here: https://oeis.org/A181391

By far the most frequent integer in the early part of the sequence (at least through the first 10 million terms) is zero.

Intuitively, one would expect smaller integers to be more frequent than larger integers, and inspection of finite initial slices of the sequence shows a fairly smooth decline in frequency of integers as they get larger.

However, there appear to be a number of pronounced local minima and maxima in the frequency distribution before this pattern of smooth-ish decline begins. For example, in the first 5000 terms, there are 1703 zeros, 345 1s, 105 2s, and 72 3s. There is then a rise back up to 693 5s, then decline.

In the first million terms, there is a minimum at 3:

0 appears 136529 times.
1 appears 24000 times.
2 appears 7643 times.
3 appears 748 times.
4 appears 1644 times.
5 appears 17685 times.
6 appears 48609 times.
7 appears 43041 times.
8 appears 31182 times.
9 appears 19384 times.
10 appears 9447 times.

In the first 10 million terms, the minimum has shifted slightly to 4, but is even more pronounced:

0 appears 1250523 times.
1 appears 215715 times.
2 appears 69497 times.
3 appears 6102 times.
4 appears 2410 times.
5 appears 39936 times.
6 appears 260931 times.
7 appears 453416 times.
8 appears 359203 times.
9 appears 249831 times.
10 appears 142729 times.

(There appear to be several further maxima and minima.)
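Counts like the ones above can be reproduced with a short script. This is only a minimal sketch, assuming the usual definition from the OEIS entry (the next term is the distance back to the previous occurrence of the current term, or 0 if there is none); the names `van_eck` and `value_counts` are made up for illustration.

```python
from collections import Counter
from itertools import islice


def van_eck():
    """Yield the terms of the Van Eck sequence (OEIS A181391) one by one."""
    last_seen = {}  # value -> index of its most recent earlier occurrence
    current = 0
    index = 0
    while True:
        yield current
        # Next term: distance back to the previous occurrence of `current`,
        # or 0 if `current` has not occurred before.
        nxt = index - last_seen[current] if current in last_seen else 0
        last_seen[current] = index
        current = nxt
        index += 1


def value_counts(n_terms):
    """Count how often each value occurs among the first n_terms terms."""
    return Counter(islice(van_eck(), n_terms))


if __name__ == "__main__":
    # Raise this to 1_000_000 or 10_000_000 to compare with the tables.
    counts = value_counts(100_000)
    for value in range(11):
        print(f"{value} appears {counts[value]} times.")
```

Memory use grows with the number of distinct values seen, since `last_seen` keeps one entry per value.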

Why does this happen? Is 4 rare in the (full) infinite sequence also?

A 0 in the sequence means that the previous term has not occurred previously.

...a0... => a has not previously occurred.

A 1 in the sequence means that the two prior terms are identical.

...xa..(a terms)..xaa1...

A 2 in the sequence...

...ypx..(a terms)..yaxa2...

A 3 in the sequence...

...xpqz..(a terms)..xayza3...

A 4 in the sequence...

...wpqrz..(a terms)..waxyza4...

And so on. Why should 4 be especially rare?
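The patterns above all come down to one rule: a term k > 0 means the term just before it last occurred exactly k places earlier, and a 0 means the term just before it is new. As a rough sanity check of that reading, here is a small sketch; `van_eck_list` and `explains` are hypothetical helper names, not anything from the OEIS entry.

```python
def van_eck_list(n_terms):
    """Return the first n_terms terms of the Van Eck sequence as a list."""
    seq, last_seen = [0], {}
    for i in range(n_terms - 1):
        cur = seq[i]
        seq.append(i - last_seen[cur] if cur in last_seen else 0)
        last_seen[cur] = i
    return seq


def explains(seq, i):
    """Check the rule for the term at index i (requires i >= 1)."""
    k, prev = seq[i], seq[i - 1]
    if k == 0:
        # prev must be new: absent from everything before index i - 1.
        return prev not in seq[:i - 1]
    # prev must sit exactly k places earlier and nowhere in between.
    return seq[i - 1 - k] == prev and prev not in seq[i - k:i - 1]


if __name__ == "__main__":
    seq = van_eck_list(2000)
    print(all(explains(seq, i) for i in range(1, len(seq))))
```

So a 4 at index i needs the same value at indices i - 1 and i - 5, with no copy of it in the three places between.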


There are 2 best solutions below


My best guess is that 4 simply wasn't a common distance between zeroes. Consider how the sequence is structured. It is a series of subsequences delimited by zeroes, where the first number after each zero is the length of the previous subsequence, the last is a number that has not appeared before, and the middle is an effectively random series of numbers. After its first instance, a number can reappear either as the first number or somewhere in the middle. The average subsequence length seems to start around 3 and increase from there, so some numbers appear often because they are close to the average subsequence length. Smaller numbers can also appear when a number occurs more than once within one subsequence. I'm guessing that 4 didn't occur very often as a subsequence length, and was also too large to appear in the middle of subsequences very often.
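The zero-delimited structure described here is easy to inspect directly. The following is a sketch only; `van_eck_list`, `zero_runs` and `first_entry_is_previous_length` are illustrative names, and a "run" is assumed to mean a zero together with everything up to (but not including) the next zero.

```python
def van_eck_list(n_terms):
    """Return the first n_terms terms of the Van Eck sequence as a list."""
    seq, last_seen = [0], {}
    for i in range(n_terms - 1):
        cur = seq[i]
        seq.append(i - last_seen[cur] if cur in last_seen else 0)
        last_seen[cur] = i
    return seq


def zero_runs(seq):
    """Split seq into runs, each starting at a 0 and ending before the next 0."""
    runs = []
    for term in seq:
        if term == 0 or not runs:
            runs.append([])
        runs[-1].append(term)
    return runs


def first_entry_is_previous_length(runs):
    """Check that the entry after each 0 equals the length of the previous run."""
    return all(
        cur[1] == len(prev)
        for prev, cur in zip(runs, runs[1:])
        if len(cur) > 1  # the final run may be cut off right after its 0
    )


if __name__ == "__main__":
    runs = zero_runs(van_eck_list(5000))
    print(runs[:7])
    print(first_entry_is_previous_length(runs))
```

Tallying `len(run)` over the runs gives the distribution of subsequence lengths, which is what this guess says drives how often each small number shows up right after a zero.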


Farther along in the Van Eck sequence, the terms seem to settle into a sort of pattern. For instance, here is a section of the sequence at around 83 million terms in (just where I happen to be generating at the moment):

0, 8, 15, 1044, 7281, 334155, 9865070,
0, 7, 32, 195, 13564, 30578, 634956, 6969393,
0, 8, 15, 15, 1, 166, 3047, 34316, 233674, 8343746,
0, 10, 57, 1519, 17361, 97154, 2528521, 4052205, 26943530,
0, 9, 42, 262, 31321, 2301035,
0, 6, 254, 6929, 146161, 981873, 10304433,
0, 7, 40, 177, 1401, 74602, 1089258, 22973736,
0, 8, 40, 8, 2, 72, 72, 1, 43, 525, 13913, 85720, 234113, 175154, 4895438, 1110545, 29717902,
0, 17, 160, 2903, 16539, 401432, 15918196,
0, 7, 32, 72, 21, 1974, 39575, 523888, 6836023,
0, 9, 54, 157, 958, 4532, 26834, 555328, 37721941,
0, 9, 9, 1, 38, 1582, 28467, 585200, 19932680,
0, 9, 8, 50, 1035, 74259, 8489674,
0, 7, 34, 121, 1863, 42022, 518001, 15822588, 1217662, 1597874, 10464774, 28573294,
0, 12, 263, 12555, 460479, 4588880, 35836304,
0, 7, 19, 164, 639, 32553, 328746, 166166, 12338469,
0, 9, 35, 905, 8229, 4816, 43819, 14893, 728247, 20206470,

With a few exceptions that I've highlighted, the pattern near the nth term of the sequence seems to be roughly "start at zero, grow by roughly a factor of 10 per term for a few terms, hit a number you haven't seen at around log(n) terms in, repeat at 0." Intuitively, this makes sense provided that the chance of any particular number k appearing at any particular location scales with 1/k. Based on numerical data, it seems to: my sequence file currently contains 40727 "100"s, 3931 "1000"s, 354 "10000"s, 35 "100000"s, and exactly one "1000000". Here it is:

0, 11, 102, 901, 74, 322, 928, 5572, 23490, 81424, 1000000,
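As a quick arithmetic check of the inverse scaling, the four counts quoted above for 100 through 100000 can be compared directly. The snippet only reuses those quoted numbers; nothing here is newly generated data.

```python
# Occurrence counts quoted above, keyed by the value being counted.
counts = {100: 40727, 1000: 3931, 10000: 354, 100000: 35}

values = sorted(counts)

# If the count of a value k scales like 1/k, then k * count stays roughly
# constant and each tenfold step in k divides the count by about 10.
products = {k: k * counts[k] for k in values}
ratios = [counts[a] / counts[b] for a, b in zip(values, values[1:])]

for k in values:
    print(f"value {k}: count {counts[k]}, value * count = {products[k]}")
print("ratios between consecutive counts:", [round(r, 2) for r in ratios])
```

The three ratios come out between 10 and 11.2, and the products all fall between 3.5 and 4.1 million, which fits a rough 1/k law.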

Because the Van Eck sequence can be quite chaotic, this growth pattern doesn't always hold (see the bold counterexamples). As you can see, though, the exceptions that produce small numbers tend to sit near the beginning of each growth pattern, while the large ones tend to set the growth back only to within an order of magnitude below the previous value. Also note that once the typical length of these growth patterns exceeds 4, a 4 most often arises inside one of these small exceptions rather than as the length of a growth pattern. Let's call a 4 that records the length of a growth pattern (the entry right after a zero) a "growth 4" and a 4 that arises inside an exception a "random 4."
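To separate the two kinds of 4 in generated data, one possible sketch is below. It assumes that a "growth k" is a k sitting immediately after a 0 (so it records the length of the preceding growth pattern) and that every other k counts as "random"; that is my reading of the distinction, and `van_eck_list` and `classify` are names invented for the example.

```python
def van_eck_list(n_terms):
    """Return the first n_terms terms of the Van Eck sequence as a list."""
    seq, last_seen = [0], {}
    for i in range(n_terms - 1):
        cur = seq[i]
        seq.append(i - last_seen[cur] if cur in last_seen else 0)
        last_seen[cur] = i
    return seq


def classify(seq, k):
    """Return (growth, random) lists of 1-based positions where k occurs.

    Assumption: a 'growth k' directly follows a 0; anything else is 'random'.
    Only meaningful for k > 0.
    """
    growth, random_ = [], []
    for i in range(1, len(seq)):
        if seq[i] == k:
            (growth if seq[i - 1] == 0 else random_).append(i + 1)
    return growth, random_


if __name__ == "__main__":
    seq = van_eck_list(1_000_000)
    for k in (2, 3, 4, 5):
        growth, random_ = classify(seq, k)
        last = growth[-1] if growth else None
        print(f"{k}: {len(growth)} growth, {len(random_)} random, last growth at term {last}")
```

Tracking the last element of the `growth` list as the number of terms increases is one way to look for a final "growth k" for a given k.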

However, while random 1s are common (due to patterns like the highlighted 0, 9, 9, 1), look at the pattern containing a 2. It requires two consecutive patterns that share an entry in the same position: in this case the 2 is "caused" by the 7 on the second line and the 8 on the third line combining with the 7 on the seventh line and the 8 on the eighth line. Both the 7 and the 8 were last seen 40 terms earlier, so each is followed by 40; the second 40 then comes 8 terms after the first, giving 8, and that 8 comes 2 terms after the previous 8, giving 2. So you can see how a random 2 would be a little rarer, needing a repetition of a second entry instead of a first. It would stand to reason that a repeating third entry would produce a random 3 from time to time, and a repeating fourth entry would produce a random 4.

Here is an example a bit later in the sequence with a random 3:

0, 11, 38, 297, 3279, 83832, 2061019,
0, 7, 54, 297, 7, 3, 1473, 19986, 384060, 6704968,

And a random 4:

0, 9, 9, 1, 36, 853, 22149, 833322, 16578515,
0, 9, 8, 28, 36, 9, 4, 53027, 1600487, 72828161,

There are other ways to make a random 3 and a random 4. For instance, one can result from some weird stuff happening at the beginning of a line:

0, 7, 7, 1, 26, 26, 1, 3, 5783, 43352, 2590513, 32873149,

and:

0, 10, 26, 8, 11, 202, 6186, 120020, 2450649, 57838165,
0, 10, 10, 1, 20, 202, 10, 4, 4866, 38100, 176556, 3833707,

where the repeated entry is the 5th in its line rather than the 4th, which it had to be because the first entry was also a repeat.

I suspect you'll see total 5s taper off as well, as growth sequences of length 5 become more and more rare, and eventually perhaps 4s will overtake 5s just through the slow trickle of random 4s, which outpaces that of random 5s.

That being said, there are even still growth 4s from time to time, the three most recent in my file being at around 71.8 million terms in, 72.4 million terms in, and 76.3 million terms in:

71.8mil: 0, 5, 37713, 1622080, 0, 4, 1175, 109114, 2295439, 25478069,

72.4mil: 0, 5, 29939, 2569293, 0, 4, 2825, 2453, 46609, 7130, 417964, 21098542,

76.3mil: 0, 5, 27519, 2242716, 0, 4, 10990, 139103, 6023678,

In this region, having a value of about 20k-30k as the second entry in a growth sequence is rare, and then having an unvisited value around 2 million is also rare. As you can see, this requires some luck, but it still happens. On the other hand, the last growth 3 seen was at around term 2.6 million (0, 4, 79083, 0, 3, 107, 4032, 333484), and the last growth 2 seen was the 14th term of the sequence. It may be that for a given n there is a final "growth n" that appears in the sequence. I don't know.

None of this is rigorous, of course. But I hope you take from this that 4 probably isn't really special. If you go far enough, it will probably be the case that 5 becomes the new least frequent number, then 6, then 7, and so on, except the change is logarithmic, so it will take quite a long while.