Kolakoski sequence


See Wikipedia and here for the definition of the Kolakoski sequence. I recently noticed a slide deck by Prof. Richard P. Brent, which states that the frequency of a letter within the first $n$ terms can be computed in sublinear time. But I ran into some difficulties during implementation. How can the lookup table be generated in $O(2^{d_{max}})$? Can the result be extended to other Kolakoski-type sequences? Thanks.


There is 1 best solution below


To generate the lookup tables in $O(2^{d_\max})$ total, or, equivalently, to generate the $d$th table in $O(2^d)$, first generate the tables for all smaller $d$, then use them while generating the $d$th table, in exactly the same way as you later use the full set of prepared tables to generate terms of the Kolakoski sequence.
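For orientation, here is a minimal sketch of table-free, one-term-at-a-time generation with bit operations, i.e. the kind of single step that a lookup table is meant to batch. It is a commonly circulated bitwise formulation, not the code from the demo or from the slides; the names `kolakoski_bits`, `x`, `y` and `f` are illustrative, and relating it to the tables assumes they encode a state of this shape.

```python
from itertools import islice

def kolakoski_bits():
    """Yield the (1,2) Kolakoski sequence one term at a time.

    x holds one bit per level; its lowest bit selects the emitted term
    (bit set -> 1, bit clear -> 2). y is a companion word whose trailing
    run of 1-bits marks the levels that flip at this step.
    Python ints are unbounded, so -1 acts as an infinite string of 1-bits.
    """
    x = y = -1
    while True:
        yield 1 if x & 1 else 2
        f = y & ~(y + 1)              # the trailing 1-bits of y
        x ^= f                        # flip the value at every marked level
        y = (y + 1) | (f & (x >> 1))  # x shifted by one bit feeds back into y

first_terms = list(islice(kolakoski_bits(), 20))
```

Each step costs a constant number of word operations, so producing $n$ terms this way is linear in $n$; the tables exist to skip many such steps at once.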

I have implemented a demo which shows both the lazy $O(3^d)$ and the fast $O(2^d)$ generation methods. Disclaimers: it's written in Python, so it's too slow for practical use with large $n$; also, there are some inaccuracies that seem to make it $O(d2^d)$ rather than $O(2^d)$.

As for other Kolakoski-type sequences, assuming you mean sequences like the ones referenced in the paragraph entitled "Kolakoski-type sequences using other seeds than (1,2)" in OEIS A000002: yes, the same algorithm can work for them, but you lose the ability to use bit operations directly. You can still encode rows as sequences of bits, but you'll have to implement the operations on them with loops instead of atomic binary operations, which slightly worsens the complexity of the algorithm.
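To make "Kolakoski-type sequence" concrete, here is a minimal reference generator for an arbitrary cyclic alphabet. It is the plain linear-time definition, not the table-driven algorithm, and is mainly useful for checking a faster implementation against. The names `kolakoski_type` and `letter_count` are illustrative, and the alphabet is assumed to consist of positive integers.

```python
def kolakoski_type(alphabet, n):
    """Return the first n terms of the Kolakoski-type sequence whose runs
    cycle through `alphabet` (assumed: positive integers), e.g. (1, 2).

    The k-th run has length seq[k]; if that term has not been written yet,
    the run being written defines its own length.
    """
    seq = []
    k = 0  # index of the run being written (also the index of its length)
    while len(seq) < n:
        value = alphabet[k % len(alphabet)]
        run = seq[k] if k < len(seq) else value
        seq.extend([value] * run)
        k += 1
    return seq[:n]

def letter_count(alphabet, n, letter):
    """Naive O(n) count of `letter` among the first n terms."""
    return kolakoski_type(alphabet, n).count(letter)

print(kolakoski_type((1, 2), 10))     # [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]
print(kolakoski_type((1, 2, 3), 10))  # [1, 2, 2, 3, 3, 1, 1, 1, 2, 2]
```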

In general, when analyzing a row of the table shown in Brent's slides, you need to keep track of how many more times the value in each column will occur before it changes. For the usual (1,2) Kolakoski sequence, this information can naturally be stored as a sequence of bits, partially obtained from the sequence of bits encoding the row itself by a bit shift by 1 (because of the definition of the Kolakoski sequence: every number is also a run length). For a generic Kolakoski-type sequence, these bit tricks aren't available any more, so it may be convenient to store the run lengths as an array of ints/bytes/u8s; you don't have to worry about the space it takes. But the information about the row itself is used in the lookup tables (both as keys and as parts of values), so you have to keep it compact.

If your sequence has only two different alternating values, as in OEIS A064353 or A071820 (or in the usual Kolakoski sequence), then you can still store the current value in a single bit, but with a different correspondence than $0\to1,1\to2$. If there are three or four values changing in a cycle, as in OEIS A079729 or A079730, then you'll have to use two bits to store the current value (so for the (1,2,3) variant, say, the row $1, 1, 3, 3, 2$ can be encoded as 0b01_10_10_00_00); for five to eight values, you'll need three bits, and so on. You'll have to implement functions which (a) read the value of a given (0th, 1st, 2nd...) group of bits (= term in the row of the table) and interpret it as one of the numbers the sequence consists of, and (b) set the value of a given group of bits.
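The two helper functions (a) and (b) could look roughly like this. It is a minimal sketch, assuming a row is stored in a Python int with term 0 in the lowest-order group of bits and each term encoded as its index in the alphabet; the names `bits_per_term`, `get_term`, `set_term` and `pack_row` are illustrative, not taken from the demo.

```python
def bits_per_term(alphabet):
    """Bits needed per term: 1 for two values, 2 for three or four, 3 for five to eight."""
    return max(1, (len(alphabet) - 1).bit_length())

def get_term(row, i, alphabet):
    """(a) Read the i-th group of bits and interpret it as a sequence value."""
    k = bits_per_term(alphabet)
    return alphabet[(row >> (i * k)) & ((1 << k) - 1)]

def set_term(row, i, value, alphabet):
    """(b) Return a copy of `row` with the i-th group of bits set to `value`."""
    k = bits_per_term(alphabet)
    mask = ((1 << k) - 1) << (i * k)
    return (row & ~mask) | (alphabet.index(value) << (i * k))

def pack_row(terms, alphabet):
    """Encode a list of terms into one int, term 0 in the lowest bits."""
    row = 0
    for i, term in enumerate(terms):
        row = set_term(row, i, term, alphabet)
    return row

row = pack_row([1, 1, 3, 3, 2], (1, 2, 3))
print(bin(row))  # 0b110100000, i.e. 0b01_10_10_00_00 without the leading zero
```

Because the packed row is an ordinary int, it can be used directly as a table key.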