Universal codes are prefix codes that map the positive integers to binary codewords, with monotonically non-decreasing codeword lengths.
Elias omega coding is a universal code based on recursively encoding the lengths. All the variants go something like this:

1. Write a $0$.
2. If $N=1$, go to step 7.
3. Write $N$ in binary, but drop the highest $1$.
4. Write a $1$.
5. Replace $N$ with the length of the binary representation written in step 3.
6. Go to step 2.
7. Reverse everything we just wrote.
It effectively creates a chain of blocks, where each block is followed by a $1$ to signal that it encodes the length of the next block, or by a $0$ to signal the end.
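Read literally, the steps above can be sketched in a few lines (Python; `encode` and `decode` are my own names, not standard functions). One subtlety: reversing the whole bit string in step 7 also reverses the bits inside each block, so the blocks end up with their bits in little endian order:

```python
def encode(n: int) -> str:
    """Encode a positive integer following steps 1-7 above."""
    assert n >= 1
    parts = ["0"]                   # step 1
    while n != 1:                   # step 2 (step 6 is this loop)
        body = bin(n)[3:]           # step 3: binary of n with the highest 1 dropped
        parts.append(body)
        parts.append("1")           # step 4
        n = len(body)               # step 5: length of the step-3 representation
    return "".join(parts)[::-1]     # step 7: reverse everything


def decode(code: str) -> int:
    """Invert encode: each 1 announces a block of the current length, a 0 ends the code."""
    i, n = 0, 1
    while code[i] == "1":
        i += 1
        # the next n bits are one block, little endian, with an implicit leading 1
        n, i = int("1" + code[i:i + n][::-1], 2), i + n
    return n
```

For example, `encode(4)` yields `101000`: a `1`, the one-bit block `0` (value $2$), another `1`, the two-bit block `00` (value $4$), and the terminating `0`.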
The length of the encoded form is:
$$L(N)=1+\sum_{k=1}^{\log_2^*(N)}{\lfloor\log_2^k(N)\rfloor+1}=1+\log_2^*(N)+\sum_{k=1}^{\log_2^*(N)}{\lfloor\log_2^k(N)\rfloor}$$
If we write it out in full, we see it's a sequence of functions of $N$, each slower growing than the previous, ending in the constant term $1$:
$$L(N)=\lfloor\log_2(N)\rfloor+\lfloor\log_2^2(N)\rfloor+\lfloor\log_2^3(N)\rfloor+\cdots+\lfloor\log_2^{\log_2^*(N)-1}(N)\rfloor+\lfloor\log_2^{\log_2^*(N)}(N)\rfloor+\log_2^*(N)+1$$
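As a sanity check, the formula can be evaluated by iterating the logarithm directly (a Python sketch; `code_length` is a name I'm introducing, and $\lfloor\log_2 n\rfloor$ is computed as `n.bit_length() - 1`):

```python
def code_length(n: int) -> int:
    """Evaluate L(N) = 1 + sum over k of (floor(log2^k N) + 1)."""
    assert n >= 1
    total = 1                   # the lone terminating 0
    while n > 1:
        n = n.bit_length() - 1  # floor(log2 n): the next iterated logarithm
        total += n + 1          # a block of floor(log2 n) bits plus one signal bit
    return total
```

The loop runs exactly $\log_2^*(N)$ times, which is where the $\log_2^*(N)$ term in the expanded sum comes from.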
I independently came up with an encoding with the same idea as Elias omega. I had some variations, though. The bits in the blocks were written in big endian order, and the signal bits were all moved to the front.
As a result, the positive integers now encode like so:
| $N$ | Encoding |
|----:|:---------|
| 1 | `0` |
| 2 | `100` |
| 3 | `101` |
| 4 | `110000` |
| 5 | `110001` |
| 6 | `110010` |
| 7 | `110011` |
| 8 | `1101000` |
| 9 | `1101001` |
| 10 | `1101010` |
| 11 | `1101011` |
| 12 | `1101100` |
| 13 | `1101101` |
| 14 | `1101110` |
| 15 | `1101111` |
| 16 | `11100000000` |
| 17 | `11100000001` |
| 18 | `11100000010` |
| 19 | `11100000011` |
| 20 | `11100000100` |
| 21 | `11100000101` |
| 22 | `11100000110` |
| 23 | `11100000111` |
| 24 | `11100001000` |
| 25 | `11100001001` |
| 26 | `11100001010` |
| 27 | `11100001011` |
| 28 | `11100001100` |
| 29 | `11100001101` |
| 30 | `11100001110` |
| 31 | `11100001111` |
| 32 | `111000100000` |
| 33 | `111000100001` |
| 34 | `111000100010` |
| 35 | `111000100011` |
| 36 | `111000100100` |
| 37 | `111000100101` |
| 38 | `111000100110` |
| 39 | `111000100111` |
| 40 | `111000101000` |
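The table above can be reproduced with a short sketch (Python; `encode_variant` is a name I'm introducing): the signal bits form a unary count of the blocks, and the blocks then follow from shortest to longest, each with its leading $1$ dropped:

```python
def encode_variant(n: int) -> str:
    """Big endian blocks, with all signal bits moved to the front."""
    assert n >= 1
    blocks = []
    while n != 1:
        body = bin(n)[3:]      # binary of n with the highest 1 dropped
        blocks.append(body)
        n = len(body)
    # one signal 1 per block, a 0 to end the signal chunk,
    # then the blocks from innermost (shortest) to outermost (longest)
    return "1" * len(blocks) + "0" + "".join(reversed(blocks))
```

One can also check that sorting these codewords as plain strings keeps the integers in numeric order.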
Interestingly, these are lexicographically ordered: $a<b \iff E(a)<E(b)$, where $E$ is the encoding and codewords are compared as binary strings. That happens because the terms in the slow growing sequence show up in order from slowest to fastest, and all are monotonically increasing. But that isn't my concern right now.
The first chunk, the collection of signal bits, represents $\log_2^*(N)$, and is clearly unary encoded. Unary coding is horribly inefficient, so this suggests I should encode $\log_2^*(N)$ using something better. Concretely, the $\log_2^*(N)$ term in the slow growing sequence would be replaced by a series of still slower growing functions.
By the looks of it, there's no limit to how deep we can go with the recursive method. There always seems to be another measure we could encode: first the number of logarithms, then the number of iterated logarithm encodings, then the number of layers of these re-encodings, and so on. It's tempting to give up at some point and just use a non-recursive encoding for some measure, but then we wouldn't have the most efficient universal code possible.
So now I ask how we can construct some universal code based on this which only recurses using itself, and never relies on other codes like unary coding for any part of it, even indirectly like Elias omega coding.
If we can do this, then we would have a slowest growing universal code. We can narrow down the candidates like so:
- It must be asymptotically optimal. Let $A$ be this slowest growing universal code and $B$ be any other universal code; then $L_A(N)-L_B(N)\le c$ for all $N$ and some finite constant $c$ (which may depend on $B$).
- It must have the least constant term in its length function. The sequence of slower growing functions must be the same for all universal codes in the family of slowest growing universal codes defined so far, so only the constant can differ.
- We can additionally impose a requirement for lexicographic ordering of codewords, leaving only one possible slowest growing universal code.
It seems like such a slowest growing universal code is well defined and does exist, I just haven't figured out how to make it. Unlike in some other contexts where you can always find something slower, I'm quite sure there is a lower bound for the length of a codeword, and if such a slowest growing universal code exists, it achieves that lower bound. It must only recurse using itself, since by definition, any other universal code grows faster, and would leave room for improvement, making it not the slowest growing.
If such a slowest growing universal code cannot exist, why?
I might have been a bit vague before on what kind of recursion problems there are, and why it isn't straightforward to recurse as much as needed. I'll illustrate this with fast growing functions instead, which have a similar issue with recursion.
We might start by defining a function that grows at all:
$$f(n)=n+1$$
We can then re-parameterize this to create a new function:
$$g(n)=f^n(3)$$
But we won't stop there. Let's recurse more:
$$h_0(n)=n+1 \qquad h_m(n)=h_{m-1}^n(3) \qquad h(n)=h_n(3)$$
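This hierarchy is easy to sketch as code (a Python toy; the values explode so quickly that only tiny arguments finish in reasonable time):

```python
def h(m: int, n: int) -> int:
    """h_m(n): apply h_{m-1} to 3, n times; base case h_0(n) = n + 1."""
    if m == 0:
        return n + 1
    x = 3
    for _ in range(n):      # n-fold composition of the level below
        x = h(m - 1, x)
    return x
```

Here $h_1(n)=n+3$ and $h_2(n)=3n+3$; every further level composes the one below it $n$ times, so growth accelerates drastically.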
So at this point we've re-parameterized it twice. How about we make a new function, which is the $n$th re-parameterization, applied to $3$?
There's no limit to the recursing and further re-parameterization, and hence, no good way to quantify this kind of fast growing hierarchy.
But even though we face similar problems with unbounded recursion, I think the slowest growing universal code can exist: unlike the fast growing hierarchy, which converges toward a target with no consistent definition (a fastest growing function is contradictory), the slower growing universal codes converge toward the slowest growing universal code, which does have a consistent definition.