Is there a prefix-free language that can encode any other prefix-free language with at most a constant overheard?

88 Views Asked by At

Let $U$ and $P$ be prefix-free languages with alphabet $\{0,1\}$. We say that $U$ can encode $P$ with at most a constant overhead if there exists an injective function $c:P \to U$ and a constant $a$ such that

$$\forall s \in P. |c(s)| \le a + |s|$$

For example, any prefix-free language can encode itself with an overhead of $0$.

Consider the prefix-free languages

$$U = (00|01)^*1$$ $$P = 0^*1$$

Then let the function $c$ which takes the string $0^n1$, converts $n$ to binary, puts a $0$ before each bit, and puts a $1$ at the end. Then $U$ encodes $P$ using $c$ with an overhead of $2$.

Additionally, given a countable set of prefix languages $S$, we can define

$$U = 0^n1S_n$$

where $S_n$ is the nth prefix language in $S$. Then $U$ can encode $S_n$ with at most an overhead of $n+1$. So $U$ can encode any prefix-free language in $S_n$ with at most constant overhead.

Is there a language $U$ that can encode every $P$ with at most a constant overhead? The constant and encoding function can be different for each $P$, but must exist for each $P$.

Note that the different encoding functions are permitted to overlap. For example, if the encoding functions for $P$ and $P'$ are $c$ and $c'$, then

$$\exists s \in P, s' \in P'. c(s) = c'(s')$$

is permitted. Indeed, it is necessary for many $P$ and $P'$ since there are uncountably many non-empty prefix-free languages but only countably many strings in $U$.

$$\exists s, s' \in P. s \neq s' \land c(s) = c(s')$$

is not permitted.

Additionally, $U$ does not need to be computable, but neither does $P$.

1

There are 1 best solutions below

0
On

No

Let us say that $U$ is a prefix-free language that can encode any other prefix-free language with at most constant overhead.

Let

$$L_n = \{s \in U: |s| = n\}$$ $$L_{\le n} = \{s \in U: |s| \le n\}$$

for any language $L$.

By Kraft's inequality $$\sum_{i=0}^\infty \frac {|U_i|} {2^i} \le 1$$

This implies for any $\epsilon > 0$, there exists a $n$ such that for every $N > n$ we have $\frac {|U_N|} {2^N} < \frac \epsilon 2$. That implies

$$|U_{\le N}| = |U_{\le n}| + \sum_{i=n+1}^N \frac {|U_i|} {2^i} 2^i < |U_{\le n}| + \sum_{i=n+1}^N \frac \epsilon 2 2^i $$$$ = |U_{\le n}| + \frac \epsilon 2(2^{N+1} - 2^{n+1}) < |U_{\le n}| + \epsilon 2^N$$

which implies $$\forall \epsilon > 0. \lim_{n \to \infty} \frac {|U_{\le n}|} {2^n} \le \epsilon$$ $$\lim_{n \to \infty} \frac {|U_{\le n}|} {2^n} = 0$$

Now, let $f$ be a function such that

$$\frac {|U_{\le 2n + f(n)}|} {2^{2n + f(n)}} \lt 2^{-2n}$$

For any $n$, a $f(n)$ will exist with this property.

Now let $P$ be the prefix-free language

$$P = \{0^{n-1}1t : n > 0, t \in (0|1)^{f(n)}\}$$

There will exist some injective function $c:U \to P$ and constant $a$ such that

$$\forall s \in P. |c(s)| \le a + |s|$$

Also note that $|P_{\le n + f(n)}| \ge 2^{f(n)}$, since there are $2^{f(n)}$ strings with length $f(n)$.

Now

$$|U_{\le 2a + f(a)}| < 2^{2a + f(a) - 2a} = 2^{f(a)} \le |P_{\le a + f(a)}|$$

On the other hand, since $c : P \to U$ is injective

$$\forall n. |P_{\le n}| \le |U_{\le a + n}|$$

A contradiction!

A similar argument can be used to show that there is no computable prefix-free language that encodes any other computable prefix-free language with at most constant overhead. That is because when $U$ is computable, $f$ is computable via a search.