Is it possible to create a string with known Kolmogorov Complexity?

1k Views Asked by At

I wish to compare compressors using strings with known Kolmogorov Complexity, but I haven't got the theoretical background and tools to understand how to do that. I'm just starting in this area and any pointer to papers/textbooks is highly appreciated.

I imagine that to "tune" the KC of a string with length $n$, one has to use some random source, otherwise the same program used to generate several strings could be used as the measure of complexity for those strings. I idealized a state machine with two states (A and B, starting state = A) and a transition probability $p$ that would produce the following strings with different complexities for different $p$:

p = 0.0:    AAAAAAAAAAAAAAAAAAAAAAAAAA... 
KC ~ 1:     = A * n

p = 0.1:    AAAAAABBBBBAAAAAAAAAAAABBB... 
KC ~ n/10?: = (A * 6) ++ (B * 5) ++ (A * 12) ++ ...

p = 0.5:    AABBABABBBBBBAAABBABBBABBBA...
KC = n?

p = 0.9:    ABABAABABABABABBABABABABAA... 
KC ~ n/10?: = (AB * 2) ++ A ++ (AB * 5) ++ B ++ (AB * 4) ++ ...

p = 1.0:    ABABABABABABAABABABABABABA...
KC ~ 1:     = AB * (n/2)

However, this look like more some measure of entropy than proper KC. Are they equivalent for randomized processes?

3

There are 3 best solutions below

9
On BEST ANSWER

This is an attempt to answer the titular question "Is it possible to create a string with known Kolmogorov Complexity?" (as elaborated in the comments: "fixing the length n of the string, is there a process to produce a string with a desired Kolmogorov complexity (between 1 and n)?").

My experience with Kolmogorov complexity is rather limited, but I think can say something helpful here.

The Kolmogorov complexity of a string is defined relative to a fixed universal (prefix free) machine. Since for any string $\sigma$ there is a universal machine $U$ such that $K_{U}(\sigma)=1$, the question only makes sense if we assume a fixed background universal machine $U$.

With $U$ fixed, the answer is no in general. First, the answer is no for the trivial reason that, given $n$, there's no guarantee that $K_{U}(\sigma)=n$ for any $\sigma$ at all.

Second, even if $U$ has the property that for each $n$ there is a $\sigma$ such that $K_{U}(\sigma)=n$, the answer is still no because if you've seen that $U(\tau)=\sigma$, so that $K_{U}(\sigma)\leq |\tau|$, if there's even one string $\rho$ with length less than $\tau$ such that $U$ is still undefined on $\rho$, then you'll just have to wait to determine if $U(\rho)=\sigma$. The point here is that the Kolmogorov complexity function isn't computable, but rather only computably approximable from above.

All that being said, I'd guess there is a universal machine $U$ with a computable set of strings $\langle \sigma_{n} \rangle_{n\in \mathbb{N}}$ such that $K_{U}(\sigma_{n}) = n$. I'm not sure of this though, and hopefully someone else will chime in and confirm/refute this guess (as well as the other things I've said).

0
On

The short answer is "no".

Random sources of known entropy, if you can find such a thing, give strings with KC probably close to $n$, more exactly a probability distribution of KC sharply concentrated near $n$, but exact determination of which strings have the precise value of KC you are looking for will not be possible.

You cannot create at will a string with Kolmogorov complexity known to equal $n$, because KC refers to minimum-length description by arbitrarily powerful computer programs, and judging that a string has $KC$ at least $k$ very quickly (that is, for small $k$) exceeds the power of any reasoning system (such as axiomatized mathematics, human brains, or networks of human and computer brains that collaborate using heuristics and axiomatized mathematics). Upper bounds on $KC$ are "only" a matter of having unlimited memory and unlimited time, just find the right program and run it. It is the lower bounds that are the killer.

For very limited forms of compression, like RLL, and short enough texts, it is possible to search through the possible compressed descriptions and produce optimally compressed texts. I don't know if this is feasible for something even slightly complex, like Lempel-Ziv coding; intuitively, if there is an efficient method for testing if a string is optimally compressed, then the compression method is trivial.

What is done in practice (for benchmarking) is to fix a compressor or a set of compressors, and use the compressed length of a text as the measure of complexity.

1
On

Is it possible to create a string with known Kolmogorov Complexity?

There cannot be a universal computable prefix-free function $U$ with that property that there is an r.e. sequence $\sigma_n$ of strings with $K_U(\sigma_n) = n$ for all $n$. So, in that sense, we cannot produce the desired string effectively.

Proof: Consider a computable prefix-free function $f$ such that $f(1^n0) = \sigma_{2^n}$ for each $n$. Then by universality of $U$ there is a fixed $w$ such that, for all $n$, $U(w1^n0) = \sigma_{2^n}$. Choosing $N$ large enough, $|w1^N0| = |w| + N + 1 < 2^N$, so $K_U(\sigma_{2^N}) < 2^N$, contradiction.

This generalizes to show it is impossible to have an r.e. sequence of pairs $(\sigma_i, n_i)$ with $K_U(\sigma_i) = n_i$ such that the set of $n_i$ that appears in the sequence is unbounded.

It is possible for an r.e. sequence of strings to contain strings of arbitrarily high complexity - the sequence that just enumerates all the strings must have that property - but it is not possible to correctly guess (in an r.e. way) the exactly complexities of a sequence of strings with arbitrarily high complexities.

This idea is at the root of the Chaitin incompleteness theorem. The additional idea there is that if we have an effective theory of arithmetic, we can enumerate all the proofs of statements of the form "$K_U(\sigma) = n$" to form an r.e. sequence of pairs $(\sigma, n)$. Thus: either the system proves some statements of that form that are false, or there is an $M$ such that $n<M$ in every statement of that form that is provable in the system. This applies to any fixed effective axiomatic system that includes statements that can be interpreted as "$K_U(\sigma) = n$".