Is there a machine learning task (or any task/problem) whose Kolmogorov complexity (minimum description length) is known by construction? I know Kolmogorov complexity is uncomputable in general, but I was curious whether there is a family of related tasks whose Kolmogorov complexity is known a priori and can be varied across the family. I thought that if the complexity were known by construction, the task might be doable.
Some ideas I thought of:
- Random bitstrings: As mentioned before, the Kolmogorov complexity of a fair-coin bitstring of length n is, with high probability, close to n: for a typical such string, no substantially shorter description is possible.
- Lookup tables: If you construct a lookup table that maps each n-bit binary input to an arbitrary b-bit output, the table requires 2^n entries of b bits each, so the total complexity is roughly 2^n * b (for random outputs, no shorter description is expected).
- Parity functions: As you mentioned, an n-bit parity function requires describing each of the n input bits, so the complexity is n. More formally, you need n + c bits where c is the size of the program that computes XOR. For large n, c is negligible.
- Decision tree models: If you construct a full binary decision tree of depth d, it has 2^d leaves. Specifying it requires the 2^d leaf class labels plus the internal node conditions; if each condition takes c bits, the total complexity is roughly c * 2^d.
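A minimal sketch of generating instances of these families in Python (the function names are my own, and the complexity claims hold only in the hedged senses above, e.g. with high probability for random strings):

```python
import secrets

def random_bitstring(n):
    """A uniformly random n-bit string; with high probability its
    Kolmogorov complexity is close to n."""
    return "".join(secrets.choice("01") for _ in range(n))

def random_lookup_table(n, b):
    """Map every n-bit input to a random b-bit output: for random
    outputs the description is about 2^n * b bits by construction."""
    return {format(i, f"0{n}b"): random_bitstring(b) for i in range(2 ** n)}

def parity(bits):
    """n-bit parity target function (XOR of all input bits)."""
    return bits.count("1") % 2
```

Varying n (and b, or the tree depth d) then varies the intended description length of the task.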
Ideally I want to implement this on a computer to create a family of problems with known Kolmogorov complexity.
FYI: "uncomputable" means there is no algorithm that computes it in general, and I know KC is uncomputable in the general case.
Not really.
To establish the Kolmogorov complexity of a string, we'd have to know the halting status and output of every Turing machine with a shorter description. Thus we can't have exact Kolmogorov complexities for any string longer than the largest input for which we know the value of the busy beaver function. That's not a lot.
As m-stgt found, the complexity is "recursively enumerable from above," which is to say there is a recursively enumerable decreasing sequence of functions that approaches the complexity function $K(s)$. The method, due to Kolmogorov, is however not enlightening and never terminates: let the sequence $K_t(s)$ be indexed by natural $t$, and let $K_0(s)$ be a naïve upper bound on the complexity of $s$. Then, to find $K_{t+1}(s)$, run all Turing machines with descriptions shorter than $K_t(s)$ for $t$ steps, and take $K_{t+1}(s)$ to be the minimum of $K_t(s)$ and the description lengths of any machines that output $s$ within those $t$ steps. This is obviously decreasing in $t$, and as $t\rightarrow\infty$ you have $K_t(s)\rightarrow K(s)$. This computes the complexity in the same sense that running all Turing machines of a certain length for increasingly long computes the busy beaver function: eventually you'll have the answer, but the tricky bit is knowing when you have it.
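The flavor of this decreasing approximation can be sketched in Python with a toy description language in place of real Turing machines (so halting is trivial by construction, which is exactly the part that is uncomputable in the real setting; `run`, `K_t`, and the language itself are invented stand-ins of mine, not Kolmogorov's construction):

```python
from itertools import product

# Toy description language over a small alphabet:
#   "L" + data           -> outputs data literally
#   "R" + digit + data   -> outputs data repeated digit times
# One "step" here is producing one output character.
def run(program, max_steps):
    if not program:
        return None
    op, rest = program[0], program[1:]
    if op == "L":
        return rest if len(rest) <= max_steps else None
    if op == "R" and rest and rest[0].isdigit():
        out = rest[1:] * int(rest[0])
        return out if len(out) <= max_steps else None
    return None  # invalid program

def K_t(s, t, alphabet="013RL"):
    """Best description length of s found within step budget t:
    decreasing in t, and equal to the true (toy) complexity for large t."""
    best = len(s) + 1  # naive upper bound: "L" + s
    for n in range(1, best):
        for prog in map("".join, product(alphabet, repeat=n)):
            if run(prog, t) == s:
                return n
    return best
```

With a small budget only the naive bound is available; with a larger budget the shorter program "R301" for "010101" is found. The answer's point survives even in this toy: nothing in the output itself tells you whether a still-larger budget would reveal an even shorter program.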
All of the examples you give are formulations of the question of the maximum complexity of a string of a given length, and this will always be at least the length: you can't describe all strings of a given length with shorter descriptions, since there are more strings of length n than there are descriptions of length less than n. Meanwhile, there is a trivial upper bound given by the program that hardcodes its output, which adds only a negligible constant overhead to the string length.
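That counting argument can be checked directly: there are 2^n binary strings of length n but only 2^n - 1 binary descriptions of length strictly less than n, so by pigeonhole some string of every length is incompressible (a small Python check, with function names of my own):

```python
def strings_of_length(n):
    # number of binary strings of exactly length n
    return 2 ** n

def descriptions_shorter_than(n):
    # number of binary strings of length 0, 1, ..., n-1
    return sum(2 ** k for k in range(n))  # equals 2**n - 1

# strictly fewer short descriptions than strings, at every length
for n in range(1, 10):
    assert descriptions_shorter_than(n) < strings_of_length(n)
```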