How to assess the complexity of a string

319 Views Asked by At

Q How do I quantify complexity of a given string? Consider a sample:

AAAAAAAAA     //very easy - meta sample [COMPLEXITY = 0]
KEui£$n9&E    //say random - meta sample [COMPLEXITY = ∞]

To my limited knowledge Kolmogorov Complexity deals with this problem, but it is non-computable. So what measures I can use to approximate the complexity of the string? Could you point me to the right direction or literature for the following problem.

Many thanks!

2

There are 2 best solutions below

1
On BEST ANSWER

What people do in practice is compress the strings and use the result as a complexity measure. It only works if the strings are long enough.

It is known that the Lempel-Ziv algorithm used by many compression problems is optimal (up to constants and implementations) for ergodic Markov chains (in their stationary distribution). So if you think that your string is generated by such a process, you can approximate its entropy this way.

This is not the same as lewellen's suggestion. Suppose for example that your string is $0101010101$. This string is easy to compress, but the entropy of its empirical distribution is large. The reason for this discrepancy is that the characters in the string are not independent. In fact, each character is determined by the previous one. That just fits the Markov chain model.

1
On

Shannon Entropy has a well defined calculation that can be computed that will approximate Kolmogorov Complexity.

I suggest you read up on the two in depth here: http://homepages.cwi.nl/~paulv/papers/info.pdf