Difference between Shannon Entropy limit and Kolmogorov Complexity?


I've read in numerous places that Shannon Entropy provides some kind of fundamental limit to the compressibility of messages (according to, for example, Shannon's source coding theorem). I have also read about Kolmogorov complexity, which establishes a different kind of limit to compression by considering the shortest program that could output a given piece of text.

My question is: how are these related? They can't both be the limit of compression.

There have been previous questions on MSE that superficially resemble what I'm asking but have never received an appropriate answer. I'm looking for:

  • A clear explanation of the different notions (if they exist) of compression potentially being discussed.

  • A clear explanation of what the different compression-limits are and why they differ. This should also mention something about why K. complexity is uncomputable whereas Shannon entropy is computable (I understand the proof of the uncomputability of K-complexity, I'm just saying it should be clear why there's a difference).

Best Answer

Shannon entropy is about compression of (long) words generated by tosses of a biased die. If the die's entropy is $H$, then on average we cannot compress the results of $n$ tosses to fewer than $n\cdot H$ bits.
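To make the bound concrete, here is a small sketch (not part of the original answer) that computes the entropy of an arbitrarily chosen biased die and the resulting source-coding bound:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An illustrative biased six-sided die (probabilities are assumptions).
die = [0.5, 0.25, 0.1, 0.05, 0.05, 0.05]
H = entropy(die)
n = 1000

print(f"H = {H:.3f} bits per toss")
print(f"Source-coding bound for {n} tosses: about {n * H:.0f} bits")
# versus the naive encoding of log2(6) ~ 2.585 bits per toss
```

A fair die would give $H = \log_2 6 \approx 2.585$ bits per toss; the bias is exactly what makes compression below that possible.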

Kolmogorov complexity is about compression of arbitrary words. Of course, if a word is generated by the same die, its average complexity will be $n\cdot H + O(1)$.

However, if your data is not generated by a biased die, then you may be able to achieve better compression than merely recoding individual characters. For example, if you want to send the first $n$ digits of $\pi$, it is quite easy to compress them to $\log(n) + O(1)$ bits: if both sides know that the message consists of the first digits of $\pi$, it is enough to send the number of digits (which requires $\log(n)$ bits), and the receiver can then compute the digits itself.
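The $\pi$ example can be sketched in Python (this is an illustration, not part of the original answer): the entire "compressed message" is the number $n$, and the decompressor regenerates the digits with a spigot algorithm (the streaming formula from Gibbons, 2006).

```python
from itertools import islice

def pi_digits():
    """Unbounded spigot generator for the decimal digits of pi (after Gibbons, 2006)."""
    q, r, t, j = 1, 180, 60, 2
    while True:
        u, y = 3 * (3 * j + 1) * (3 * j + 2), (q * (27 * j - 12) + 5 * r) // (5 * t)
        yield y
        q, r, t, j = (10 * q * j * (2 * j - 1),
                      10 * u * (q * (5 * j - 2) + r - y * t),
                      t * u, j + 1)

def decompress(n):
    """The compressed message is just n; the digits are recomputed on demand."""
    return list(islice(pi_digits(), n))

print(decompress(10))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

Transmitting `n` takes about $\log_2 n$ bits, while the digits themselves would cost roughly $n \log_2 10$ bits under any per-symbol code, since the digit frequencies of $\pi$ are (empirically) uniform.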

About computability: I think a good intuition is that to compress something, we need to find "patterns" (ways to distinguish our sequence from an arbitrary random one over the same alphabet) in the object we want to compress. If we use, for example, arithmetic coding, we exploit only the very simple pattern that different symbols occur with different probabilities. If we want to compress a word for Kolmogorov's decompressor, we are interested in arbitrary computable "patterns", which is a much larger and more complex set. And discovering an arbitrary computable "pattern" turns out to be uncomputable.
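The gap between "symbol-frequency patterns" and richer computable patterns can be demonstrated with the standard library (an illustration I am adding, assuming `zlib` as a stand-in for a pattern-finding compressor): the data below has perfectly uniform byte frequencies, so a pure per-symbol coder sees 8 bits/byte and no redundancy, yet a compressor that detects repetition shrinks it dramatically.

```python
import math
import zlib
from collections import Counter

def empirical_entropy(data: bytes) -> float:
    """Per-symbol Shannon entropy (bits/byte) of the byte frequencies in data."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Highly structured data: the block 0,1,...,255 repeated 64 times.
# Its byte frequencies are uniform, so a per-symbol coder sees nothing to exploit.
data = bytes(range(256)) * 64

h = empirical_entropy(data)          # 8.0 bits/byte: "incompressible" to a symbol coder
compressed = zlib.compress(data, 9)  # but LZ77 matching finds the repetition

print(h, len(data), len(compressed))
```

Kolmogorov's decompressor goes further still: it would accept any computable regularity, including ones (like the digits of $\pi$) that no practical compressor searches for.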