What's theoretical maximum information compression rate?


Let's say I've got a random bit sequence s and a reversible function f, i.e. one with an inverse f' such that f'(f(s)) = s. What is the theoretical maximum average compression rate of such a function?

IIRC, most if not all of today's compression algorithms identify particular patterns and map them to shorter codewords where possible. This puts the best-case and worst-case compression rates at 1/[s] and 1 respectively, where [s] denotes the length of s.

NB: a higher rate means weaker (worse) compression.
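To make the definitions concrete, here is a minimal Python sketch (using zlib's DEFLATE as a stand-in for f; the all-zeros input is just an illustrative extreme) that checks reversibility and measures the rate:

```python
import zlib

s = bytes(1000)                    # a highly redundant 1000-byte sequence (all zeros)
compressed = zlib.compress(s, 9)   # f(s)

assert zlib.decompress(compressed) == s   # reversibility: f'(f(s)) = s

rate = len(compressed) / len(s)    # compressed size / original size
# lower rate = stronger compression; here rate is far below 1
```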



Best answer:

The optimum average length of the compressed binary stream $f(s^{[n]})$ (let's call it $L^{[n]}$) is given by Shannon's first theorem (the source coding theorem), in the range:

$$ H(s^{[n]})\le L^{[n]}<H(s^{[n]})+1$$ so the compression rate $r=L^{[n]}/n$ ($n=[s]$ in your notation) is $$ \frac{H(s^{[n]})}{n}\le r<\frac{H(s^{[n]})}{n}+\frac{1}{n}$$

Here $H(s^{[n]})$ is the entropy (in bits) of the full message of length $n$. In the particular case of independent stationary symbols, $H(s^{[n]})=n\, H(s)$ where $H(s)$ is the entropy of a single bit, and then

$$ H(s)\le r< H(s) +\frac{1}{n}$$
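As a numerical check of this bound, here is a short Python sketch (assuming i.i.d. bits with an illustrative bias $P(1) = p$) that computes $H(s)$ and the resulting interval for $r$:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a single Bernoulli(p) bit."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 1000   # message length in bits ([s] in the question's notation)
p = 0.1    # assumed source bias, P(bit = 1)

H = binary_entropy(p)
lower, upper = H, H + 1 / n   # H(s) <= r < H(s) + 1/n
```

For an unbiased source (p = 0.5) the entropy is 1 bit per bit, so r is pinned near 1 and no meaningful compression is possible; biased sources admit rates well below 1.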

Another answer:

The question is answered by Shannon's source coding theorem.

For i.i.d. input, the theorem establishes that the minimum compressed bit rate is given by the entropy of the source.

For statistically dependent input bits the same result applies, but entropy has to be defined in a more general manner to take dependence into account. See for example John G. Proakis, "Digital Communications", 2nd edition, section 2.3.
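As one concrete instance of that more general definition (a sketch with illustrative numbers, not taken from the reference): for a stationary two-state Markov bit source, the entropy rate is $\sum_i \pi_i H(P_{i\cdot})$, where $\pi$ is the stationary distribution and $H(P_{i\cdot})$ is the entropy of the transition probabilities out of state $i$:

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical two-state Markov bit source: P[i][j] = P(next bit = j | current bit = i)
P = [[0.9, 0.1],
     [0.2, 0.8]]

# Stationary distribution solves pi = pi P; for a 2-state chain,
# detailed balance gives pi_0 * P[0][1] = pi_1 * P[1][0]:
pi0 = P[1][0] / (P[0][1] + P[1][0])
pi1 = 1 - pi0

# Entropy rate in bits per source bit: sum_i pi_i * H(transition row i)
entropy_rate = pi0 * binary_entropy(P[0][1]) + pi1 * binary_entropy(P[1][0])
```

Note that the entropy rate is lower than the entropy of the marginal bit distribution, binary_entropy(pi1): the dependence between successive bits is exactly what extra compression can exploit.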