Prove that every lossless compression algorithm must increase the file size for some inputs.


Using the Pigeonhole Principle, prove that every lossless compression algorithm must increase the file size for some inputs.


There are 2 best solutions below


Suppose you have an input file with $N$ bits.

You'd have $2^N$ possible inputs.

Suppose you could compress every such input to $K$ bits, with $K < N$. These compressed files can take only $2^K$ distinct values.

This is exactly the Pigeonhole Principle with $2^N$ items and $2^K$ containers: since $2^N > 2^K$, at least two items must land in the same container.

But then at least one compressed file would have to decompress to two different inputs, making the compression either irreversible or not lossless.
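The counting in this answer can be checked directly. The sketch below (my own illustration, with $N = 8$ and $K = 7$ chosen arbitrarily) confirms that there are more $N$-bit inputs than $K$-bit outputs, and that even allowing *every* length shorter than $N$ still leaves too few outputs:

```python
# Pigeonhole count for the first answer: 2^N inputs vs 2^K outputs.
# N and K are arbitrary example values, not from the answer.
N = 8
K = 7

inputs = 2 ** N            # number of distinct N-bit files
outputs = 2 ** K           # number of distinct K-bit compressed files
assert inputs > outputs    # so two inputs must share a compressed form

# Even pooling all outputs shorter than N bits, there are only
# 1 + 2 + 4 + ... + 2^(N-1) = 2^N - 1 of them: still fewer than 2^N.
shorter = sum(2 ** k for k in range(N))
assert shorter == 2 ** N - 1
assert shorter < inputs
```

The last two lines show a slightly stronger fact than the answer needs: even a compressor allowed to output *any* length below $N$ cannot be injective on all $N$-bit inputs.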


Suppose not. That is, suppose that we had a lossless compression algorithm A that makes some files smaller and does not make any files larger.

Let x be a shortest file whose compressed size is smaller than its original size. (If several files of that length shrink, pick any one.) Say x is m characters long.

Let S be the set of distinct files with fewer than m characters. Because x shrinks, A compresses x to a file in S. Because x is a shortest file that shrinks, no file in S shrinks, and by assumption A never enlarges any file; so each file in S compresses to a file (perhaps itself, perhaps a different one) in S.

Now we have a problem. A is supposed to be lossless, therefore one-to-one. But A maps a set of at least |S| + 1 files (the files of S together with x) into a set of |S| files, so the Pigeonhole Principle says two input files must be mapped to the same output file. This is a contradiction.
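This final step can be verified by brute force on a tiny alphabet. The sketch below (my own construction, using bit-strings and the hypothetical values m = 2 and x = "00", which are not from the answer) builds S and checks that no one-to-one map from S ∪ {x} into S exists:

```python
from itertools import product

m = 2
# S: all distinct bit-strings with fewer than m characters ("", "0", "1").
S = [''.join(p) for k in range(m) for p in product('01', repeat=k)]
x = '00'  # a hypothetical shortest file that the compressor shrinks

# Under the (false) assumption, A maps every file in S, and also x, into S.
domain = S + [x]
assert len(domain) == len(S) + 1

# Enumerate every candidate map A : domain -> S and keep the injective ones.
injective_maps = [
    images for images in product(S, repeat=len(domain))
    if len(set(images)) == len(images)
]
assert injective_maps == []  # pigeonhole: no injective map exists
```

Exhaustive search is feasible only because the example is tiny (3^4 = 81 candidate maps); the Pigeonhole Principle gives the same conclusion for any m.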