Can random number generators be used in compression?

3.7k Views Asked by At

This is basically a recurring thought I have from time to time and I suspect it is flawed, so my question is basically why can't you do the following:

Basically, I imagine that given a seed and a random number generator, the binary result at a given length might have something in common with some target data that you want to compress. In the case that it just happened to be the same, you'd get a phenomenal amount of compression.

That said, I suspect this is in the same league as the perpetual motion machine I thought I invented when I was a kid (air lock at the bottom of a tube of water, ball floats to top and falls back down to air lock...)

2

There are 2 best solutions below

0
On BEST ANSWER

"In the case it just happened to be the same" is the key. It will (almost) never happen. Say you have million bit strings. There are $2^{1000000}\approx 10^{301030}$ of them. If the string happened to be $\pi$, you would have a phenomenal amount of compression. That happens very rarely. If your random number generator is good, it will happen just as rarely. The compression we use reflects the non-randomness in the files we compress. There are tremendous correlations in text (because it is generated in natural language, not random) or photos (big areas are close in color). Compression takes advantage of that.

0
On

The randomness isn't essential in your argument; what you're really talking about is the idea of a function with small inputs having large outputs. So one idea is instead of doing random numbers, take a huge library of videos or images or code, etc. and analyze them to find frequently recurring patterns, and then make a list of those patterns. The location in that list is similar to the seed for a random number generator.

This has been done before; I went to a talk recently where they said the most common pictures in photographs were sharp lines, so they encoded ever instance of a line-like object (viewed as a subset of a five by five grid) in a list, and then decomposed photographs using these masks. It was really interesting and fun! I think Morse theory had been used similarly, but I'm not sure.