Encoder based on large similar data


Let us say you (Alice) and another agent (Bob) share a large piece of data (say, the Project Gutenberg collection of books, or the Linux kernel source). You want to send Bob a smaller but still large piece of data (such as a novel or a program). Both of you have fast processors, but communication is expensive, so you want to compress the message to be as small as possible. Is there a way to take advantage of the large shared data to compress the smaller message?

What algorithms exist that take a large piece of data and a message that is similar to it (I would rigorously define this, but I will leave what this means up to the answerer), and compress the message such that anyone with the large piece of data can decompress it to recover the original? Ideally, $\log_2(|M_E|) \approx H(M|D)$, where $M$ is the original message, $D$ is the shared data, and $M_E$ is the encoded (compressed) message.
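To make the setup concrete, here is a minimal sketch of one way this kind of shared-data compression can look, using the preset-dictionary feature of Python's `zlib` module. It is only an illustration, not a claim about the best algorithm: the function names `compress_with_shared` and `decompress_with_shared` are hypothetical, the "shared data" is synthetic, and Deflate can only look back 32 KiB, so this would not scale to something the size of the Gutenberg collection without a different tool.

```python
import hashlib
import zlib


def compress_with_shared(message: bytes, shared: bytes) -> bytes:
    """Compress `message`, letting Deflate reference `shared` as a preset dictionary.

    Hypothetical helper for illustration. Deflate's window is 32 KiB, so only
    the tail of a larger `shared` buffer would actually be usable.
    """
    comp = zlib.compressobj(level=9, zdict=shared)
    return comp.compress(message) + comp.flush()


def decompress_with_shared(encoded: bytes, shared: bytes) -> bytes:
    """Invert compress_with_shared; the receiver must hold the same `shared` bytes."""
    decomp = zlib.decompressobj(zdict=shared)
    return decomp.decompress(encoded) + decomp.flush()


# Synthetic stand-in for the data D that Alice and Bob both hold (about 26 KB).
shared = b"".join(
    hashlib.sha256(str(i).encode()).hexdigest().encode() + b"\n" for i in range(400)
)

# A message M that is "similar" to D: an excerpt of D with a small edit.
message = shared[5000:6000] + b"a small edit by Alice\n" + shared[6000:7000]

encoded = compress_with_shared(message, shared)
plain = zlib.compress(message, 9)  # baseline: no shared data used

print(len(message), len(plain), len(encoded))
assert decompress_with_shared(encoded, shared) == message
```

Here the encoded message mostly consists of back-references into the shared data plus the literal bytes of the edit, which is the behaviour the question is asking about, just at a much smaller scale.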