Lossy entropy coding


Using arithmetic coding we can encode a sequence of independent biased coin flips using $<1$ bit per flip. For example, consider a coin with $p=0.8$. We can encode its flips using $H_b(p)=-p\log_2\left(p\right)-(1-p)\log_2\left(1-p\right)\approx 0.72$ bits per flip, which is the theoretical minimum.
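As a quick numerical check, the binary entropy above can be computed directly (a minimal Python sketch; the function name is my own):

```python
from math import log2

def binary_entropy(p: float) -> float:
    """Entropy H_b(p) of a biased coin, in bits per flip."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.8))  # ≈ 0.7219
```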

Now consider a three sided die with faces $H,T,E$. The letters stand for "heads", "tails" and "either".

When we encode a sequence of these die rolls the encoder can decide whether it wants to replace $E$ with $H$ or $T$. So the compression is lossy since we lose information: we don't know which rolls resulted in $E$.

For example:

Say $p_H=p_T=p_E=\frac{1}{3}$ and the sequence we have to encode is $S=EHTETHETET...$.

One method is to replace every $E$ with $H$. So the sequence becomes $S'=HHTHTHHTHT...$. Then we can use arithmetic coding with $p_H'=\frac{2}{3}$, which costs $H_b\left(\frac{2}{3}\right)\approx 0.92$ bits per roll.
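The rate of this first method follows from the fact that merging $E$ into $H$ yields a biased coin with $p_H'=p_H+p_E$; a short sketch verifying the $0.92$ figure:

```python
from math import log2

def h(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Method 1: replace every E with H, then arithmetic-code the resulting coin.
p_H = p_T = p_E = 1 / 3
rate = h(p_H + p_E)  # bits per roll after the substitution
print(rate)  # ≈ 0.918
```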

A better method is to first group rolls into consecutive pairs of two and then replace according to the following table: $$ \begin{matrix} HE\rightarrow HH\\ EH\rightarrow HH\\ TE\rightarrow TT\\ ET\rightarrow TT\\ EE\rightarrow TT\\ \end{matrix} $$

So $S'=HHTTTHTTTT...$. Then, applying arithmetic coding to the pairs, we can encode the sequence using about $0.88$ bits per roll. This can be calculated from the pair probabilities $p_{TT}'=\frac{4}{9}$, $p_{HH}'=\frac{3}{9}$, $p_{HT}'=p_{TH}'=\frac{1}{9}$.
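The pair probabilities and the resulting per-roll rate can be verified by enumerating all nine equally likely pairs and applying the replacement table (a minimal sketch):

```python
from collections import Counter
from itertools import product
from math import log2

# The replacement table from the text; pairs without E map to themselves.
table = {"HE": "HH", "EH": "HH", "TE": "TT", "ET": "TT", "EE": "TT"}

counts = Counter()
for a, b in product("HTE", repeat=2):
    pair = a + b
    counts[table.get(pair, pair)] += 1

# Entropy of the induced pair distribution, halved to get bits per roll.
entropy_per_pair = -sum((c / 9) * log2(c / 9) for c in counts.values())
print(dict(counts))          # {'HH': 3, 'TT': 4, 'HT': 1, 'TH': 1} up to order
print(entropy_per_pair / 2)  # ≈ 0.876
```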

I have two questions:

  1. Given the probabilities $p_H,p_T,p_E$ with $p_H+p_T+p_E=1$, how many bits do we need per die roll?
  2. Is there some (practical) algorithm for encoding and decoding such sequences that is (close to) optimal?

I've tried to use Google and ChatGPT to find the name of this problem, and I checked some similarly tagged questions. I'm not sure how to approach it other than thinking about encoding chunks.

A trivial lower bound for the number of bits required is $(1-p_E)H_b(\frac{p_H}{p_H+p_T})$ which can be derived by giving the decoder external knowledge of the positions of $E$s.
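This lower bound is easy to evaluate; a small sketch (the function name is my own) confirms it gives $\frac{2}{3}$ for the uniform example:

```python
from math import log2

def h(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def lower_bound(p_H: float, p_T: float, p_E: float) -> float:
    """(1 - p_E) * H_b(p_H / (p_H + p_T)): the rate if the decoder
    were told the positions of the E rolls for free."""
    return (1 - p_E) * h(p_H / (p_H + p_T))

print(lower_bound(1/3, 1/3, 1/3))  # ≈ 0.667
```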

Answer:

This interesting problem seems to be considerably more difficult than expected, and I found only a few relevant papers. In particular:

Sholomov, L. A., Compression of partially defined information, Comput. Math. Model. 19, No. 1, 116-132 (2008); translation from Nelinejn. Din. Upr. 4, 377-396 (2004). ZBL1172.62004.

In our context, it asserts (if I understand it right) that the effective entropy of this source (for infinite block length) equals the entropy of the "crisp" source (which corresponds to discarding the "wildcard" symbol). That is, $L \to (1-p_E) H_b(\frac{p_T}{p_T+p_H})$. In the case $p_E=p_T=p_H=1/3$, this gives $L \to 2/3$.

This is a surprisingly "good" result, but I suspect the convergence is slow ($O(\log(n)/n)$) and the encoding is not simple.

I wonder if the encoding can be regarded as some sort of dual of channel coding for an erasure channel...