Using arithmetic coding we can encode a sequence of independent biased coin flips using $<1$ bit per flip. For example, consider a coin with $p=0.8$: we can encode its flips using $H_b(p)=-p\log_2\left(p\right)-(1-p)\log_2\left(1-p\right)\approx 0.72$ bits per flip, which is the theoretical minimum.
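As a quick sanity check of that number, here is a minimal Python sketch (the helper name `binary_entropy` is mine):

```python
import math

def binary_entropy(p):
    # H_b(p) = -p*log2(p) - (1-p)*log2(1-p): optimal bits per biased coin flip
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.8))  # ≈ 0.72
```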
Now consider a three-sided die with faces $H,T,E$, where the letters stand for "heads", "tails" and "either".
When we encode a sequence of these die rolls, the encoder may replace each $E$ with either $H$ or $T$. The compression is therefore lossy: we lose the information of which rolls resulted in $E$.
For example:
Say $p_H=p_T=p_E=\frac{1}{3}$ and the sequence we have to encode is $S=EHTETHETET...$.
One method is to replace every $E$ with $H$, so the sequence becomes $S'=HHTHTHHTHT...$. We can then use arithmetic coding with $p_H'=\frac{2}{3}$. This method uses about $0.92$ bits per roll.
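The $0.92$ figure is just the binary entropy of the substituted sequence; a quick check (Python sketch, helper name mine):

```python
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# After replacing every E with H, heads occur with probability 2/3.
print(binary_entropy(2 / 3))  # ≈ 0.918 bits per roll
```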
A better method is to first group rolls into consecutive pairs of two and then replace according to the following table: $$ \begin{matrix} HE\rightarrow HH\\ EH\rightarrow HH\\ TE\rightarrow TT\\ ET\rightarrow TT\\ EE\rightarrow TT\\ \end{matrix} $$
So $S'=HHTTTHTTTT...$. Using arithmetic coding on the pairs, we can encode the sequence using about $0.88$ bits per roll. This can be calculated from the pair probabilities $p_{TT}'=\frac{4}{9},\quad p_{HH}'=\frac{3}{9},\quad p_{HT}'=p_{TH}'=\frac{1}{9}$.
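A check of the pair-based rate from the stated pair probabilities (Python sketch):

```python
import math

# Pair probabilities induced by the substitution table
# (all 9 input pairs are equally likely with probability 1/9 each).
pair_probs = [4 / 9, 3 / 9, 1 / 9, 1 / 9]  # TT, HH, HT, TH
bits_per_pair = -sum(p * math.log2(p) for p in pair_probs)
print(bits_per_pair / 2)  # ≈ 0.876 bits per roll
```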
I have two questions:
- Given the probabilities $p_H,p_T,p_E$ with $p_H+p_T+p_E=1$, how many bits do we need per die roll?
- Is there some (practical) algorithm for encoding and decoding such sequences that is (close to) optimal?
I've tried Google and ChatGPT to find the name of this problem, and I checked some questions tagged compression. I'm not sure how to approach it other than by thinking about encoding chunks.
A trivial lower bound for the number of bits required is $(1-p_E)H_b(\frac{p_H}{p_H+p_T})$ which can be derived by giving the decoder external knowledge of the positions of $E$s.
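This bound is easy to evaluate numerically (Python sketch; the function name `lower_bound` is mine):

```python
import math

def lower_bound(p_H, p_T, p_E):
    # Bits per roll if the decoder already knew the positions of the E's:
    # a fraction (1 - p_E) of rolls carry H_b(p_H / (p_H + p_T)) bits each.
    q = p_H / (p_H + p_T)
    return (1 - p_E) * (-q * math.log2(q) - (1 - q) * math.log2(1 - q))

print(lower_bound(1 / 3, 1 / 3, 1 / 3))  # ≈ 0.667
```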
This interesting problem seems to be quite a bit more difficult than expected, and I found only a few relevant papers. In particular:
Sholomov, L. A., Compression of partially defined information, Comput. Math. Model. 19, No. 1, 116-132 (2008); translation from Nelinejn. Din. Upr. 4, 377-396 (2004). ZBL1172.62004.
In our context, it asserts (if I understand it right) that the effective entropy of this source (for infinite block length) equals the entropy of the "crisp" source (which corresponds to discarding the "wildcard" symbol). That is, $L \to (1-p_E) H_b(\frac{p_T}{p_T+p_H})$. In the case $p_E=p_T=p_H=1/3$, this gives $L \to 2/3$.
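For the symmetric case, this limit can be compared against the two explicit schemes from the question (Python sketch; the numbers are the ones computed there):

```python
import math

def entropy(probs):
    # Shannon entropy in bits of a discrete distribution
    return -sum(p * math.log2(p) for p in probs)

block_1 = entropy([2 / 3, 1 / 3])                    # replace every E with H
block_2 = entropy([4 / 9, 3 / 9, 1 / 9, 1 / 9]) / 2  # pairwise substitution table
limit = 2 / 3                                        # (1 - p_E) * H_b(1/2)
print(block_1, block_2, limit)  # ≈ 0.918, 0.876, 0.667
```

Each increase in block length closes some of the gap toward the limit.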
This is a surprisingly "good" result, but I guess the convergence is slow ($O(\log(n)/n)$) and the encoding is not simple.
I wonder if this encoding can be regarded as some sort of dual of channel coding for an erasure channel...