What is the expected cost of restarting the state of an arithmetic coder?


I was bored the other day so I thought it could be fun to revise some old information theory knowledge and write a small arithmetic coder.


Short introduction to arithmetic coding

An arithmetic coder basically works like this: we want to encode a string of symbols drawn from a set, an alphabet. We estimate the probability of each symbol and divide the interval $[0,1]$ into pieces whose sizes are proportional to those estimated probabilities. We start with the interval $[0,1]$. For each new symbol we want to encode, we shrink the current interval to the sub-interval that corresponds to that symbol.

When coding is done, we represent the final interval by a number with the minimal precision needed to identify it. This number is the output of the encoder.
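The two steps above can be sketched in a few lines. This is only an illustrative toy, not my actual coder; the alphabet, the probabilities, and the use of exact rationals (`fractions.Fraction`) are assumptions made for the example:

```python
from fractions import Fraction
from math import ceil, log2

# Assumed example alphabet and probability table (not from the actual coder).
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def encode(message):
    # Current interval is [low, low + width); start with [0, 1).
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        # Shrink to the sub-interval corresponding to `sym`.
        cum = Fraction(0)
        for s, p in probs.items():
            if s == sym:
                low += cum * width
                width *= p
                break
            cum += p
    # Bits needed to name some point inside the final interval:
    # at most ceil(-log2(width)) + 1.
    bits = ceil(-log2(width)) + 1
    return low, width, bits

low, width, bits = encode("aab")
print(bits)  # final width is 1/16, so 5 bits
```

Using exact rationals sidesteps the finite-precision renormalization that a real coder needs, which keeps the interval-shrinking idea visible.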


For various reasons - most of them practical - I decided to make it a block coder, in the sense that we encode $\leq N$ bits at a time and then reset the interval. $N$ can be chosen freely, but will probably benefit from being tied to the machine word size, like $32$, $64$ or maybe $128$ bits.

This design has some practical benefits, but as I suspect, also carries a small penalty to coding efficiency.
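To make the block idea concrete, here is one hedged way the reset could work: keep shrinking the interval until the next symbol would push the output past the $N$-bit budget, then flush the current interval and restart from $[0,1)$. The budget test, the alphabet, and the helper names are all assumptions for the sketch, not my actual design:

```python
from fractions import Fraction
from math import ceil, log2

# Assumed example alphabet and probability table.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def shrink(low, width, sym):
    """Shrink [low, low + width) to the sub-interval of `sym`."""
    cum = Fraction(0)
    for s, p in probs.items():
        if s == sym:
            return low + cum * width, width * p
        cum += p
    raise KeyError(sym)

def encode_blocks(message, N=32):
    """Encode `message`, flushing a block whenever it would exceed N bits.

    Assumes N is large enough to hold at least one symbol.
    Returns a list of (low, width) intervals, one per block.
    """
    blocks, low, width = [], Fraction(0), Fraction(1)
    for sym in message:
        new_low, new_width = shrink(low, width, sym)
        if ceil(-log2(new_width)) + 1 > N:
            # Budget exceeded: flush the block and restart with this symbol.
            blocks.append((low, width))
            low, width = shrink(Fraction(0), Fraction(1), sym)
        else:
            low, width = new_low, new_width
    blocks.append((low, width))
    return blocks

print(len(encode_blocks("aaaaa", N=4)))  # splits into 2 blocks
```

Each flush pays the per-interval termination cost, which is where the efficiency penalty the question asks about comes from.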

Can we estimate how large this penalty will be on average? Will it be tied to the probability table?

My guess is that, in general, it will be some fraction of one bit per block, but I have no proof of it.
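A back-of-envelope bound, using the standard flush-overhead argument (a sketch, not an analysis of this specific coder): a block whose final interval has width $w$ can be named with at most $\lceil -\log_2 w \rceil + 1$ bits, so each flush costs

$$\lceil -\log_2 w \rceil + 1 - (-\log_2 w) < 2 \text{ bits}$$

beyond the ideal code length $-\log_2 w$. With a reset every $\leq N$ output bits, the relative overhead is then at most roughly $2/N$ bits per output bit, i.e. a constant number of bits per block, which is consistent with the "fraction of one bit" guess being in the right ballpark (per block, amortized over $N$ bits).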