How to decide `end of transmission` symbol for arithmetic coding?


In MacKay's Information Theory book (p. 111), he says:

Let the source alphabet be $\mathcal{A}_X = \{a_1, \ldots, a_I\}$, and let the $I$th symbol $a_I$ have the special meaning ‘end of transmission’.

which seems quite arbitrary to me. More importantly, there might be a risk of corrupting the decoding process, since $a_I$ cannot be translated (it is replaced by the end-of-transmission symbol).

To make this concrete, suppose we have $\mathcal{A}_x=\{a,b,c\}$ with probabilities $p(a) = 0.25$, $p(b) = 0.25$, $p(c) = 0.5$. If we want to encode the string $abc$, how do we decide on the end-of-transmission symbol and its probability? Do we arbitrarily assign some probability $p(d)$ to a new end-of-transmission symbol $d$ and then rescale $p(a), p(b), p(c)$ accordingly?

Edited

To be clear, the scope of the question is arithmetic coding.


There are 2 best solutions below

4
On

Ideally, you will choose a probability for EOT that depends on how long your messages typically are, as though EOT were part of every message in your corpus. Thus, in the corpus you use to estimate the probabilities, "abc" is really "abc." And "qrxjundieko" is really "qrxjundieko." And so forth.

Note that my use of a period there implies that there are no other periods in the entire corpus. In practice you would use a special symbol that is guaranteed not to appear in any message.
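As a rough sketch of that idea, the snippet below counts symbol frequencies over a corpus while treating every message as if it ended with one EOT. The three-message corpus, the `<EOT>` marker, and the function name `estimate_model` are made up for illustration; they are not from MacKay's book.

```python
from collections import Counter
from fractions import Fraction

EOT = "<EOT>"  # hypothetical marker, assumed never to occur inside a message

def estimate_model(corpus):
    """Estimate symbol probabilities, counting one EOT per message."""
    counts = Counter()
    for message in corpus:
        counts.update(message)  # count the ordinary symbols
        counts[EOT] += 1        # every message ends with exactly one EOT
    total = sum(counts.values())
    return {sym: Fraction(n, total) for sym, n in counts.items()}

# Hypothetical corpus: three messages of length 3 over {a, b, c}.
model = estimate_model(["abc", "cca", "cbc"])
for sym, p in model.items():
    print(sym, p)
```

With messages of average length $n$, this gives $p(\text{EOT}) = 1/(n+1)$, and the remaining symbols share the other $n/(n+1)$ of the probability mass.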

2
On

More importantly, there might be a risk of corrupting the decoding process, since $a_I$ cannot be translated

No, the symbol $a_I$ means exclusively EOT (end of transmission). If you want to transmit three symbols $\mathcal{A}_x=\{a,b,c\}$ (which might appear anywhere in the stream), then you need to augment the alphabet with an extra EOT symbol.
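To illustrate, here is a minimal sketch of an arithmetic coder over the augmented alphabet. It uses exact fractions and returns a single rational number instead of a bit string, which is a simplification of a real coder. The value $p(\text{EOT}) = 1/4$ is an arbitrary choice for the example, with the question's probabilities scaled by $3/4$ to make room for it; the function names are hypothetical.

```python
from fractions import Fraction

EOT = "<EOT>"  # extra symbol, assumed never to occur in a message
# Assumed model: the question's a, b, c scaled by 3/4, plus p(EOT) = 1/4.
MODEL = {"a": Fraction(3, 16), "b": Fraction(3, 16),
         "c": Fraction(3, 8), EOT: Fraction(1, 4)}

def intervals(model):
    """Map each symbol to its [low, high) slice of [0, 1)."""
    low, table = Fraction(0), {}
    for sym, p in model.items():
        table[sym] = (low, low + p)
        low += p
    return table

def encode(message, model):
    """Narrow [0, 1) once per symbol, then once more for EOT."""
    table = intervals(model)
    low, width = Fraction(0), Fraction(1)
    for sym in list(message) + [EOT]:
        s_low, s_high = table[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2  # any point inside the final interval would do

def decode(point, model):
    """Read symbols until the EOT interval is hit, then stop."""
    table = intervals(model)
    out = []
    while True:
        for sym, (s_low, s_high) in table.items():
            if s_low <= point < s_high:
                break
        if sym == EOT:
            return "".join(out)
        out.append(sym)
        point = (point - s_low) / (s_high - s_low)  # rescale to [0, 1)

code = encode("abc", MODEL)
print(code, decode(code, MODEL))
```

The symbols $a$, $b$, $c$ remain fully usable anywhere in the message; only the added symbol is reserved, and the decoder needs no other information to know where to stop.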

Bear in mind that the EOT symbol is only one option; another possibility is to prepend the sequence length to the coded sequence.
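A sketch of the length-prefix alternative, under the same simplifications (exact fractions, a rational number instead of a bit string, hypothetical function names): the question's original probabilities are used unchanged, and the decoder simply reads as many symbols as the transmitted length says.

```python
from fractions import Fraction

# The question's original probabilities; no EOT symbol is needed here.
MODEL = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

def intervals(model):
    """Map each symbol to its [low, high) slice of [0, 1)."""
    low, table = Fraction(0), {}
    for sym, p in model.items():
        table[sym] = (low, low + p)
        low += p
    return table

def encode_with_length(message, model):
    """Return (length, point); the length is sent alongside the code."""
    table = intervals(model)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = table[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return len(message), low + width / 2

def decode_with_length(length, point, model):
    """Decode exactly `length` symbols, then stop."""
    table = intervals(model)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in table.items():
            if s_low <= point < s_high:
                break
        out.append(sym)
        point = (point - s_low) / (s_high - s_low)  # rescale to [0, 1)
    return "".join(out)

length, point = encode_with_length("abc", MODEL)
print(length, point, decode_with_length(length, point, MODEL))
```

The trade-off is that the length itself has to be encoded somehow, and the encoder must know it before transmission starts, whereas an EOT symbol can be emitted whenever the message happens to end.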