Lempel-Ziv-Welch Encoding of single character sequence

428 Views Asked by At

So I have this typical exam question:

Consider an LZW coder for an alphabet with 32 symbols (thus represented by 5-bit symbols) using a dictionary of size 256 (thus indexed by 8-bit words). The minimum length of a sequence of consecutive a’s (“aaa...a”) such that its LZW compression has fewer bits than the sequence itself is?

The answer given is 5.

Can anyone explain me how do I get the answer?

Thank you in advance.

1

There are 1 best solutions below

0
On BEST ANSWER

For illustration, let's say we are encoding an infinite stream "aaa...".

  1. The shortest substring not in the dictionary is 'aa' (since 'a' is already in it). Hence, we output the codeword for 'a' (which is one byte) and introduce 'aa' into the dictionary. So far one character (5 bits) has been output in one byte.

  2. Starting at the second character, now the shortest not in the dictionary is 'aaa'. Hence, we output the code word for 'aa' (which is one byte) and introduce 'aaa' into the dictionary. Now three characters (1 byte and 1 bit) have been output in two bytes.

  3. Starting at the fourth character, the shortest string not in the dictionary is 'aaaa'. Hence we output the code word for 'aaa' (which is one byte) and introduce 'aaaa'. Now six characters (3 bytes and 6 bits) have been output in three bytes.

This is roughly the break-even point. Since the actual sequence terminates, we must consider what happens for different strings lengths.

We have already seen that three characters is such that LZW takes more bits. Four characters takes two and a half bytes of storage, and LZW will take three bytes: it will encode 'a' and 'aa' like before, but then it needs one more byte to encode the remaining 'a'. But five characters takes just over 3 bytes, and LZW will encode 'a', 'aa', 'aa', for a total of 3 bytes.