Coding for data compression with large target's symbol set (where the target symbol set is larger than the source symbol set)

56 Views Asked by At

For data compression, every codding that I've seen is binary. It means we convert a language with $N$ symbol size to a language with $M=2$ symbol size. For example, in Huffman coding, the goal is to find a binary coding ($M=2$) for each alphabet of English language ($N=26$).
If $M$ is not equal to $2$ and have a value larger than $N$, is there any method to find good coding (or optimal coding) for compression? Is there any research for this type of problem?
Is it a right assumption that when the target symbol size is larger, the goal is to find a map from a subset of source symbols to one target symbol?

1

There are 1 best solutions below

3
On

The same idea applies. You want each character to have a choice of $\frac 1M$ to come at each position in the message. That maximized the information provided by that character. If $N=7,M=11$ and the characters in the alphabet come in proportions $3,2,2,1,1,1,1$ you can assign three codes to the first one, two to the next two, and one code to the others and everything works perfectly. Of course I made up the example so it would. You can view runs of three characters in the original text as one character in a $7^3=343$ character alphabet, then you have $N \lt M$.