For data compression, every coding scheme I've seen is binary. That is, we encode a source language with an alphabet of $N$ symbols into a target language with an alphabet of $M=2$ symbols. For example, in Huffman coding, the goal is to find a binary code ($M=2$) for each letter of the English alphabet ($N=26$).
If $M$ is not equal to $2$ and is larger than $N$, is there any method to find a good (or optimal) code for compression? Is there any research on this type of problem?
Is it a correct assumption that when the target alphabet is larger, the goal is to find a map from subsets of source symbols to single target symbols?
Coding for data compression with a large target symbol set (where the target symbol set is larger than the source symbol set)
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
The same idea applies. You want each target symbol to appear with probability $\frac{1}{M}$ at each position in the encoded message; that maximizes the information carried per symbol. If $N=7$, $M=11$, and the source characters occur in proportions $3,2,2,1,1,1,1$, you can assign three target symbols to the first character, two to each of the next two, and one to each of the rest, and everything works out perfectly. Of course, I made up the example so that it would. Alternatively, you can view runs of three characters in the original text as single characters over a $7^3=343$-character alphabet; then you have $N \gt M$ and the usual methods apply.
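The kind of proportional assignment described above is what the $M$-ary generalization of Huffman coding produces: instead of merging the two least probable symbols at each step, merge the $M$ least probable, padding with zero-frequency dummy symbols so the final merge uses all $M$ branches. A minimal sketch in Python (the function name `mary_huffman` and the representation of codewords as tuples of digits are my own illustrative choices, not from any particular library):

```python
import heapq
import itertools

def mary_huffman(freqs, M):
    """Build a prefix-free code over an M-symbol target alphabet
    (digits 0..M-1) for a source given as {symbol: frequency}."""
    counter = itertools.count()  # unique tie-breaker so the heap never compares payloads
    heap = [(f, next(counter), sym) for sym, f in freqs.items()]
    # Pad with zero-frequency dummy leaves so every merge can take exactly
    # M nodes and the final merge is full: we need (n - 1) % (M - 1) == 0.
    while (len(heap) - 1) % (M - 1) != 0:
        heap.append((0, next(counter), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the M least frequent nodes into one internal node.
        children = [heapq.heappop(heap) for _ in range(M)]
        total = sum(c[0] for c in children)
        heapq.heappush(heap, (total, next(counter), children))
    # Walk the tree, labelling each of the M branches with a digit.
    code = {}
    def walk(node, prefix):
        payload = node[2]
        if isinstance(payload, list):      # internal node: recurse on children
            for digit, child in enumerate(payload):
                walk(child, prefix + (digit,))
        elif payload is not None:          # real leaf; dummies (None) get no code
            code[payload] = prefix
    walk(heap[0], ())
    return code
```

For the $N=7$, $M=11$ example above, every source symbol ends up with a single-digit codeword, matching the intuition that an oversized target alphabet lets each source symbol map to one (or more) target symbols; with $M=2$ the same routine reduces to ordinary binary Huffman coding.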