Given two alphabets $A=\{a_1, a_2, \ldots, a_n\}$ and $B=\{b_1, b_2, \ldots, b_m\}$, what is the maximum average compression ratio achievable by bijectively encoding strings over $B$ as strings over $A$?
I guess it is trivially $\frac{|A|}{|B|}$, but I wouldn't be sure how to prove that.
Then, given a new set of strings $C$ drawn from the strings over $B$, $C\subseteq B^{*}$, how would the maximum average compression ratio be calculated in this case?
I guess it depends on how much more frequently some substrings in $C$ are repeated than others. E.g., if every string of $C$ were the concatenation of a letter from $B$ and the same fixed string from $B^*$, $\{b_1b_1b_2b_3,\, b_2b_1b_2b_3,\, b_3b_1b_2b_3,\, \ldots\}$, then you could use a single letter from $A$ to represent the repeated substring ($b_1b_2b_3$) and achieve a much better compression ratio than if $C=B^4$.
An example to illustrate my question: encoding English words in bijective base-3. $A=\{1,2,3\}$, $B=\{\text{a}, \text{b}, \text{c}, \ldots, \text{z}\}$. If $C$ were {one, two, three, four, five, six}, then the "compression algorithm" could represent these respectively as $D$ = {1, 2, 3, 11, 12, 13}, and the average compression ratio would be $\left(\sum_i{\frac{|C_i|}{|D_i|}}\right)\div |C| = 2.75$. (I'm not using the empty string from $A^*$ only because I'm not sure how to meaningfully calculate average compression ratios when infinities are involved.)
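To make the arithmetic above concrete, here is a small sketch that computes the same average compression ratio for the word list and its bijective base-3 codes (the lists `C` and `D` are taken directly from the example):

```python
# Average compression ratio for the worked example: each English word
# in C is paired with its bijective base-3 code in D.
C = ["one", "two", "three", "four", "five", "six"]
D = ["1", "2", "3", "11", "12", "13"]

# Per-pair ratio |C_i| / |D_i|, averaged over all |C| pairs.
avg = sum(len(c) / len(d) for c, d in zip(C, D)) / len(C)
print(avg)  # 2.75
```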
It depends on the data distribution. If the input is a fixed-length sequence of letters sampled iid and uniformly at random, then the maximum possible compression ratio is
$${\lg |A| \over \lg |B|}.$$
In particular, there are $|B|^k$ possibilities for a string of length $k$ over alphabet $B$, so if you sample uniformly at random from those possibilities, it requires $k \lg |B|$ bits to represent the result. For similar reasons, a string of length $m$ over alphabet $A$ can represent up to $m \lg |A|$ bits of information. Setting
$$k \lg |B| \le m \lg |A|,$$
we discover that
$$k/m \le {\lg |A| \over \lg |B|},$$
and $k/m$ is the definition of the compression ratio. We can get as close to the ratio as desired, assuming $k$ can be increased as much as desired.
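A quick numerical check of that bound: for each input length $k$, the shortest output length $m$ over $A$ that can injectively encode all $|B|^k$ inputs must satisfy $|A|^m \ge |B|^k$, i.e. $m = \lceil k \lg |B| / \lg |A| \rceil$. The achieved ratio $k/m$ then approaches $\lg |A| / \lg |B|$ from below as $k$ grows (alphabet sizes 3 and 26 here are just the example's values):

```python
import math

A_size, B_size = 3, 26
bound = math.log(A_size) / math.log(B_size)  # lg|A| / lg|B|

for k in (1, 10, 100, 10000):
    # Smallest m with A_size**m >= B_size**k.
    m = math.ceil(k * math.log(B_size) / math.log(A_size))
    print(k, m, k / m, bound)  # k/m creeps up toward the bound
```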
In practice, real data is normally not iid uniform random characters, so the compression ratio can potentially be far higher when one takes into account the probability distribution on strings that are input to the compression algorithm. For more on that subject, read about Shannon's source coding theorem.
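As a minimal illustration of why non-uniform data compresses better: the entropy (in bits per symbol) of a skewed distribution is lower than that of a uniform one, and the source coding theorem says entropy is the best achievable average code length per symbol. The skewed distribution below is an invented example, not derived from real English letter frequencies:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 26] * 26           # uniform over 26 letters
skewed = [0.5] + [0.5 / 25] * 25  # one very common letter (hypothetical)

print(entropy(uniform))  # lg 26, about 4.70 bits/symbol
print(entropy(skewed))   # lower, so shorter average codes are possible
```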