Naming binary strings

124 Views Asked by At

Given a string of 1's and 0's, containing the same number of 1's as 0's, is there a way to uniquely "name" each string with a number such that it is computationally easy to go between this "name" and this string.

Here is an example of what would be a bad solution: ordering the strings from smallest to largest and naming each based on its position in this list. This solution is poor because converting between the name and string requires a lot of computation.

I would also like the size of these names to be minimized. Another poor solution would be to have the name be the first n-1 digits of a string of length n. Example: string = 1011010001, name = 101101000. Although converting between string and name is easy, the size of the name isn't really all that much smaller.

Consider the case where the string has a length of 10. It is easy to see that there are 252 possible strings. Therefor an optimal solution would have the names of the string consist of a list of 8 1's and 0's.

I would like a solution that can be generalized to a string of any even length. Also, I'm not necessarily looking for the perfect solution, just one that works.

Thanks

3

There are 3 best solutions below

4
On

Why not just let the name be the integer that the string denotes as a binary numeral: $$ name\colon b_{n-1} \dotsc b_0 \mapsto \sum_{k=0}^n b_k 2^k\quad\text{($n\ge 0$)}. $$ This is unambiguous on the language $L = \{s\in \{0,1\}^*\mid \text{# of $0$s in $s = $ of $1$s in $s$}\}$: Only the empty string is mapped to $0$. If $name(s) = name(t)$ with $s = b_n \dotsc b_0, t = b_m \dotsc b_0 \in L$, then clearly $s$ and $t$ can only differ by having different numbers of $0$s in their leftmost positions. But then their number of initial $0$s must be equal, as the two strings have the same number of zeros and have to agree from the first $1$ to their ends.

2
On

I understand from your comment that your question is a compression problem. However, the number of strings containing $n$ $0$'s and $n$ $1$'s is equal to the central binomial coefficient $\binom{2n}{n}$. By Stirling formula, $n! \sim \sqrt{2\pi n}({n \over e})^n$ and it follows that $\binom{2n}{n} \sim {4^n \over \sqrt{\pi n}}$. Now, the number of strings of length $2n$ is $4^n$. This means that you can only hope to save $\log_2(\sqrt{\pi n}) \approx 0.825 + \log_2 n$ bits, a very small number compared to $n$. For instance, if $n = 2^{270}$, a rough estimate of the number of atoms in the universe, you would only save $270$ bits. Is it really worth trying, knowing that you would also have to store the compression algorithm?

0
On

Proceeding, say, from the beginning of the string, write each letter as itself if it's preceded by equal numbers of zeros and ones, and else write it as a one if it's in the minority in the preceding values and as a zero otherwise. This slightly raises the number of ones in the string, and you can try to compress the resulting string by making use of that, e.g. with a prefix code that has shorter codes for sequences with more ones. As J.-E. Pin has pointed out, the potential for compression is very limited, and this will certainly not exhaust even that small potential, but it might be worth a try.