How to represent a number in such a way that no more than 2 consecutive digits are the same?

472 Views Asked by At

The idea is to lower the probability of transcription errors when a person is reading the number on a paper and typing it on a computer, for instance.

I'd be more interested in Base-58 notation, but decimal would also be good.

2

There are 2 best solutions below

1
On BEST ANSWER

From an information-theoretic perspective you should be able to beat the following; however, the following is very easy to compute with. It eliminates all repeated symbols, though you could choose to allow double repeats if you like. Given a $b$-character alphabet, designate the last symbol as a "repeat" symbol. Encode your number in base $b-1$, and whenever a digit is repeated twice, overwrite the second one with the repeat symbol. Thus, for example, if you're using decimal digits, you would encode $40$ as follows: write it in base $9$ to get $44$, and now replace the second $4$ with the repeat symbol to get $49$. Decoding is easy.

Since an $n$-digit string now encodes numbers up to $(b-1)^n$ instead of $b^n$, you end up expanding your numbers a bit. The inefficiency gets smaller as the base gets larger. In particular, for base $58$ you end up expanding your numbers by a factor of $\log 58/\log 57 \approx 1.0043$, i.e. by $0.43\%$.

It feels like the "right" solution should look like this: the total number of length-$n$ strings with no $3$-long repetitions satisfies the recurrence $$a_n=(b-1)(a_{n-1}+a_{n-2})$$ with the initial conditions $a_0=1$, $a_1=b$, $a_2=b^2$. The largest root of the characteristic equation of this recurrence is approximately $b-1/b$, so the information bound is approximately $$\frac{\log b}{\log (b-1/b)}\approx 1+\frac{1}{b^2\log b}.$$ For $b=58$ you get about $1.00007$.

But you have to come up with an efficiently computable bijection between the integers $\{0,\ldots, a_n-1\}$ with the $a_n$ strings of length $n$. It feels like there should be some analogue of Zeckendorf representation, especially as the sequence $a_n$ is just the (shifted) Fibonacci sequence when $b=2$. I thought about it a bit but haven't come up with anything efficient.

2
On

Here are 2 improvements to Tad's answer:

1) Instead of using base $b−1$, use any foreign symbol as an indicator of repetition, thus lowering the overhead. First get the number represented in the base you want, then apply this algorithm. Examples in base 10:

  • 333 becomes 33-

  • 3335 becomes 33-5

  • 3333 becomes 33-3

  • 333333 becomes 33-33-

2) Alternatively, we can use run-length encoding (RLE) and any foreign symbol to further reduce the overhead. Examples in base 10:

  • 333 becomes 33-0

  • 3335 becomes 33-05

  • 3333 becomes 33-1

  • 333333 becomes 33-3

[new item] 3) Thinking about it a bit more, we can use run-length encoding (RLE) and a previously agreed number of maximum repeated digits (which could be used as a single-digit prefix) to further reduce the overhead. Examples for a maximum of 3 equal consecutive input digits in base 10 (resulting in a maximum of 4 repeated digits in the output):

  • 444 becomes 4440

  • 4445 becomes 44405

  • 4444 becomes 4441

  • 333333 becomes 3333 (3 input digits repeated then 3 as the count)

  • 4444444 becomes 4444 (3 input digits repeated then 4 as the count)

  • 999999999999 becomes 9999