Shortest Way to Represent Every 3 Digit String

353 Views Asked by At

Today, I was thinking about random stuff like I sometimes do. I thought of an interesting question that I have no clue how to solve.

It started with me thinking about taking all 3 letter strings ("words"), and putting them through a profanity filter, just as a simple way to find out what words are filtered. Then, I realized that there was a (slightly, perhaps) more efficient way to write all letter combinations.

The traditional method would be:

aaa, aab, aac, aad, ... , zzz

But my idea was to allow the 3 letter combinations to be combined with each other. For example,

abcdef = abc, bcd, cde, and def

A reduction of 6 characters.

I wanted to know the shortest total character length required to represent every 3 letter word. I had no clue where to start, so I decided to simplify the problem to numbers (012345 represents 012, 123, 234, and 345): in other words, to represent 000 to 999 in as few digits as possible.

The only way that I can think to do this is to write code and have a computer brute force through every combination, but that would probably take years. Does anyone know anything about how to go about solving this?

[Clarification] In the last paragraph I mentioned brute forcing through every combination. I meant continuously adding digits until all 1000 numbers in the sequence, and doing that for every possible <3000 digit sequence (3000 is the longest that the sequence has to be: simply 000 to 999 in order) to find the shortest sequence that represents all 1000 numbers.

1

There are 1 best solutions below

3
On BEST ANSWER

De Bruijn sequences are indeed the answer. In fact there is an easy way of generating them.

If you restrict yourself to 3 symbols, you can take the alphabet $\{0,1,2\}$ which can be interpreted as the finite field (see wikipedia) $GF(3),$ where addition and multiplication is modulo 3.

Now there are sequences called maximal length sequences (see wikipedia here which discusses only $GF(2)$) over any field, and also over $GF(3),$ which are generated by recurrences corresponding to polynomials which are primitive. If you have a polynomial of degree $d,$ its roots have order $3^d-1$ and this polynomial generates the periodic maximal length sequence of length $3^d-1$. Within this sequence there is a unique position with $d-1$ consecutive zeroes.

Just insert an additional zero there and you get a DeBruijn sequence of length $3^d$ which represents every $d-$length pattern over $GF(3).$

You need to look for tables of primitive polynomials over $GF(3).$ See here

Example from that list:

Degree 4 10012

means (they display coefficients) the polynomial is

$$1*X^4+0*X^3+0*X^2+1*X+2*X^0$$

or $x^4+x^2+x-1$ mod 3, which corresponds to the recurrence

$$s[n-4]+s[n-2]+s[n-1]-s[n]=0 ~\pmod{3}$$

by taking multiplication by x as a shift operator, or equivalently

$$s[n]=s[n-1]+s[n-2]+s[n-4] \pmod{3}$$

as the generating recurrence. This recurrence, started with initial conditions (putting the zeroes here is convenient, followed by one nonzero entry)

$$s[0]=0,s[1]=0,s[2]=0,s[3]=1$$

will give a sequence $$(s[0],s[1],..,s[3^4-1])$$ of length 80. Just append a 0 in the beginning and you have your deBruijn sequence of length 81.

Edit: Smaller example

Degree 2 112

gives $X^2+X+2$ or $s[n-2]+s[n-1]+2s[n]=0,$ or $s[n]=s[n-1]+s[n-2] \pmod 3$

So start with $s[0]=0,s[1]=1$, get $s[2]=s[1]+s[0]=0+1=1$

$s[3]=1+1=2,$

$s[4]=1+2=0 \pmod 3,$

resulting in a sequence of length $3^2-1=8$ given by $$0,1,1,2,0,2,2,1;0,1,1,2,\ldots$$

Take one period and pre-pend a zero to get

$$0,0,1,1,2,0,2,2,1;0,1,\ldots$$ which spans the 9 possible states

$0,0\quad$ $0,1\quad$ $1,1\quad$ $1,2\quad$ $2,0\quad$ $0,2\quad$ $2,2\quad$ $2,1\quad$ $1,0$