Today, I was thinking about random stuff like I sometimes do. I thought of an interesting question that I have no clue how to solve.
It started with me thinking about taking all 3 letter strings ("words"), and putting them through a profanity filter, just as a simple way to find out what words are filtered. Then, I realized that there was a (slightly, perhaps) more efficient way to write all letter combinations.
The traditional method would be:
aaa, aab, aac, aad, ... , zzz
But my idea was to allow the 3 letter combinations to be combined with each other. For example,
abcdef = abc, bcd, cde, and def
A reduction of 6 characters.
I wanted to know the shortest total character length required to represent every 3 letter word. I had no clue where to start, so I decided to simplify the problem to numbers (012345 represents 012, 123, 234, and 345): in other words, to represent 000 to 999 in as few digits as possible.
The only way that I can think to do this is to write code and have a computer brute force through every combination, but that would probably take years. Does anyone know anything about how to go about solving this?
[Clarification] In the last paragraph I mentioned brute forcing through every combination. I meant continuously adding digits until all 1000 numbers in the sequence, and doing that for every possible <3000 digit sequence (3000 is the longest that the sequence has to be: simply 000 to 999 in order) to find the shortest sequence that represents all 1000 numbers.
De Bruijn sequences are indeed the answer. In fact there is an easy way of generating them.
If you restrict yourself to 3 symbols, you can take the alphabet $\{0,1,2\}$ which can be interpreted as the finite field (see wikipedia) $GF(3),$ where addition and multiplication is modulo 3.
Now there are sequences called maximal length sequences (see wikipedia here which discusses only $GF(2)$) over any field, and also over $GF(3),$ which are generated by recurrences corresponding to polynomials which are primitive. If you have a polynomial of degree $d,$ its roots have order $3^d-1$ and this polynomial generates the periodic maximal length sequence of length $3^d-1$. Within this sequence there is a unique position with $d-1$ consecutive zeroes.
Just insert an additional zero there and you get a DeBruijn sequence of length $3^d$ which represents every $d-$length pattern over $GF(3).$
You need to look for tables of primitive polynomials over $GF(3).$ See here
Example from that list:
Degree 4 10012
means (they display coefficients) the polynomial is
$$1*X^4+0*X^3+0*X^2+1*X+2*X^0$$
or $x^4+x^2+x-1$ mod 3, which corresponds to the recurrence
$$s[n-4]+s[n-2]+s[n-1]-s[n]=0 ~\pmod{3}$$
by taking multiplication by x as a shift operator, or equivalently
$$s[n]=s[n-1]+s[n-2]+s[n-4] \pmod{3}$$
as the generating recurrence. This recurrence, started with initial conditions (putting the zeroes here is convenient, followed by one nonzero entry)
$$s[0]=0,s[1]=0,s[2]=0,s[3]=1$$
will give a sequence $$(s[0],s[1],..,s[3^4-1])$$ of length 80. Just append a 0 in the beginning and you have your deBruijn sequence of length 81.
Edit: Smaller example
Degree 2 112
gives $X^2+X+2$ or $s[n-2]+s[n-1]+2s[n]=0,$ or $s[n]=s[n-1]+s[n-2] \pmod 3$
So start with $s[0]=0,s[1]=1$, get $s[2]=s[1]+s[0]=0+1=1$
$s[3]=1+1=2,$
$s[4]=1+2=0 \pmod 3,$
resulting in a sequence of length $3^2-1=8$ given by $$0,1,1,2,0,2,2,1;0,1,1,2,\ldots$$
Take one period and pre-pend a zero to get
$$0,0,1,1,2,0,2,2,1;0,1,\ldots$$ which spans the 9 possible states
$0,0\quad$ $0,1\quad$ $1,1\quad$ $1,2\quad$ $2,0\quad$ $0,2\quad$ $2,2\quad$ $2,1\quad$ $1,0$