I have a problem in biology involving amino acid sequences (think of them as strings of characters) that I want to formalise. Let's assume we have an amino acid sequence of length 4; typical examples might be:
AABB
ACDR
RTKY
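Now assume that each amino acid can be modified by one of several different types of modification (post-translational modifications, PTMs). Taking the last example, RTKY, with one type of modification you would get the following states. Each state is written as the row of four characters that sits under RTKY: a digit gives the type of modification on that residue, and a hyphen (-) means the residue is unmodified.

No modification: ----
One modified residue: 1---, -1--, --1-, ---1
Two modified residues: 11--, 1-1-, 1--1, -11-, -1-1, --11
Three modified residues: 111-, 11-1, 1-11, -111
Four modified residues: 1111

That is 16 states in total.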
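The enumeration itself can be sketched in Python. This is a minimal illustration, assuming (as the listing does) that a residue carries at most one modification and that modification types are labelled 1 to 9 so each fits in a single character; the function name `enumerate_states` is hypothetical. It takes the Cartesian product of the per-residue choices, so the rows come out in a different order from the listing above.

```python
from itertools import product


def enumerate_states(sequence: str, num_modifiers: int) -> list[str]:
    """Return every modification row for `sequence` (hypothetical helper).

    Each row has one character per residue: '-' for an unmodified residue,
    or a digit 1..num_modifiers giving the modification type.
    Assumes at most one modification per residue and num_modifiers <= 9.
    """
    # Per-residue choices: unmodified, or one of the modifier types.
    symbols = ["-"] + [str(i) for i in range(1, num_modifiers + 1)]
    # One choice per residue, taken independently for every position.
    return ["".join(row) for row in product(symbols, repeat=len(sequence))]


states = enumerate_states("RTKY", 1)
print(len(states))  # 16
print(states[:4])   # ['----', '---1', '--1-', '--11']
```

Calling `enumerate_states("RTKY", 2)` produces the two-modifier case, including rows such as `1212` and `2121`.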
With two types of modification (1 and 2) you would get, in the same notation:

No modification: ----
One modified residue: 1---, 2---, -1--, -2--, --1-, --2-, ---1, ---2
Two modified residues: 12--, -12-, --12, 1-2-, -1-2, 1--2, 21--, ...
...
Four modified residues: 1111, ..., 1212, ..., 2121, ..., 2222
How would one formalise this so that the number of states can be calculated from the length of the sequence and the number of PTM types (sequence modifiers)?
Suppose we have a sequence of length $n$ and $c$ types of modifier. For each position in the sequence we can put any one of the $c$ modifiers under it, or leave it blank (no modification). So each position has $c+1$ choices, made independently of the other positions, and the total number of states is $(c+1)^n$. For RTKY ($n=4$) this gives $2^4=16$ states with one modifier and $3^4=81$ with two.
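As a sanity check, the Python sketch below compares the closed form with a brute-force count and with a count grouped by the number $k$ of modified residues: choose which $k$ residues are modified, then a type for each, giving $\sum_{k=0}^{n}\binom{n}{k}c^k$, which equals $(c+1)^n$ by the binomial theorem. The function names are hypothetical, and like the answer it assumes each residue carries at most one modification.

```python
from itertools import product
from math import comb


def count_states(n: int, c: int) -> int:
    # Closed form: c + 1 independent choices at each of n positions.
    return (c + 1) ** n


def count_states_by_enumeration(n: int, c: int) -> int:
    # Brute force: each position takes a value 0..c, where 0 means unmodified.
    return sum(1 for _ in product(range(c + 1), repeat=n))


def count_states_by_modified_residues(n: int, c: int) -> int:
    # Choose which k residues are modified, then one of c types for each.
    return sum(comb(n, k) * c ** k for k in range(n + 1))


for n, c in [(4, 1), (4, 2), (6, 3)]:
    assert count_states(n, c) == count_states_by_enumeration(n, c)
    assert count_states(n, c) == count_states_by_modified_residues(n, c)

print(count_states(4, 1), count_states(4, 2))  # 16 81
```

For $n=4$, $c=2$ the grouped count is $1 + 8 + 24 + 32 + 16 = 81$, matching $3^4$.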