Lexicographic index of strings with repeated characters

188 Views Asked by At

As the title says, I need to find a way (or a function, specifically) to convert a string, let's say $AABACCAB$, to it's lexicographic number among its permutations.

The number of different words that these specified characters are:

$$N=\frac{8!}{4!2!2!}$$

Is there a way to enumerate the words from $0$ to $N-1$? And that every sequence has its unique number? (bijective function)

Thanks.

1

There are 1 best solutions below

0
On BEST ANSWER

To enumerate the words in lexicographical order, start with the letters in alphabetical order, AAAABBCC; in each step, find the last upward transition (in this case from B to C), swap the letter before that transition (in this case B) with the next highest letter that occurs (not necessarily directly) after it (in this case C) and order everything after the transition alphabetically. This yields:

AAAABBCC
AAAABCBC
AAAABCCB
AAAACBBC
AAAACBCB
AAAACCBB
AAABABCC
...

This of course defines a numbering, and the number can be computed recursively without producing the list, namely as the sum of the number of strings that can be formed with "lower" first letters plus the number of the string excluding the first letter. For instance, the number of the string CBAABACA is the sum of $\binom7{3,2,2}$ for strings starting with A and $\binom7{4,2,1}$ for strings starting with B and the number of BAABACA among all strings with $4$ A's, $2$ B's and $1$ C.