Generating function for runs in alphabets

153 Views Asked by At

Let $\mathcal{J}$ be a finite set of letters. Suppose that $|\mathcal{J}| = m$. Now designate a letter from the set $\mathcal{J}$. What is the ordinary generating function of words with letters from $\mathcal{J}$ without $k$ consecutive occurrences of the designated letter. For the case $m=2$ the generating function is easily seen to be $$ \dfrac{1-z^k}{1-2z+z^{k+1}}$$

However I am having difficulty in finding the generating funcution for arbitrary $m$. Any help would be appreciated. Thanks.

3

There are 3 best solutions below

3
On BEST ANSWER

The languages you're describing are regular, which means they admits rational generating functions. Indeed, we can read the generating function off of a regular expression for the language.

For example, if $k=3$ and $m=2$, the language (where we cannot repeat $a$) is described by

$$(b + ab + aab)^* (1 + a + aa)$$

We can read off from this a generating function

$$\frac{1 + a + a^2}{1 - (b + ab + a^2 b)}$$

(using the fact that $L^* = 1 + L + L^2 + \ldots = \frac{1}{1-L}$).

It turns out the coefficient of $a^nb^m$ in the above series is exactly the number of words using $n$ $a$s and $m$ $b$s! So if we only care about the length, we can just replace every letter by one variable $z$. This gives us

$$\frac{1 + z + z^2}{1 - z - z^2 - z^3}$$

and taylor expanding this gives the correct sequence.


Now that we have the basics down, let's do this in general. For $m$ letters $a_1, \ldots a_m$, say we want to avoid $k$ consecutive copies of $a_1$. Then our language is given by:

$$ \left ( 1 + a_1 + a_1^2 + \ldots + a_1^{k-1} \right ) \left [ \left ( \sum_{i=2}^m a_i \right )^+ \left ( a_1 + a_1^2 + \ldots + a_1^{k-1} \right ) \right ]^* \left ( \sum_{i=2}^m a_i \right )^* $$

(do you see why?)

Since we only care about the total number, we again substitute $z$ for each variable, then cash out the kleene stars for $\frac{1}{1-L}$s and get

$$ \frac{1 + z + z^2 + \ldots + z^{k-1}}{1 - (m-1)z - (m-1)z^2 - \ldots - (m-1)z^k} $$


I hope this helps ^_^

0
On

The Component Generating Functions

The generating function for the number of choices for (possibly empty) runs of $j$ of the $m-1$ unrestricted letters is $$ \sum_{j=0}^\infty(m-1)^jx^j=\frac1{1-(m-1)x}\tag{1a} $$ For non-empty runs of the unrestricted letters, we subtract $1$: $$ \sum_{j=1}^\infty(m-1)^jx^j=\frac{(m-1)x}{1-(m-1)x}\tag{1b} $$ The generating function for the number of choices for (possibly empty) runs of $j$ of the restricted letter is $$ \sum_{j=0}^{k-1}x^j=\frac{1-x^k}{1-x}\tag{2a} $$ For non-empty runs of the restricted letter, we subtract $1$: $$ \sum_{j=1}^{k-1}x^j=\frac{x-x^k}{1-x}\tag{2b} $$

Combining the Components

The general string starts with a (possibly empty) string of the restricted letter: $$ \frac{1-x^k}{1-x}\tag3 $$ followed by a (possibly empty) repetition of a non-empty string of the unrestricted letters followed by a non-empty string of the restricted letter: $$ \sum_{j=0}^\infty\left(\frac{(m-1)x}{1-(m-1)x}\frac{x-x^k}{1-x}\right)^j=\frac1{1-\frac{(m-1)x}{1-(m-1)x}\frac{x-x^k}{1-x}}\tag4 $$ followed by a (possibly empty) string of the unrestricted letters: $$ \frac1{1-(m-1)x}\tag5 $$ Thus, the full generating function is $$ \begin{align} &\frac{1-x^k}{1-x}\frac1{1-\frac{(m-1)x}{1-(m-1)x}\frac{x-x^k}{1-x}}\frac1{1-(m-1)x}\tag{6a}\\ &=\frac{1-x^k}{(1-(m-1)x)(1-x)-(m-1)x\left(x-x^k\right)}\tag{6b}\\ &=\frac{1-x^k}{1-mx+(m-1)x^{k+1}}\tag{6c} \end{align} $$ $\text{(6c)}$ is the generating function for the number of strings of a given length composed of letters from an $m$ letter alphabet where one restricted letter does not appear in any substring of $k$ consecutive letters.

0
On

A convenient starting point is also the generating function of words with no consecutive equal characters at all. These words are called Smirnov or Carlitz words. See for example III.24 Smirnov words in Analytic Combinatorics by Philippe Flajolet and Robert Sedgewick for more information.

The generating function for the number of Smirnov words over an $m$-ary alphabet is given by \begin{align*} \left(1-\frac{mz}{1+z}\right)^{-1}\tag{1} \end{align*} In the current problem a designated character may have runs with length $< k$. We respect this by replacing the designated character by \begin{align*} z\to z+z^2+\cdots +z^{k-1}=\frac{z-z^{k}}{1-z}\tag{2.1} \end{align*} Since there are no restrictions to other characters we can replace each occurrence of these characters by one or more occurrences of it. \begin{align*} z\longrightarrow z+z^2+\cdots=\frac{z}{1-z}\tag{2.2} \end{align*}

We obtain from (1) by substituting (2.1) and (2.2) \begin{align*} &\left(1-\frac{(m-1)\frac{z}{1-z}}{1+\frac{z}{1-z}}-\frac{\frac{z-z^k}{1-z}}{1+\frac{z-z^k}{1-z}}\right)^{-1}\\ &\qquad=\left(1-(m-1)z-\frac{z-z^k}{1-z^k}\right)^{-1}\tag{3.1}\\ &\qquad=\frac{1-z^k}{1-z^k-(m-1)z\left(1-z^k\right)-\left(z-z^k\right)}\tag{3.2}\\ &\qquad\,\,\color{blue}{=\frac{1-z^k}{1-mz+(m-1)z^{k+1}}}\tag{3.3} \end{align*}

Comment:

  • In (3.1) we expand numerator and denominator with $1-z$.

  • In (3.2) we use $1-z^k$ as common denominator.

  • In (3.3) we do some final simplifications.