Help with finding the generating function of this language?

125 Views Asked by At

I've simplified this a bit so that I can just get help with the basic steps.

Say we have a language of all words over $\{a,b,c,d\}$ where the only letters allowed to commute are $ab$. I need help understanding how to define equivalence classes of words in the language. Then, take a representative of each equivalence class and find a generating function for the sequence of these equivalence class representatives.

To make sure I understand the language, these would be the possible words, yes? Or am I missing some? I think maybe the restriction that only $ab$ commute is confusing me. $$a, b,c,d$$ $$ab, ac,ad,bc, bd, cd, ba$$ $$abc,abd,acd,bcd,bac, bad$$ $$abcd, bacd$$

What would each equivalence class be? Words of the same length? Then how do I select a representative from each? Can I just pick any one? So maybe these would be them? $$a,ab,abc,abcd$$ But then these would be the same representatives even if we said only $ab$ can commute so that doesn't seem right.

And then finally, once I find these, how do I find a generating function?

2

There are 2 best solutions below

3
On BEST ANSWER

Note: Main theme of this answer is to find a generating function for the language. We develop it by foot. Then we indicate a Rolls Royce variant.

If an alphabet $\mathcal{A}=\{a,b,c,d\}$ is given there are $4^n$ different words of length $n$, since for each of the $n$ positions within a word one out of four letters from $\mathcal{A}$ can be chosen.

The language $\mathcal{L}$

Let's assume that $ab$ commute, so that $ab$ and $ba$ are not distinguishable. This way we can choose either $ab$ or $ba$ as representative for the corresponding equivalence class. We select in the following lexicographically ordered $ab$ as representative for the equivalence class \begin{align*} [ab]=\{ab,ba\} \end{align*} The language $\mathcal{L}$ so defined consists of all words built from the alphabet $\mathcal{A}$ which does not contain the substring $ba$, since we always choose the representative $ab$ instead.

Let's look for all words in $\mathcal{L}$ of length $\leq 4$. We identify all words which contain a substring $ba$ since these are the words which are in a corresponding equivalence class with $ab$ and which are to exclude. Words of length $1$ and $2$ are simple.

Words in $\mathcal{L}$ with length $\leq 4$

  • Words of length $=1$: There are $4^1=4$ words of length $1$ which can be built from $\mathcal{A}$, namely \begin{align*} a,b,c,d \end{align*} and of course none of them contains a substring $ba$. So all of them are elements of the language $\mathcal{L}$

From now on we group words according to the number of occurrences of the substring $ba$. Words of length $2$ and $3$ have either zero or one occurrence of $ba$, words of length $4$ and $5$ have zero, one or two occurrences of $ba$.

  • Words of Length $=2$: Words of length $2$ have the shape \begin{align*} uv\qquad \color{blue}{ba} \qquad\qquad u,v \in\mathcal{A}\\ \end{align*} We have two groups. The number of different words in the first group $uv$, which are all words of length $2$ is $4\cdot4=16$. Since we want to count valid words only, we have to subtract the words from the second group, which is only $1$ in this case. We conclude

There are \begin{align*} 4^2-1=15 \end{align*} words in $\mathcal{L}$ with length $2$.

  • Length $=3$: Words of length $3$ have the shape \begin{array}{ccc} uvw\qquad &\color{blue}{ba}u\qquad&\qquad\qquad u,v \in\mathcal{A}\\ \qquad &u\color{blue}{ba}\qquad &\\ \end{array} We have two groups. The number of different words in the first group $uvw$, which are all words of length $3$ is $4\cdot3=64$. Since we want to count valid words only, we have to subtract all words from the second group, which is $\binom{2}{1}4=8$ in this case. There are $\binom{2}{1}$ possibilities to position $ba$ and $4$ possibilities for $u$. We conclude

There are \begin{align*} 4^3-\binom{2}{1}4=56 \end{align*} words in $\mathcal{L}$ with length $3$.

  • Length $=4$: Words of length $4$ have the shape \begin{array}{cccc} uvwx\qquad &\color{blue}{ba}uv\qquad &\color{blue}{ba}\color{blue}{ba}\\ \qquad &u\color{blue}{ba}v\qquad&&\qquad u,v \in\mathcal{A}\\ \qquad &uv\color{blue}{ba}\qquad&&\\ \end{array} We have three groups. The number of different words in the first group $uvwx$, which are all words of length $4$ is $4\cdot4=256$. Since we want to count valid words only, we have to subtract all words from the second group, which is $\binom{3}{1}4^2=8$ in this case. There are $\binom{3}{1}$ possibilities to position $ba$ and $4^2$ possibilities for $uv$. Since we count occurrences of $ba$ twice in a word more than once, we have to add all words with $ba$ twice as substring. We conclude

There are \begin{align*} 4^4-\binom{3}{1}4^2+\binom{2}{2}4^0=209 \end{align*} words in $\mathcal{L}$ with length $4$.

We obsere a nice scheme. According to the Inclusion-exclusion principle we subtract and add words containing an increasing number of substrings $ba$. In case words have length $n$ and $k$ occurrences of $ba$, there are $\binom{n}{k}$ possibilities to choose the positions of $ba$ and $n-2k$ positions left for the other letters which give a total of \begin{align*} \binom{n}{k}4^{n-2k} \end{align*} different words for $n\geq 0$ and $0\leq k\leq \left\lfloor\frac{n}{2}\right\rfloor$.

We conclude the following is valid. There are \begin{align*} \sum_{k=0}^{ \left\lfloor\frac{n}{2}\right\rfloor}\binom{n-k}{k}(-1)^k4^{n-2k}\qquad\qquad n\geq 0 \end{align*} different words of length $n$ in $\mathcal{L}$.

We allow $n=0$ which means that the empty word $\varepsilon$ of length zero is also element in $\mathcal{L}$.

Generating function $L(x)$

A generating function is a formal power series

\begin{align*} L(x)=\sum_{n=0}^{\infty}a_nx^n \end{align*} with the coefficient $a_n$ of $x^n$ denoting the number of words of length $n$ in $\mathcal{L}$.

Since we know the number of words of $\mathcal{L}$ we can write the generating function as \begin{align*} L(x)&=\sum_{n=0}^{\infty}\sum_{k=0}^{ \left\lfloor\frac{n}{2}\right\rfloor}\binom{n-k}{k}(-1)^k4^{n-2k}x^n\\ &=1+4x+15x^2+56x^3+209x^4+780x^5+\cdots \end{align*}

We can do even better. We derive an expression of $L(x)$ as rational function in $x$. In order to do so we use the coefficient of operator $[x^n]$ to denote the coefficient of $x^n$ in a generating series and transform the coefficients of $L(x)$ appropriately.

We obtain \begin{align*} \sum_{k=0}^n&\binom{n-k}{k}(-1)^k4^{n-2k}\tag{1}\\ &=\sum_{k=0}^n\binom{k}{n-k}(-1)^{n-k}4^{2k-n}\tag{2}\\ &=\sum_{k=0}^\infty\binom{k}{n-k}\left(-\frac{1}{4}\right)^{n-k}4^k\tag{3}\\ &=\sum_{k=0}^\infty4^k[x^{n-k}]\sum_{l=0}^k\binom{k}{l}\left(-\frac{1}{4}\right)^{l}x^l\tag{4}\\ &=\sum_{k=0}^\infty[x^{n}](4x)^k\left(1-\frac{1}{4}x\right)^k\tag{5}\\ &=[x^n]\sum_{k=0}^\infty\left(4x-x^2\right)^k\tag{6}\\ &=[x^n]\frac{1}{1-4x-x^2}\\ \end{align*}

Comment:

  • In (1) we put for convenience only the upper limit of the index $k$ to $n$ without changing anything, since we use the convention $\binom{n}{k}=0$ if $0<n<k$.

  • In (2) we exchange $k$ with $n-k$.

  • In (3) we do a small rearrangement and put the upper limit of the index $k$ to $\infty$ for convenience only. This changes nothing with the same reasoning as in (1).

  • In (4) we write $\binom{k}{n-k}\left(-\frac{1}{4}\right)^{n-k}$ as a Cauchy product $[x^{n-k}]\sum_{l=0}^k\binom{k}{l}\left(-\frac{1}{4}\right)^{l}x^l$. This is the core of the derivation since it allows a drastic simplification in the next step.

  • In (5) we can write the polynomial in $x$ as $k$-th power of a binom.

  • In (6) we obtain a geometric series representation which gives us in the next line a representation as rational function in $x$.

We conclude

\begin{align*} L(x)&=\frac{1}{1-4x+x^2}\tag{7}\\ &=1+4x+15x^2+56x^3+209x^4+780x^5+\cdots \end{align*}

Note: In fact, there is a powerful method to solve problems of this kind, called the Goulden-Jackson Cluster Method. In this context we regard $ba$ as bad word and derive immediately (see formula of $f(s)$ on page $9$) the rational expression (7) of $L(x)$.

0
On

It looks like you're working with the set of ALL strings whose characters are $a,b,c,d$. You are defining an equivalence class by saying that $uabv \equiv ubav$ (where $u$ and $v$ are arbitrary strings), and then taking the transitive closure to make it an equivalence relation.

Hence, $ bacab \equiv bacba \equiv abcba \equiv abcab \equiv bacab$, so the equivalence class for $bacab$ would be $\{bacab ,bacba ,abcba ,abcab\}$. As for the representative, I would choose $abcab$, where there are no substrings of the form $ba$.

I'm not sure what you mean by a "generating function" in this context; generating functions usually are for sequences of real numbers.