I've simplified this a bit so that I can just get help with the basic steps.
Say we have a language of all words over $\{a,b,c,d\}$ where the only letters allowed to commute are $ab$. I need help understanding how to define equivalence classes of words in the language. Then, take a representative of each equivalence class and find a generating function for the sequence of these equivalence class representatives.
To make sure I understand the language, these would be the possible words, yes? Or am I missing some? I think maybe the restriction that only $ab$ commute is confusing me. $$a, b,c,d$$ $$ab, ac,ad,bc, bd, cd, ba$$ $$abc,abd,acd,bcd,bac, bad$$ $$abcd, bacd$$
What would each equivalence class be? Words of the same length? Then how do I select a representative from each? Can I just pick any one? So maybe these would be them? $$a,ab,abc,abcd$$ But then these would be the same representatives even if we said only $ab$ can commute so that doesn't seem right.
And then finally, once I find these, how do I find a generating function?
Note: Main theme of this answer is to find a generating function for the language. We develop it by foot. Then we indicate a Rolls Royce variant.
If an alphabet $\mathcal{A}=\{a,b,c,d\}$ is given there are $4^n$ different words of length $n$, since for each of the $n$ positions within a word one out of four letters from $\mathcal{A}$ can be chosen.
Let's look for all words in $\mathcal{L}$ of length $\leq 4$. We identify all words which contain a substring $ba$ since these are the words which are in a corresponding equivalence class with $ab$ and which are to exclude. Words of length $1$ and $2$ are simple.
We obsere a nice scheme. According to the Inclusion-exclusion principle we subtract and add words containing an increasing number of substrings $ba$. In case words have length $n$ and $k$ occurrences of $ba$, there are $\binom{n}{k}$ possibilities to choose the positions of $ba$ and $n-2k$ positions left for the other letters which give a total of \begin{align*} \binom{n}{k}4^{n-2k} \end{align*} different words for $n\geq 0$ and $0\leq k\leq \left\lfloor\frac{n}{2}\right\rfloor$.
We allow $n=0$ which means that the empty word $\varepsilon$ of length zero is also element in $\mathcal{L}$.
We can do even better. We derive an expression of $L(x)$ as rational function in $x$. In order to do so we use the coefficient of operator $[x^n]$ to denote the coefficient of $x^n$ in a generating series and transform the coefficients of $L(x)$ appropriately.
Comment:
In (1) we put for convenience only the upper limit of the index $k$ to $n$ without changing anything, since we use the convention $\binom{n}{k}=0$ if $0<n<k$.
In (2) we exchange $k$ with $n-k$.
In (3) we do a small rearrangement and put the upper limit of the index $k$ to $\infty$ for convenience only. This changes nothing with the same reasoning as in (1).
In (4) we write $\binom{k}{n-k}\left(-\frac{1}{4}\right)^{n-k}$ as a Cauchy product $[x^{n-k}]\sum_{l=0}^k\binom{k}{l}\left(-\frac{1}{4}\right)^{l}x^l$. This is the core of the derivation since it allows a drastic simplification in the next step.
In (5) we can write the polynomial in $x$ as $k$-th power of a binom.
In (6) we obtain a geometric series representation which gives us in the next line a representation as rational function in $x$.
Note: In fact, there is a powerful method to solve problems of this kind, called the Goulden-Jackson Cluster Method. In this context we regard $ba$ as bad word and derive immediately (see formula of $f(s)$ on page $9$) the rational expression (7) of $L(x)$.