I am working on a question where I have the formal language Z over the alphabet Q {a, b, c} and it is generated by the context-free grammar whose non-terminals are S, A, and B, the start symbol is S, production rules are as follows:
(1) S → abSb
(2) S → A
(3) A → Bc
(4) B → cA
(5) A → a
The question asked to describe the structure of strings in the language and I came up with $Z = \{w ∈ Q^* | w = ab^n o u o b^m, u ∈ Q^*, n, m ∈ N^*\}$. ($o$ means concatenated here).
I am now struggling with the second part of the question where I am asked to show this language isn't regular using the pumping lemma. I understand the concept of the pumping lemma and I have applied to simpler questions but I have never used it for a language involving concatenation.
Most examples I have seen online have been of structure $L = \{0^n 1 0^n 1 | n ∈ N^*\}$. I am wondering if there is some rule I am missing that says the powers must be the same or if I have gone wrong in my definition of the language.
Thanks in advance :)
Your suggested expression, $$Z = \{w ∈ Q^* \mid w = ab^n Q^* b^m\},$$
is wrong. As you suspected, the language is irregular only if there is something forcing the number of $\mathtt b$s to be the same. In this case, there is. If the language were actually $\mathtt{ab}^nQ^*\mathtt{b}^m$, it would be regular.
Hint: Note that the only way to eliminate $B$ from a term is to replace it with ${\mathtt c}A$. Let's produce an equivalent grammar that doesn't use $B$. To do this, we drop the rule $B \to {\mathtt c}A$ completely, and then instead of $A\to B{\mathtt c}$, have $A\to {\mathtt c}Ac$. (Do you understand why this works?)
Now the grammar is $$\begin{align} S&\to \mathtt{ab}S\mathtt{b}\\ S&\to A \\ A&\to \mathtt{c}A\mathtt{c}\\ A&\to \mathtt{a} \end{align} $$
Perhaps this will be easier to handle?