Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?

2.4k Views Asked by At

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular?

I believe the answer is NO, but it is hard to make a formal demonstration of it, even a NON formal demonstration :P.

Any ideas on how to approach this?

This language complies with the pumping lemma. I have a formal demonstration of that, so you have to find other ways to prove it not regular. The main idea of the demonstration is that if you have string that belongs to the language, you can split it in the first character and the rest of the string. By pumping in that character you will always get a string with the same number of ab & ba, it doesn't matter if the first character is a a, b or c. If you have any other ideas, please let me know. Thanks and regards!!, Alex.

3

There are 3 best solutions below

0
On

Suppose that the language is recognized by an automaton having $n$ states. Consider the $n+1$ words $(abc)^k$ where $k$ ranges from 0 to $n$. By the pigeonhole principle, there are two different words $(abc)^i$ and $(abc)^j$ which produce the same state when read by the automaton.

Since the word $(abc)^i (bac)^i$ is recognized, $(abc)^j (bac)^i$ must also be recognized, which is a contradiction.

1
On

Let $L$ be the language described in the question. Consider $L' = L \cap \{ (abc)^*ccc(bac)^* \} $. Thus $L' = \{ (abc)^nccc(bac)^n | \; n \in \mathbb{Z} \}$. You can apply pumping lemma straight to $L'$ or use Martin Sleziak's observation to replace $abc$ by $g$ and $bac$ by $h$ to give you $L'' = \{ g^nccch^n | \; n \in \mathbb{Z} \}$ which is obviously not regular. Since intersect and homomorphism preserve regularity, this means the original $L$ was also not regular.

1
On

I believe the simplest way of proving this language is not regular is using Myhill–Nerode theorem since it allows you to focus on an "easy" set of words.

Let's define $w_{n} = (abc)^nc$, i.e. $w_n$ has exactly $n$ occurrences of "ab" and no occurrences of "ba" (since after b comes c). Now it's very easy to show that for every pair of words $w_n, w_m$ with $n\ne m$ we have the separating word $z=(bac)^n$ (so that $w_nz$ is in the language but $w_mz$ is not).