Find regular expression for a given language

663 Views Asked by At

I need to find a regular expression for the following language:
$$ \Sigma = {\{a,b,c}\} $$
define $L$ to be the language of all words over $\Sigma$ that contain the substring $aba$ odd number of times.

Any help is welcome, I would also like a tip of how do I even start question like this.
Thanks!

I tried to start like this:
We need atleast 1 $aba$, before him, we can have any expression that does not contain $aba$ and does not finish with $ba$, If I call this expression $R$, the solution should be something like:
$$R^*aba[R^*abaR^*aba]^*R^*$$

is this correct? if so, I cant think of this $R$.

1

There are 1 best solutions below

7
On

Revised.

Your idea almost works.

The symbol $b$ is the key to writing a regular expression that does not contain $aba$: $b$ can only appear initially, finally, and in the environments $bb$, $abc$, $cba$, and $cbc$. That is, each $b$ must be the first symbol of the word, the last symbol of the word, part of a block $abc$, or immediately preceded by a $b$ or a $c$. Let’s ignore the first two possibilities for a moment. Then each of these blocks can be preceded by any string of $a$s and $c$s, and the last one can be followed by any string of $a$s and $c$s. Thus, still ignoring the possibility of a single initial or final $b$, we have

$$(a+c+bbb^*+bc+cb)^*\;.$$

(I use $bbb^*$ instead of just $bb$ in order to get strings of odd numbers of $b$s.) Call this regular expression $R_0$. It can be preceded by $b$ or followed by $ab$, so

$$R_1=(b+\lambda)R_0(ab+\lambda)$$

covers all possibilities: anything else that ends in $b$ is already covered by $$(b+\lambda)(a+c+bbb^*+bc+cb)^*\;.$$ (I use $\lambda$ for the empty word.)

To get a word that contains $aba$ an odd number of times, we can certainly start with $R_1$, any word that contains $aba$ zero times, and this certainly has to be followed by $\color{brown}{aba}$. Now an $R_1$ word could end in $ab$, but this isn’t actually a problem if $ab\color{brown}{aba}$ counts as only one copy of $aba$. However, we can’t simply follow this by another $R_1$ word, because it might start with $ba$, and in that case we could get $ab\color{brown}{aba}ba$, with two copies of $aba$. To avoid this, let

$$R=ababaR_0+aba(b+\lambda)R_0\;,$$

and let’s try

$$(b+\lambda)R_0R(RR)^*\;.$$

Each $R$ generates at least one $aba$, and since $R_0$ cannot generate anything beginning $ba$ or $aba$, each $R$ generates only one $aba$. The first term of $R$ allows for $R_1$ strings that end in $ab$.