I need to find a regular expression for the following language:
$$ \Sigma = {\{a,b,c}\} $$
define $L$ to be the language of all words over $\Sigma$ that contain the substring $aba$ odd number of times.
Any help is welcome, I would also like a tip of how do I even start question like this.
Thanks!
I tried to start like this:
We need atleast 1 $aba$, before him, we can have any expression that does not contain $aba$ and does not finish with $ba$, If I call this expression $R$, the solution should be something like:
$$R^*aba[R^*abaR^*aba]^*R^*$$
is this correct? if so, I cant think of this $R$.
Revised.
Your idea almost works.
The symbol $b$ is the key to writing a regular expression that does not contain $aba$: $b$ can only appear initially, finally, and in the environments $bb$, $abc$, $cba$, and $cbc$. That is, each $b$ must be the first symbol of the word, the last symbol of the word, part of a block $abc$, or immediately preceded by a $b$ or a $c$. Let’s ignore the first two possibilities for a moment. Then each of these blocks can be preceded by any string of $a$s and $c$s, and the last one can be followed by any string of $a$s and $c$s. Thus, still ignoring the possibility of a single initial or final $b$, we have
$$(a+c+bbb^*+bc+cb)^*\;.$$
(I use $bbb^*$ instead of just $bb$ in order to get strings of odd numbers of $b$s.) Call this regular expression $R_0$. It can be preceded by $b$ or followed by $ab$, so
$$R_1=(b+\lambda)R_0(ab+\lambda)$$
covers all possibilities: anything else that ends in $b$ is already covered by $$(b+\lambda)(a+c+bbb^*+bc+cb)^*\;.$$ (I use $\lambda$ for the empty word.)
To get a word that contains $aba$ an odd number of times, we can certainly start with $R_1$, any word that contains $aba$ zero times, and this certainly has to be followed by $\color{brown}{aba}$. Now an $R_1$ word could end in $ab$, but this isn’t actually a problem if $ab\color{brown}{aba}$ counts as only one copy of $aba$. However, we can’t simply follow this by another $R_1$ word, because it might start with $ba$, and in that case we could get $ab\color{brown}{aba}ba$, with two copies of $aba$. To avoid this, let
$$R=ababaR_0+aba(b+\lambda)R_0\;,$$
and let’s try
$$(b+\lambda)R_0R(RR)^*\;.$$
Each $R$ generates at least one $aba$, and since $R_0$ cannot generate anything beginning $ba$ or $aba$, each $R$ generates only one $aba$. The first term of $R$ allows for $R_1$ strings that end in $ab$.