The following is an exercise in a book I am reading:
Let $\Sigma=\{a,b,c\}$, define $L$ to be the language of all words over $\Sigma$ that do not contain $ab$ as a sub-word. Find a regular expression for $L$.
I am unable to solve this problem, I would like to know how to tackle this type of question, what is the solution and what is the thought process to get to it
One easy if tedious approach is to design a finite-state automaton that recognizes $L$ and then apply the algorithm that converts such an automaton to a corresponding regular expression.
Going about it directly, I observe that if $w\in L$, every $a$ in $w$ must be immediately followed by another $a$ or a $c$ or be at the end of the word; otherwise there are no restrictions. Similarly, every $b$ must be immediately preceded by another $b$ or a $c$ or be at the beginning of the word. Assume for the moment that $w$ does not begin with $b$ or end with $a$. If $w$ begins with $a$, it must begin $aa^*c(b\mid c)^*$. This pattern may be repeated any number of times: $\big(aa^*c(b\mid c)^*\big)^*$. To allow it to end with $a$, just tack on $a^*$: $\big(aa^*c(b\mid c)^*\big)^*a^*$. To allow it to begin with $b$, prefix $b^*$: $b^*\big(aa^*c(b\mid c)^*\big)^*a^*$. In fact, a little thought reveals that we can instead prefix $(b\mid c)^*$ to cover all cases:
$$(b\mid c)^*\big(aa^*c(b\mid c)^*\big)^*a^*\tag{1}$$
Added: Gerry Myerson’s approach in the comments is more elegant, though I’d carry it out a little differently. $\Sigma^*=(a\mid b\mid c)^*$, and we want to keep all of it that doesn’t have an $a$ followed immediately by a $b$. Thus, except at the end of a word we want to replace the selection $a$ by one of $ac,aac,aaac$, etc. If we allowed infinite disjunctions, this would give us $$(b\mid c\mid ac\mid aac\mid aaac\mid\dots)^*a^*\tag{2}$$ for $L$, where the last $a^*$ is to allow the word to end with $a$. We don’t allow such expressions, but $ac\mid aac\mid aaac\mid\dots$ can be written legitimately as $aa^*c$, and $(2)$ can then be written legitimately as $$(aa^*c\mid b\mid c)^*a^*\;.\tag{3}$$
It’s not hard to see that the languages described by $(1)$ and $(3)$ are subsets of $L$: neither regular expression permits $ab$. To see that these languages include all of $L$, note first $\lambda$, the empty word, is in both. Now let $w=x_1^{n_1}\dots x_m^{n_m}\in L$ be non-empty, where $x_k\in\Sigma$ and $n_k\in\Bbb Z^+$ for $k=1,\dots,m$, and $x_k\ne x_{k+1}$ for $k=1,\dots,m-1$. Assume further that $w$ is minimal in length among all words of $L$ not matching one of the regular expressions $(1)$ and $(3)$.
If $a$ does not occur in $w$, $w$ clearly matches both $(1)$ and $(3)$, so let $i$ be the first index such that $x_i=a$, and let $u=a^{n_i}x_{i+1}^{n_{i+1}}\dots x_m^{n_m}$. To show that $w$ matches $(1)$, it suffices to show that $u$ matches $\big(aa^*c(b\mid c)^*\big)^*a^*$; to show that $w$ matches $(3)$, it suffices to show that $u$ matches $(3)$. Both of these are clear if $u=a^{n_i}$, since in that case $u$ matches the final $a^*$ of each regular expression. Otherwise, $x_{i+1}=c$, and $a^{n_i}c$ matches $aa^*c$ in both regular expressions. Now let $v$ be what’s left of $u$ after the initial $a^{n_i}c$ is removed. Then $w$ matches $(1)$ iff $v$ matches $(1)$, and $w$ matches $(3)$ iff $v$ matches $(3)$. But $v\in L$, and $|v|<|w|$, so by hypothesis $v$ matches both $(1)$ and $(3)$, and hence so does $w$. Thus, each of these regular expressions represents $L$.