Finding the equivalent regex for a formal grammar

45 Views Asked by At

We have the following formal grammar:

$a, b$ are terminal symbols.

$S, A, B$ are non-terminal symbols.

$S$ is the startsymbol.

Thinking in terms of a nondeterministic finite automata $q0$ indicates the initial state.

$f$ is the final state symbol.

Production rules:

$$S\to aA$$

$$A\to \varepsilon|bS|bA|aB$$

$$B\to b|aS$$

Equations connecting the states are:

(1)$$S = q_0 \varepsilon + Ab + Ba$$

(2)$$A = Sa + Ab$$

(3)$$B =Aa$$

(4)$$f = Aa + Bb$$

Utilizing Arden's theorem on (2) we obtain:

(5)$$A = Sa + Ab = Sab^{*}$$

Continuing with (4) and exploiting (3) we have:

(6)$$f = Aa + Bb = Aa + A(ab) = A(a+ab) = Sab^{*}(a+ab)$$

Then write $S$ in terms of the initial state $q_0$ and the terminals:

(7)$$S = q_0 \varepsilon + Ab + Ba = q_0 \varepsilon + Ab + A(ab) = q_0 \varepsilon + A(b+ab)$$

We again use Arden's theorem:

(8)$$q_0\varepsilon + Sab^{*}(b+ab) = q_0\varepsilon [ab^{*}(b+ab)]^{*}$$

So finally we arrive at:

(9)$$f = q_0\varepsilon [ab^{*}(b+ab)]^{*}ab^{*}(a+ab).$$

Is my flow of thought correct?

1

There are 1 best solutions below

0
On BEST ANSWER

I see one bug, which is that the original grammar accepts the string $\mathtt{a}$, while your regular expression doesn't seem to accept it. Not sure where the error is, as I'm not sure how to interpret equations (1)-(4).

If I were tackling this problem, I would actually start by simplifying the grammar. Beginning with the initial definition: \begin{align*} S&\to aA\\ A &\to \varepsilon \mid bS \mid bA \mid aB \\ B &\to b \mid aS \end{align*}

I would first eliminate $B$. Specifically, I would use the production $B\to b\mid aS$ to expand the production $A\to aB$ into $ A \to ab \mid aaS$. Now $B$ never occurs on the right hand side, so we can delete that production.

\begin{align*} S&\to aA\\ A &\to \varepsilon \mid bS \mid bA \mid ab \mid aaS \end{align*}

Next, I would eliminate $A\rightarrow \varepsilon$. The way to do this is to replace $S\rightarrow aA$ with $S\rightarrow aA \mid a$, and replace $A\rightarrow bA$ with $A\rightarrow bA \mid b$. Now we've made the production $A\rightarrow \varepsilon$ obsolete. We can eliminate it:

\begin{align*} S&\to aA \mid a \\ A &\to bS \mid bA \mid b \mid ab \mid aaS \end{align*}

I would replace the recursive use of the start symbol $S$. Instead of $A\to aaS$, I would expand it to $A \to aaaA \mid aaa$. Instead of $A\to bS$, I would expand it to $A \to ba \mid baA$.

\begin{align*} S&\to aA \mid a \\ A &\to ba \mid baA \mid bA \mid b \mid ab \mid aaaA \mid aaa \end{align*}

Let's sort these productions to put simpler ones first:

\begin{align*} S&\to a \mid aA \\ A &\to b \mid ab \mid ba \mid aaa \mid bA \mid baA \mid aaaA \end{align*}

Now we can represent this as a DFA with three states: $S$, $A$, and $F$ (accepting). The transitions are:

\begin{align*}S &\xrightarrow{a} F\\ S &\xrightarrow{a} A\\ A &\xrightarrow{b\mid ab \mid ba \mid aaa} F\\ A &\xrightarrow{b \mid ba \mid aaa} A\\ \end{align*}

We convert it into a regular expression by having every transition from $S$ be the start of a regular expression, and every transition into $F$ be the end of a regular expression. Parallel transitions combine with $\mid$, series transitions combine with concatenation, and self-transitions use Kleene star. Altogether, we have:

$$a \mid a(b\mid ba\mid aaa)^*(b\mid ab \mid ba\mid aaa)$$