How to verify that ambiguous grammar generates same language as unambigious grammar?

187 Views Asked by At

Consider the grammar $S \rightarrow 0S|0S1S|\epsilon$. Clearly this grammar is ambiguous for a string like $001$ because there are $2$ derivation trees for it. The $1$ can either be matched with either $0$. Now if we want the $1$ to be matched with nearest $1$ then the binding strength of the second rule in the grammar namely $OS1S$ should be stronger and to do this in the derivation tree it should have the lower position.

Now we can do this by describing our unambigious grammar as $S \rightarrow 0S|TS|\epsilon$ and $T \rightarrow 0T1T|\epsilon$ Now is there a way to prove that they generate the same language?

1

There are 1 best solutions below

1
On

Note that your second grammar is still ambiguous: 0101 has two different deviations. You'll probably want to have the production $S\to \mathtt 0T\mathtt 1S$ instead of $S\to TS$.

As for proving the equivalence, you have two basic strategies to choose from: a semantic one and a syntactic one.

Semantically you would prove separately that each of the grammars generates exactly the language $$ L= \{w\in\{\mathtt 0,\mathtt 1\}^*\mid \text{every prefix of $w$ has at least as many $\tt 0$s as $\tt 1$s} \} $$ For each grammar you can prove that everything that's generated is in $L$ by induction on the deviation tree. Then you need to prove that everything in $L$ can be generated, which is also a relatively easy induction on the length of $w$.

Syntactically you would prove that you can rewrite any derivation tree according to one grammar into a derivation tree according to the other one, and vice versa. This can be tricky, but looks like it should be doable without too many tears in this case.

You can also choose a combination approach where you argue semantically for $\mathcal L(G_1)\subseteq \mathcal L(G_2)$ and syntactically for $\mathcal L(G_2)\subseteq \mathcal L(G_1)$ or vice versa.

If you take my suggestion in the first paragraph above, this could be an attractive option, because each tree according to the second grammar is clearly also valid according to the first, just by forgetting the distinction between the two nonterminals. And doing the other directions semantically relieves you of needing to be clever about rewrites.