Which of the following string have two or more parse trees?

860 Views Asked by At

Consider the following ambiguous grammar:

$S→A|BC$

$A→aAC|B$

$C→bCc|c$

$B→aBb|\in$

Which of the following string have two or more parse trees?

  1. $aaabbbbbcc$
  2. $aaabb$
  3. $aabb$
  4. None of these

My attempt:

  1. $aaabbbbbcc\notin L(G)$
  2. $aaabb\notin L(G)$
  3. $S=A=B=aBb=aaBbb=aabb$ in only one way.

Can you explain $L(G)$ and string in options have two or more parse tree, please?

1

There are 1 best solutions below

1
On BEST ANSWER

I agree with your answer: the first two words aren’t in $L(G)$, and the third has a unique derivation (and parse tree). Thus, the correct answer is None of these.

The grammar is simple enough that we can identify the language. There are only two possibilities for the first step in a derivation. Suppose that we begin by applying the production $S\to BC$. We can apply $B\to aBb$ $k$ times for any $k\ge 0$, but eventually we must apply $B\to\epsilon$. Similarly, we can apply $C\to bCc$ $\ell$ times for any $\ell\ge 0$, but eventually we must apply $C\to c$. If we apply the productions in this order, the derivation is

$$S\Rightarrow BC\Rightarrow^k a^kBb^kC\Rightarrow a^kb^kC\Rightarrow^\ell a^kb^{k+\ell}Cc^\ell\Rightarrow a^kb^{k+\ell}c^{\ell+1}\;.$$

Changing the order of the applications of $B\to aBb$ and $C\to bCc$ does not affect the final result. This part of the language is easy to describe:

$$L_1=\left\{a^kb^\ell c^m:\ell=k+m-1\text{ and }m\ge 1\right\}\;.$$

Now suppose that we begin by applying $S\to A$. We can then apply $A\to aAC$ $k$ times for any $k\ge 0$, but the only way to get rid of the $A$ is eventually to apply $A\to B$; at this point we have

$$S\Rightarrow A\Rightarrow^k a^kAC^k\Rightarrow a^kBC^k\;.$$

We can apply $B\to aBb$ $\ell$ times for any $\ell\ge 0$ but must eventually apply $B\to\epsilon$:

$$S\Rightarrow A\Rightarrow^k a^kAC^k\Rightarrow a^kBC^k\Rightarrow^\ell a^{k+\ell}Bb^\ell C^k\Rightarrow a^{k+\ell}b^\ell C^k\;.$$

Each $C$ must eventually generate a word of the form $b^mc^{m+1}$, so in the end we get

$$a^{k+\ell}b^\ell b^{m_1}c^{m_1+1}\ldots b^{m_k}c^{m_k+1}\;,$$

where $k,\ell,m_1,\ldots,m_k\ge 0$. Note that here again the outcome does not depend on the order of the productions, though there can be many possible orders. This part of the language is

$$L_2=\left\{a^{k+\ell}b^\ell b^{m_1}c^{m_1+1}\ldots b^{m_k}c^{m_k+1}:k,\ell,m_1,\ldots,m_k\ge 0\right\}\;.$$

Now consider the word $a^3b^5c^2$. It’s not the case that $5=3+2-1$, so this word is not in $L_1$. Suppose that it’s in $L_2$. It has only one block of $c$s, so $k=1$; that block is of length $2$, so $m_1=1$. Thus, the word must be $a^{1+\ell}b^{\ell+1}c^2$ for some $\ell\ge 0$, which is plainly not the case, since $3\ne 5$. This shows that $a^3b^5c^2\notin L$.

Examination of $L_1$ and $L_2$ shows that only $L_2$ contains words without $c$, and then only when $k=0$. But the words in $L_2$ with $k=0$ are those of the form $a^\ell b^\ell$ for $\ell\ge 0$, so $a^3b^2\notin L$.

On the other hand, $a^2b^2$ is in $L_2$ but not in $L_1$, so it can be derived only via the first step $S\Rightarrow A$. Moreover, $k$ must be $0$, so the next step must be $A\Rightarrow B$. Finally, the production $B\to aBb$ must be applied exactly twice, followed by $B\to\epsilon$. At each stage there is exactly one non-terminal symbol, so no other derivation is possible. That is, not only does $a^2b^2$ have only one leftmost derivation, it has only one derivation of any kind.

The analysis used above to identify and describe $L_1$ and $L_2$ actually shows that each word of $L$ has a unique leftmost derivation: for both $L_1$ and $L_2$ I actually wrote down the unique leftmost derivations, though I did not go into detail for

$$C^k\Rightarrow^{m_1+1}b^{m_1}c^{m_1+1}C^{k-1}\Rightarrow^{m_2+1}\ldots\Rightarrow^{m_k+1}b^{m_1}c^{m_1+1}\ldots b^{m_k}c^{m_k+1}\;.$$

Thus, $G$ is not in fact an ambiguous grammar.