Context free grammar for a specific context-free language

800 Views Asked by At

Problem $L = \{ w \in \{a, b\}^* : n(a) \neq 2 \cdot n(b)\}$

This can be done easily with NPDA, but I couldn't find a way to make it work with CFG. My idea was to break it into 2 cases: $n(a) > 2 \cdot n(b)$ or $n(a) < 2 \cdot n(b)$. I first try to generate a language which makes them equal, then add more $a$ or more $b$ but it couldn't handle many special cases. Here is what I got: $$ S \rightarrow U | V $$ $$ U \rightarrow TaU | TaT$$ $$ V \rightarrow TbV | TbT$$ $$ T \rightarrow TaTaTbT | TaTbTaT | TbTaTaT$$

$U$ generates $n(a) > 2 \cdot n(b)$, $V$ generates $n(a) < 2 \cdot n(b)$, and $T$ takes care of the equal case. Any idea would be greatly appreciated.

Update Following Hint by Brian
enter image description here

2

There are 2 best solutions below

7
On BEST ANSWER

Here’s a less general but more direct approach than mercio’s.

Your basic idea is fine, but as it stands, your grammar does not generate any terminal words: at the very least you need to expand that last production to

$$T\to TaTaTbT \mid TaTbTaT \mid TbTaTaT\mid\varepsilon\;.$$

Let $L'$ be the set of strings with exactly twice as many $a$’s as $b$’s. Think about the kind of string that should be generated by $U$, the ones with too many $a$’s. Every $b$ in such a word can be incorporated into a member of $L'$. Adjacent members of $L'$ can be concatenated into longer members of $L'$. If there’s more than one of these maximal members of $L'$, they must be separated by non-empty strings of $a$’s. There might also be strings of $a$’s at the beginning and end. Thus, the words generated by $U$ must have the form $a^*(L'a^+)^+L'a^*$. It would be convenient, therefore, to have a non-terminal $A$ that generates $a^*$, and another, $U_1$, that generates $(L'a^+)^+$. We might try something like this:

$$\begin{align*} &A\to Aa\mid \varepsilon\\ &U\to AU_1TA\\ &U_1\to U_1TaA\mid TaA \end{align*}$$

Certainly $A$ generates $a^*$, and $U_1$ generates $(L'a^+)^+$, so $U$ generates $a^*(L'a^+)^+L'a^*$. This may not be the most efficient solution, but it appears to work, and the same idea can be used to handle $V$.

2
On

First, the following grammar produces the language $L_0$ containg words where $n(b) \neq n(a)$ $$S \rightarrow U \;|\; V $$ $$U \rightarrow TbT \;|\; TbU$$ $$V \rightarrow TaT \;|\; TaV$$ $$T \rightarrow bTaT \;|\; aTbT \;|\; \epsilon $$ It follows your idea of making two cases, and $T$ is the one giving words with equality and is the only non-obvious (but classic) one.

We have a map from $L$ to $L_0$ by transforming every $b$ into an $bb$. Let's try to pull this grammar back through this map. In order to do this, we need a grammar for the image of this map :

We introduce new symbols $U_b, V_b, T_b$ where $b U_b$ is intended to decompose into the words that $U$ can decompose into but beginning with an $b$. We do the same and have new symbols $U^b, V^b, T^b$ where $U^b b$ decomposes into the words $U$ can decompose into but ending with an $b$.

We get (you have to be careful when a $T$ appears because it can be empty) : $$ S \rightarrow U \;|\; V $$ $$ U \rightarrow (T^b b)bT \;|\; Tb(bT_b) \;|\; (T^b b)bU \;|\; Tb(bU_b) $$ $$ U_b \rightarrow (T_b^b b)bT \;|\; T_b b(bT_b) \;|\; T \;|\; (T_b^b b)bU \;|\; T_bb(bU_b) \;|\; U $$ $$ V \rightarrow TaT \;|\; TaV $$ $$ T \rightarrow b(bT_b)aT \;|\; a(T^bb)bT \;|\; aTb(bT_b) \;|\; \epsilon$$ $$ T_b \rightarrow TaT$$ $$ T^b \rightarrow b(bT_b)aT^b \;|\; a(T^bb)bT^b \;|\; aTb(bT_b^b) \;|\; aT$$ $$ T_b^b \rightarrow TaT^b$$ which describes the words of $L_0$ where every $b$ appears in a pair $bb$.

Now we can pull back and simplify the two symbols having only one rule and get this :

$$ S \rightarrow U \;|\; V $$ $$ U \rightarrow T^b bT \;|\; TbTaT \;|\; T^b bU \;|\; TbU_b $$ $$ U_b \rightarrow TaT^b bT \;|\; TaTbTaT \;|\; T \;|\; TaT^b bU \;|\; TaTbU_b \;|\; U $$ $$ V \rightarrow TaT \;|\; TaV $$ $$ T \rightarrow bTaTaT \;|\; aT^bbT \;|\; aTbTaT \;|\; \epsilon$$ $$ T^b \rightarrow bTaTaT^b \;|\; aT^bbT^b \;|\; aTbTaT^b \;|\; aT$$

Which should work, if I didn't mess up. Note that this method can be generalized to pulling back any grammar through any such map.