CFG where the number of 0s is the same as the number of 1s and there is exactly one 2.

2.6k Views Asked by At

So to create a CFG where the number of 0s and 1s are the same, I have:

$$ S \rightarrow SS \ | \ 0S1 \ | \ 1S0 \ | \ \epsilon $$

However, I don't know how to inject one single 2 to create the CFG that the number of 0s is the same as the number of 1s and there is exactly one 2. Any help is appreciated! Thanks!

2

There are 2 best solutions below

0
On

$$S \rightarrow ST \mid TS \mid2T\mid 0S1 \mid 1S0 \\T\rightarrow TT\mid0T1\mid 1T0\mid\epsilon$$

0
On

Here's one example grammar: $$\begin{align*}S &\mapsto \mathtt{2} | \mathtt{0}S\mathtt{1} | \mathtt{1}S\mathtt{0} | ST | TS \\ T &\mapsto TT | \mathtt{0}T\mathtt{1} | \mathtt{1}T\mathtt{0} | \epsilon\\ \end{align*}$$

It turns out that, just as there's a simple but counterintuitive grammar for "strings with an equal number of zeroes and ones", there is a simple but counterintuitive grammar for strings with an equal number of zeroes and ones, and exactly one 2.


Here's one approach: you might realize that you can modify the grammar for the language "strings with equal numbers of zeroes and ones" by allowing just one of those productions to put in a two somewhere.

Starting from this grammar for your original language: $$\begin{align*}S &\mapsto SS | \mathtt{0}S\mathtt{1} | \mathtt{1}S\mathtt{0}|\epsilon\\\end{align*}$$

How about this grammar for your new language: $$\begin{align*}S &\mapsto \mathtt{2} | \mathtt{0}S\mathtt{1} | \mathtt{1}S\mathtt{0} | \mathtt{20}T\mathtt{1} | \mathtt{02}T\mathtt{1} | \mathtt{0}T\mathtt{21} | \mathtt{0}T\mathtt{12} | ST | TS\\ T &\mapsto TT | \mathtt{0}T\mathtt{1} | \mathtt{1}T\mathtt{0}|\epsilon\\ \end{align*}$$

Here, $T$ behaves exactly like our old start symbol used to. $S$ behaves similarly, except $S$ means "You must have exactly one 2 in here somewhere". So we had to modify the old production $S\mapsto SS$ to either $S\mapsto ST| TS$.

This grammar suffices, but you can simplify it if you introduce a production like $S \mapsto \mathtt{2}T | T\mathtt{2}$ which allows some of the other productions to simplify. Then you get the grammar showed above.


If you don't have a flash of intuition, you can also construct this grammar in a more mechanical way, too. It's useful to see what that process looks like:

  1. You can confirm that your language $L$ is context-free because it's the intersection of a context-free language ($L_A$, the language of strings with equal numbers of 0 and 1, with any number of 2s interspersed) with a regular language ($L_B$, the language of strings with exactly one 2 in it). Therefore, you can be confident that there is a context-free grammar for it.

$$L = L_A \cap L_B$$

  1. One proof that context-free languages are closed under intersection with regular languages involves building a pushdown automaton that keeps track of the states of both machines and accepts if both accept. So one approach to building a grammar for $L$ is to build machines for $L_A$ and $L_B$ and follow the recipe for forming the intersection machine.

    This turns out to be tedious, but by following the recipe, you can suddenly see how to make a grammar for $L$.

  2. First, you build a grammar for $L_A$ (strings with equal numbers of 0 and 1, with any number of 2s interspersed). Here's an example:

    $$\begin{align*}S &\mapsto SS | TS | ST | T | \mathtt{0}S\mathtt{1} | \mathtt{1}S\mathtt{0}\\ T &\mapsto \mathtt{2}T | \epsilon\end{align*}$$

    This is basically your grammar, with a new symbol $T$ that means "you can put any number of twos here".

  3. You can convert this grammar into Greibach Normal Form. In this form, every production must consist of a terminal symbol followed by zero or more nonterminal symbols. This form will make it easier to see which rules must change in order to make the grammar for $L_A$ into a grammar for $L_A\cap L_B = L$. There's a recipe to follow to convert a grammar into gnf— but here, I just reasoned about what the first nonterminal symbol might be in each case.

    Here is a grammar for $L_A$ in Greibach normal form: $$\begin{align*}S &\mapsto \mathtt{2} | \mathtt{2}S | \mathtt{1}SO | \mathtt{0}SI | \mathtt{2}SS | \mathtt{0}SIS | \mathtt{1}SOS\\ O &\mapsto \mathtt{0} | \mathtt{2}O | \mathtt{2}OX | \mathtt{0}X\\ I &\mapsto \mathtt{1} | \mathtt{2}I | \mathtt{2}IX | \mathtt{1}X\\ X &\mapsto \mathtt{2} | \mathtt{2}X \end{align*}$$ You can interpret the symbols as:

    • $S$ = a string of equal numbers of zeroes and ones, interspersed with any number of 2s,
    • $I$ = a string consisting of a single 1 and any number of 2s,
    • $O$ = a string consisting of a single 0 and any number of 2s,
    • $X$ = a string consisting of one or more 2s.
  4. Now comes the interesting part. Because every rule in Greibach normal form reads a character first, it becomes possible to see how to modify this grammar for $L_A$ into a grammar for $L_A\cap L_B = L$.

    We'll just need two sets of symbols— one set of symbols will be allowed to have a single 2 inside them, and another set of symbols will not be allowed to have any 2s inside them. So we'll have a grammar with a symbols $S$ and $S^\prime$, $I$ and $I^\prime$, etc. The unprimed symbols will mean "...and the single 2 in this string occurs here". The primed symbols will mean "...and there are no 2s in this string".

    We can modify the rules of our grammar— by adding appropriate prime symbols and deleting inappropriate productions — to finally make a grammar for $L = L_A \cap L_B$. You can figure out what the rules must be based on the interpretation of the symbols written above.

    $$\begin{align*}S &\mapsto \mathtt{2} | \mathtt{2}S^\prime | \mathtt{1}SO^\prime | \mathtt{1}{S^\prime}O | \mathtt{0}SI^\prime | \mathtt{0}S^\prime I | \mathtt{2}{S^\prime}S^\prime | \mathtt{0}S{I^\prime}{S^\prime} | \mathtt{0}{S^\prime}I{S^\prime} | \mathtt{0}{S^\prime}{I^\prime}S | \mathtt{1}S{O^\prime}{S^\prime} | \mathtt{1}{S^\prime}O{S^\prime} | \mathtt{1}{S^\prime}{O^\prime}S \\ O &\mapsto \mathtt{02} | \mathtt{20}\\ I &\mapsto \mathtt{12} | \mathtt{21}\\ &\\ S^\prime &\mapsto {S^\prime}S^\prime | \mathtt{0}S^\prime\mathtt{1} | \mathtt{1}S^\prime\mathtt{0}\\ O^\prime &\mapsto \mathtt{0} \\ I^\prime &\mapsto \mathtt{1} \end{align*}$$

  5. Clean it up and simplify, and you get: $$\begin{align*}S &\mapsto \mathtt{2} | \mathtt{0}S\mathtt{1} | \mathtt{1}S\mathtt{0} | ST | TS \\ T &\mapsto TT | \mathtt{0}T\mathtt{1} | \mathtt{1}T\mathtt{0} | \epsilon\\ \end{align*}$$