Designing a context-free grammar


I have to design grammars over the alphabet $\Sigma=\{a,b\}$: one for the language of words $\alpha$ with $c^a(\alpha)=c^b(\alpha)$, and, for the second part, one for the language with $c^a(\alpha)\leq c^b(\alpha)$, where $c^a(\alpha)$ and $c^b(\alpha)$ are, respectively, the number of $a$'s and the number of $b$'s in the word $\alpha$. For the first part I came up with $$S_0\rightarrow S; \qquad S\rightarrow SaSbS \mid SbSaS \mid \epsilon$$ and for the second one $$S_0\rightarrow S; \qquad S\rightarrow SaSbS \mid SbSaS \mid b \mid \epsilon.$$ My problem is that I have no idea how to logically prove that this is correct. I can't just say "well, it works". So I want to know how to prove this, and, more generally, what approach to take when I'm faced with problems of this type.


If you want to prove that your grammar

$$ S \to S\mathtt{a}S\mathtt{b}S \mid S\mathtt{b}S\mathtt{a}S \mid \varepsilon$$

generates $$L = \Big\{ \alpha \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ c^\mathtt{a}(\alpha) = c^\mathtt{b}(\alpha)\Big\},$$

then you need to do two things:

  • prove that every word generated by $S$ has an equal number of $\mathtt{a}$'s and $\mathtt{b}$'s,
  • prove that every word that has an equal number of $\mathtt{a}$'s and $\mathtt{b}$'s is generated by your grammar.

The first bullet point can be proved by induction, e.g. on the number of rule applications in the derivation (you have three alternatives to consider).

Base:

If $\alpha$ was generated with a single rule application, then the rule must have been $S \to \varepsilon$, so $\alpha = \varepsilon$, and indeed $c^\mathtt{a}(\alpha) = 0 = c^\mathtt{b}(\alpha)$.

Step:

Suppose that $\alpha$ was generated with $n$ rule applications, and that any word generated with strictly fewer applications has an equal number of $\mathtt{a}$'s and $\mathtt{b}$'s.

We have two cases: $\alpha = \alpha_1\mathtt{a}\alpha_2\mathtt{b}\alpha_3$ or $\alpha = \alpha_1\mathtt{b}\alpha_2\mathtt{a}\alpha_3$, where each $\alpha_i$ is generated with strictly fewer applications. Either way, the inductive hypothesis gives $c^\mathtt{a}(\alpha_i) = c^\mathtt{b}(\alpha_i)$ for $i = 1,2,3$, and combining these we get \begin{align} c^\mathtt{a}(\alpha) &= c^\mathtt{a}(\alpha_1) + c^\mathtt{a}(\mathtt{a}) + c^\mathtt{a}(\alpha_2) + c^\mathtt{a}(\mathtt{b}) + c^\mathtt{a}(\alpha_3) \\ &= 1 + c^\mathtt{a}(\alpha_1) + c^\mathtt{a}(\alpha_2) + c^\mathtt{a}(\alpha_3) \\ &= 1 + c^\mathtt{b}(\alpha_1) + c^\mathtt{b}(\alpha_2) + c^\mathtt{b}(\alpha_3) \\&= c^\mathtt{b}(\alpha_1) + c^\mathtt{b}(\mathtt{a}) + c^\mathtt{b}(\alpha_2) + c^\mathtt{b}(\mathtt{b}) + c^\mathtt{b}(\alpha_3) = c^\mathtt{b}(\alpha) \end{align} and similarly for the other case. Summing up, we have $c^\mathtt{a}(\alpha) = c^\mathtt{b}(\alpha)$, which concludes the proof. $$\tag*{$\square$}$$
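As a mechanical sanity check (not a substitute for the proof), note that each growing rule adds exactly one $\mathtt{a}$ and one $\mathtt{b}$, so the terminals stay balanced after every single expansion step — precisely the invariant the induction exploits. A small Python sketch of this idea (the function name and the expansion cap are my own choices, not part of the original problem):

```python
import random

GROWING_RULES = ["SaSbS", "SbSaS"]  # the third alternative is S -> epsilon

def random_derivation(rng, max_expand=20):
    """Expand the leftmost S with a random rule; once max_expand growing
    rules have been used, force S -> epsilon so the derivation terminates.
    Asserts the proof's invariant (equal a/b counts among the terminals
    produced so far) after every step."""
    form = "S"
    expansions = 0
    while "S" in form:
        if expansions < max_expand and rng.random() < 0.4:
            rule = rng.choice(GROWING_RULES)
            expansions += 1
        else:
            rule = ""                      # S -> epsilon
        form = form.replace("S", rule, 1)  # rewrite the leftmost S
        assert form.count("a") == form.count("b")  # invariant preserved
    return form

rng = random.Random(0)
samples = [random_derivation(rng) for _ in range(200)]
assert all(w.count("a") == w.count("b") for w in samples)
```

If any rule broke the balance, the per-step assertion would fail on some random derivation.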

The second point is a bit trickier, because we have to observe some properties of words with the same number of $\mathtt{a}$'s and $\mathtt{b}$'s that allow them to be factorized according to your grammar. We will proceed by induction on the length of the word (note that the length has to be even).

Base:

If $\alpha$ has length $0$, then $\alpha = \varepsilon$ and we can represent it by $S \to \varepsilon$.

Step:

Suppose that $\alpha$ has length $n > 0$ and that $S$ can generate any word with an equal number of $\mathtt{a}$'s and $\mathtt{b}$'s of length strictly smaller than $n$.

Consider the two following cases:

  • the first and the last letters of $\alpha$ are different,
  • the first and the last letters of $\alpha$ are the same.

If the former is true, then the middle, let's call it $\alpha'$, also has the same number of $\mathtt{a}$'s and $\mathtt{b}$'s and by inductive hypothesis $S$ can generate it. Hence we can represent $\alpha$ as $S \to (\varepsilon)\mathtt{a}(\alpha')\mathtt{b}(\varepsilon)$ or $S \to (\varepsilon)\mathtt{b}(\alpha')\mathtt{a}(\varepsilon)$.

Now, let $\alpha = \mathtt{a}\alpha'\mathtt{a}$ (the case of two $\mathtt{b}$'s is symmetric). Then $\alpha'$ has a surplus of $\mathtt{b}$'s, namely $c^\mathtt{b}(\alpha') = c^\mathtt{a}(\alpha') + 2$, since the two surrounding $\mathtt{a}$'s must be balanced out. Define $\alpha_1$ as the shortest prefix of $\alpha'$ such that $c^\mathtt{a}(\alpha_1) + 1 = c^\mathtt{b}(\alpha_1)$, and $\alpha_2$ as the rest; such a prefix exists because the difference $c^\mathtt{b} - c^\mathtt{a}$ over the prefixes of $\alpha'$ starts at $0$, ends at $2$, and changes by $\pm 1$ with each letter.

Observe that the last letter of $\alpha_1$ has to be $\mathtt{b}$, otherwise the prefix could be made shorter, so we can write $\alpha_1 = \alpha_1'\mathtt{b}$. A simple calculation shows that both $\alpha_1'$ and $\alpha_2\mathtt{a}$ have equal numbers of $\mathtt{a}$'s and $\mathtt{b}$'s: $c^\mathtt{b}(\alpha_1') = c^\mathtt{b}(\alpha_1) - 1 = c^\mathtt{a}(\alpha_1) = c^\mathtt{a}(\alpha_1')$, and $c^\mathtt{b}(\alpha_2\mathtt{a}) = c^\mathtt{b}(\alpha') - c^\mathtt{b}(\alpha_1) = \big(c^\mathtt{a}(\alpha')+2\big) - \big(c^\mathtt{a}(\alpha_1)+1\big) = c^\mathtt{a}(\alpha_2) + 1 = c^\mathtt{a}(\alpha_2\mathtt{a})$. Both are strictly shorter than $\alpha$, so by the inductive hypothesis they can be generated by $S$. Therefore, $\alpha = \mathtt{a}\alpha_1'\mathtt{b}\alpha_2\mathtt{a}$ can be represented as $S\to (\varepsilon)\mathtt{a}(\alpha_1')\mathtt{b}(\alpha_2\mathtt{a})$.
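This factorization step is constructive: scan $\alpha'$ from the left, track the difference $c^\mathtt{b} - c^\mathtt{a}$, and cut at the first position where it reaches $1$. A hypothetical helper sketching this (the name `split` and its interface are mine):

```python
def split(alpha):
    """For alpha = 'a' + inner + 'a' with equal a/b counts overall,
    return (alpha1_prime, alpha2 + 'a') as in the proof, so that
    alpha == 'a' + alpha1_prime + 'b' + (alpha2 + 'a') and both
    returned words are balanced."""
    assert alpha[0] == "a" and alpha[-1] == "a"
    assert alpha.count("a") == alpha.count("b")
    inner = alpha[1:-1]
    balance = 0                       # c^b - c^a of the prefix read so far
    for i, ch in enumerate(inner):
        balance += 1 if ch == "b" else -1
        if balance == 1:              # shortest prefix with one extra b
            alpha1, alpha2 = inner[: i + 1], inner[i + 1 :]
            assert alpha1[-1] == "b"  # its last letter must be a b
            return alpha1[:-1], alpha2 + "a"

left, right = split("aababbba")
assert "a" + left + "b" + right == "aababbba"
assert left.count("a") == left.count("b")
assert right.count("a") == right.count("b")
```

Running it on `"aababbba"` cuts the inner word `"ababbb"` after its shortest prefix with one surplus $\mathtt{b}$, yielding two balanced pieces.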

Along with the case of $\alpha = \mathtt{b}\alpha'\mathtt{b}$ we conclude that any $\alpha$ of length $n$ can be generated by $S$ and that ends the proof. $$\tag*{$\square$}$$
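For small lengths, both directions can also be checked exhaustively: compute the set of words the grammar generates up to a length bound (a bottom-up fixpoint suffices, since by the proof any short balanced word is built from strictly shorter balanced pieces) and compare it with the set of all balanced words. A sketch, with function names of my own choosing:

```python
from itertools import product

def grammar_language(max_len):
    """All words of length <= max_len derivable from
    S -> S a S b S | S b S a S | epsilon, via a bottom-up fixpoint."""
    lang = {""}                       # from S -> epsilon
    while True:
        new = set(lang)
        for u, v, w in product(sorted(lang), repeat=3):
            for cand in (u + "a" + v + "b" + w, u + "b" + v + "a" + w):
                if len(cand) <= max_len:
                    new.add(cand)
        if new == lang:               # fixpoint reached
            return lang
        lang = new

def balanced_words(max_len):
    """All words over {a, b} of length <= max_len with c^a = c^b."""
    return {"".join(t)
            for n in range(max_len + 1)
            for t in product("ab", repeat=n)
            if t.count("a") == t.count("b")}

assert grammar_language(6) == balanced_words(6)
```

This is only evidence for small lengths, of course; the induction above is what covers all words.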

These proofs are quite long, but I wanted to include all the necessary details; I hope they do not scare you. Note that the proofs also point at different grammars that would work, namely

$$S \to \mathtt{a}S\mathtt{b}S \mid \mathtt{b}S\mathtt{a}S \mid \varepsilon$$ or $$S \to SS \mid \mathtt{a}S\mathtt{b} \mid \mathtt{b}S\mathtt{a} \mid \varepsilon.$$
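That these variants generate the same language can likewise be checked by brute force up to a length bound, using a small generator that works for any list of productions over a single nonterminal $S$ (the helper below and its name are my own):

```python
from itertools import product

def language(productions, max_len=6):
    """Bottom-up fixpoint for a grammar with one nonterminal S.
    Each production is a string over {'a', 'b', 'S'}; the empty
    string encodes S -> epsilon."""
    lang = set()
    while True:
        new = set(lang)
        for p in productions:
            parts = p.split("S")      # terminal pieces between the S's
            slots = len(parts) - 1    # number of S occurrences in p
            fills = product(sorted(lang), repeat=slots) if slots else [()]
            for fill in fills:
                word = parts[0]
                for piece, part in zip(fill, parts[1:]):
                    word += piece + part
                if len(word) <= max_len:
                    new.add(word)
        if new == lang:
            return lang
        lang = new

g1 = ["SaSbS", "SbSaS", ""]          # the grammar from the question
g2 = ["aSbS", "bSaS", ""]            # first variant
g3 = ["SS", "aSb", "bSa", ""]        # second variant
assert language(g1) == language(g2) == language(g3)
```

All three agree up to the chosen bound, which matches the claim that the proofs apply to each of them.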

Both are similar to the grammar of matched parentheses, e.g. $X \to XX \mid \mathtt{(}X\mathtt{)} \mid \varepsilon$ or $X \to \mathtt{(}X\mathtt{)}X \mid \varepsilon$, and this is not a coincidence.

I hope it helps $\ddot\smile$