How we derive language from grammar by bottom up or any other approach?

449 Views Asked by At

Consider a CFG with the following productions.

S → AA | B
A → 0A | A0 | 1
B → 0B00 | 1

$S$ is the start symbol, $A$ and $B$ are non-terminals and $0$ and $1$ are the terminals. The language generated by this grammar is

  1. $\{0^n 10^{2n} | n ≥ 1\}$
  2. $\{0^i 10^j 10^k | i, j, k ≥ 0\} ∪ \{0^n 10^{2n} | n ≥ 1\}$

  3. $\{0^i 10^j | i, j ≥ 0\} ∪ \{0^n 10^{2n} | n ≥ 1\}$

  4. The set of all strings over $\{0, 1\}$ containing at least two $0$'s

My attempt :

option $(4)$ is regular language , $L = (0+1)^*.0.(0+1)^*.0.(0+1)^*$ ,

and given grammar is context free

which should be $L = \{0^i 10^j 10^k | i, j, k ≥ 0\} ∪ \{0^n 10^{2n} | n ≥ 0\}$ , $n>= 0$ instead of $n>=1$ .

Can you explain in formal way please, how we derive language from grammar by bottom up or any other approach?

1

There are 1 best solutions below

4
On BEST ANSWER

There are mechanical techniques for converting a regular grammar to a regular expression for the language; for context-free grammars, however, I know of no algorithmic approach. In this case, however, the nature of the grammar makes a rigorous analysis fairly straightforward.

Notice that if you use the production $S\to AA$ in a derivation, you can never get $B$, and if you use $S\to B$, you can never get $A$. Thus, either you start a derivation with $S\Rightarrow AA$, in which case you get exactly the words that can be generated from $AA$ using the productions $A\to 0A$, $A\to A0$, and $A\to 1$, or you start it with $S\Rightarrow B$, in which case you get the language of the grammar whose initial symbol is $B$ and whose productions are $B\to 0B00$ and $B\to 1$.

These are independent languages, and ours is simply the union of them, so we should try to figure out what these simpler languages are. The second one is easiest. Every derivation of a word in it has the form

$$S\Rightarrow B\Rightarrow^n 0^nB0^{2n}\Rightarrow 0^n10^{2n}$$

for some $n\ge 0$, so it’s simply $\big\{0^n10^{2n}:n\ge 0\big\}$.

Now let’s see what can be derived from $A$. Suppose that I start with a single $A$ and apply the production $A\to 0A$ $m$ times and the production $A\to A0$ $n$ times. No matter the order in which I perform these $m+n$ derivation steps, I’ll end up with $m$ zeroes to the left of the $A$ and $n$ zeroes to the right of the $A$, or $0^mA0^n$. Eventually I have to use $A\to 1$, and at that point I’ll have $0^m10^n$. In other words, the language that I can generate using $A$ as my initial symbol is

$$\big\{0^m10^n:m,n\ge 0\big\}\;,$$

the set of words containing exactly one $1$. Call this language $L_A$. It’s not hard to see that if I start with $AA$, the derivations from the two $A$s proceed independently of each other, and I end up with a word in $L_AL_A$. Conversely, I can get every word of $L_AL_A$ in this way. Thus, $L_AL_A$ is the set of words that I can derive from $AA$. Finally, it’s also not hard to see that $L_AL_A$ is simply the set of words containing exactly two $1$s, which is

$$\big\{0^\ell10^m10^n:\ell,m,n\ge 0\big\}\;.$$

Thus, the language of the original grammar is

$$\big\{0^n10^{2n}:n\ge 0\big\}\cup\big\{0^\ell10^m10^n:\ell,m,n\ge 0\big\}\;.$$

Thus, choice $(2)$ is almost correct, but as you say, it should allow $n=0$. Choice $(1)$ is (apart from the same error) the language that would be generated if we removed all of the productions involving $A$. Choice $(3)$ is (apart from the same error) the language that would be generated if we replaced $S\to AA$ by $S\to A$. And choice $(4)$ clearly has nothing to do with the language of this grammar.

Note, though, that the fact that $(4)$ is regular does not mean that it must be wrong: it’s entirely possible for a context-free grammar that is not regular to general a regular language.