Proving a Formal Grammar Parses a String.


In summary, I would like to know (at a high level) what a proof would look like, and what I would need to do while developing a grammar, to "prove" that the grammar matches the patterns I have in mind.


My initial method was to write out specific examples of strings I would like for the grammar to parse:

\begin{align} &a\\ &aaa\\ &aab\\ &aba\\ &aaa.a\\ &aaabb\\ &aba..\\ &abbaa\\ &abbba \end{align}

Then I build the rules for the grammar as best as I can to parse those examples.

\begin{align} S &\to aN_1\\ N_1 &\to aN_1 \mid bN_2 \mid a \mid b\\ N_2 &\to aa \mid bb \mid N_1 \end{align}
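Assuming the grammar is exactly as written, the mental check can be mechanized with a small recursive recognizer, one function per nonterminal. This is only a sketch, but running it already surfaces a surprise: the single-character string $a$ from the wish list is not derivable, because $S \to aN_1$ forces every string to have length at least $2$.

```python
# Recursive recognizer for the made-up grammar above:
#   S  -> a N1
#   N1 -> a N1 | b N2 | a | b
#   N2 -> aa | bb | N1

def n2(s):
    # N2 -> aa | bb | N1
    return s in ("aa", "bb") or n1(s)

def n1(s):
    # N1 -> a N1 | a | b N2 | b
    if not s:
        return False
    if s[0] == "a":
        return s == "a" or n1(s[1:])
    if s[0] == "b":
        return s == "b" or n2(s[1:])
    return False

def accepts(s):
    # S -> a N1: needs a leading 'a' plus a nonempty remainder for N1,
    # so the lone string "a" is rejected even though it is on the wish list.
    return len(s) >= 2 and s[0] == "a" and n1(s[1:])

for w in ["a", "aaa", "aab", "aba", "aaabb", "abbaa", "abbba"]:
    print(w, accepts(w))
```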

Imagine I spent a lot of effort creating this grammar and hope that it parses all the strings I have in mind. (In reality this grammar is probably wrong, because I just made it up quickly.) The best I can do is check each example by running it through the grammar step by step in my head. Let's say that, from my best attempts at this mental calculation, I believe all the examples can be parsed. But then a parser that implements the grammar quickly finds that, say, $aaaba$ and $ababb$ are not actually parsed. Note that while $aaaba$ is not parsed, it may turn out that $aaaaa$ is, from the pattern $aaa.a$. Similarly, $ababb$ is not parsed, but $ababa$ and $abaaa$ are, from the pattern $aba..$ above. So we have:

  • $a$
  • $aaa$
  • $aab$
  • $aba$
  • $aaaba$ (not parsed)
  • $aaaaa$
  • $aaabb$
  • $ababb$ (not parsed)
  • $abbaa$
  • $abbba$

Basically, my process so far has been proof by example, which I know is wrong, but I don't know of an alternative. In addition, this made-up grammar is fairly simple; a real-world grammar that I care about might have $*$ and $+$ operators for matching $0$ or more and $1$ or more occurrences, respectively. That means at some point there would be too many possible strings to list all the examples. So what I have been doing is listing the "main" examples that will "most likely" occur, and using those to define and build the grammar.
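One systematic step up from hand-picked examples is to compare the grammar against an executable specification on *all* strings up to some length. A bounded check is still not a proof, but it finds counterexamples mechanically instead of by luck. The sketch below uses hypothetical stand-ins: the "grammar" is encoded as the regular expression `a[ab]*`, and the intended language is "starts and ends with $a$"; both are illustrative, not the grammar from the question.

```python
import re
from itertools import product

# Executable spec (hypothetical intended language): strings over {a, b}
# that both start and end with 'a'.
def spec(w):
    return len(w) >= 1 and w[0] == "a" and w[-1] == "a"

# Candidate "grammar", encoded as a regex for brevity (also hypothetical):
# 'a' followed by any mix of a's and b's.
def accepts(w):
    return re.fullmatch(r"a[ab]*", w) is not None

def first_counterexample(max_len=8):
    # Enumerate every string over {a, b} up to max_len, shortest first,
    # and report the first disagreement between grammar and spec.
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            w = "".join(tup)
            if accepts(w) != spec(w):
                return w
    return None

print(first_counterexample())  # -> "ab": accepted by the grammar, rejected by the spec
```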

My question is: how can I do better than this? How can I write the grammar, and a proof, in such a way that I can say "it is proven to be correct"? I also wonder whether proof by example is actually "correct" here, because in theory, if the grammar works for some examples that fit certain patterns, it should work for other similar strings. That's where I start getting lost. I don't see how to make a "rock solid" grammar that is proven to do what I want (parse certain patterns of strings).
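For what it's worth, the standard shape of such a correctness proof is a two-directional induction: show $L(G) \subseteq L$ by induction on the length of derivations, and $L \subseteq L(G)$ by induction on the structure (or length) of strings in the target language $L$. A hedged sketch for a toy grammar (not the one above):

```latex
% Toy claim: for the grammar S -> aS | a, we have L(S) = \{ a^n : n \ge 1 \}.
%
% (\subseteq) Induction on derivation length. A one-step derivation
% S \Rightarrow a yields a^1. If the derivation starts with S -> aS, then
% w = a\,w' with S \Rightarrow^* w'; by the induction hypothesis
% w' = a^n for some n \ge 1, hence w = a^{n+1}.
%
% (\supseteq) Induction on n. Base case: S \Rightarrow a gives a^1.
% Step: if S \Rightarrow^* a^n, then S \Rightarrow aS \Rightarrow^* a\,a^n = a^{n+1}.
```

For the grammar in the question the same pattern applies, but one first has to write down a precise description of the intended language, which is exactly the hard part the examples were standing in for.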


A grammar is a tuple $(N,\Sigma ,P,S)$, where:

  • $N$ is a finite set of nonterminal symbols.
  • $\Sigma$ is a finite set of terminal symbols.
  • $P$ is a finite set of production rules.
  • $S \in N$ is the start symbol.
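The tuple $(N, \Sigma, P, S)$ maps directly onto a data structure, which makes the grammar itself something you can test against. A minimal Python sketch (the names `N`, `Sigma`, `P`, `start` are my own), encoding the example grammar from the question together with a generic bounded derivation search:

```python
from collections import deque

# (N, Sigma, P, S) for the example grammar in the question.
N = {"S", "N1", "N2"}
Sigma = {"a", "b"}
P = {
    "S":  [("a", "N1")],
    "N1": [("a", "N1"), ("b", "N2"), ("a",), ("b",)],
    "N2": [("a", "a"), ("b", "b"), ("N1",)],
}
start = "S"

def derives(target, max_len=10):
    # Breadth-first search over sentential forms (tuples of symbols),
    # pruned by length, checking whether `target` is derivable.
    seen, queue = set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len or form in seen:
            continue
        seen.add(form)
        if all(s in Sigma for s in form):
            if "".join(form) == target:
                return True
            continue
        # Expand the leftmost nonterminal with every applicable production.
        i = next(j for j, s in enumerate(form) if s in N)
        for rhs in P[form[i]]:
            queue.append(form[:i] + tuple(rhs) + form[i + 1:])
    return False
```

This is a brute-force sketch, not a real parser, but it is enough to check small examples against the formal definition, and again shows that $a$ is not derivable under $S \to aN_1$.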

The standard reference for your problem is

D. Angluin, "Learning regular sets from queries and counterexamples", Information and Computation 75 (1987), no. 2, 87--106.