Grammar extraction using pre-existing grammars.

47 Views Asked by At

Given a set of strings $s_1, \dots, s_n$ over $\Sigma$ it isn't clear what a good generalization of the strings would be using a regular language, that extends the set infinitely. This comes from the many interpretations you could have of the set of strings.

What about taking pre-existing grammars like the following two

DIGITS = [0-9]+
ID = (ALPHA | _ ) (ALPHA | _ | DIGITS)*

used in programming language parsing, call them $g_1, g_2$.

Consider permuting the terminals of a grammar, for instance $\phi : a \mapsto b, \ b \mapsto c$ and $g$ is: $$ g \to b AB\\ A \to aaA + B \\ B \to abab $$

Then $\phi \circ g$ is $$ g' \to c A B \\ A \to bbA + B \\ B \to bcbc $$

Then these alphabet permutations (aka terminal permutations) preserve the structure of the grammars. So now ask whether $s_i \in \phi\circ g_j$ (language membership), for some $j,\phi$, or even if $s_i \in (\phi_1 g_1) \cdot \dots \cdot (\phi_m g_m)$, for some given set of $g_i$ and for some permutations $\phi_i$.

Or what if we took $\Sigma' = \Sigma \cup \{g_1, \dots, g_m\}$ and considered permutations of $\Sigma'$.

Any grammar containing the $\{s_1, \dots, s_n\}$ we construct using these methods has structure derived from the given grammrs $g_i$, and this essentially answers the question as to what a good generalization of a set of strings would be: it's the one that is built from specified, preferrable grammar structures. The specified grammars act like hints to the algorithm as to what grammars to choose. Has anyone investigated this?

1

There are 1 best solutions below

1
On BEST ANSWER

Given any finite set $S$ of strings, you can define an extension just by saying that your language is the set of all strings that contain a member of $S$ as a prefix and/or suffix. Or since $S$ is finite, you can take the union of $S$ with any regular language and get a regular language. So it seems like your question is a bit ill-posed. Maybe the "minimal" such language that contains your strings in $S$ is the set you get from applying the pumping lemma criteria to $S$, trying to get as many strings as possible from $S$ into the same pumping class?