Collapsing adjacent states in a grammar

33 Views Asked by At

I am trying to write a program which can induce a grammar from an example of the code(really more of a corpus than an example).

I'm ignoring the decision problem, because I am doing two things that I foresee bringing that issue into play: Tokens are separated by spaces and any production rule is terminated at the newline character, and the grammar is context-free(not necessarily unambiguous).

However, the grammars being produced are inefficient because the start state goes to things like $$ S\rightarrow a bc \vee abcd \vee abce $$ The start rule contains all of the terminals, and it does so without breaking it down into anything more efficient.

Is there an algorithm that can attempt to extract the grammar in a better form(with smaller production rules)?

Or does this attempt turn into something utterly within the decidability problem?