Generating all words in a language from CFG

2k Views Asked by At

I have a non-ambiguous context-free grammar. Is there some standard algorithm to create list of all the words in the language the CFG defines?

This can be done with an abvious brute-force search by listing all strings in lexicographic order and testing which belong to the language, but I would prefer some more tractable method.

1

There are 1 best solutions below

1
On BEST ANSWER

Yes, this can be done using a recursive algorithm called expand as follows.

expand will take in a node in the grammar and return a set of trees that can be generated from that node.

In the base case, expand is given a terminal and returns a singleton set containing a leaf (one node tree) for that terminal.

In the recursive case, expand is given a node, say $X$, that can be rewritten as $Y_1 Y_2 \ldots Y_n$ via CFG rule $X \to Y_1 Y_2 \ldots Y_n$. expand combines the trees, $z_i$, returned from the recursive calls on each of the $Y_i$ using a Cartesian product. That is, expand(X) will return the following set of trees: $\left\{ X \to z_1 z_2 \ldots z_n \mid z_i \in \text{expand}(Y_i) \right\}$. Here, each tree in the set is rooted at $X$ and contains the trees from the recursive calls as its children.

EDIT

For efficiency, you can use a different collection instead of sets (e.g. array / list). As long as there is no spurious ambiguity, you will compute the same answer. If the grammar turns out to be infinite, replace the sets in the algorithm with lazy streams, and it will continue to work.