I have a non-ambiguous context-free grammar. Is there some standard algorithm to create list of all the words in the language the CFG defines?
This can be done with an abvious brute-force search by listing all strings in lexicographic order and testing which belong to the language, but I would prefer some more tractable method.
Yes, this can be done using a recursive algorithm called
expandas follows.expandwill take in a node in the grammar and return a set of trees that can be generated from that node.In the base case,
expandis given a terminal and returns a singleton set containing a leaf (one node tree) for that terminal.In the recursive case,
expandis given a node, say $X$, that can be rewritten as $Y_1 Y_2 \ldots Y_n$ via CFG rule $X \to Y_1 Y_2 \ldots Y_n$.expandcombines the trees, $z_i$, returned from the recursive calls on each of the $Y_i$ using a Cartesian product. That is,expand(X)will return the following set of trees: $\left\{ X \to z_1 z_2 \ldots z_n \mid z_i \in \text{expand}(Y_i) \right\}$. Here, each tree in the set is rooted at $X$ and contains the trees from the recursive calls as its children.EDIT
For efficiency, you can use a different collection instead of sets (e.g. array / list). As long as there is no spurious ambiguity, you will compute the same answer. If the grammar turns out to be infinite, replace the sets in the algorithm with lazy streams, and it will continue to work.