Let B be the language over alphabet {a, ... z} consisting of those words occuring in the Bible. Thus, B = {in,the,beginning,god,created,...}.
Describe an NFA whose language is B. Describe what happens when you apply subset construction to this NFA to get a DFA, ignoring unreachable states. Roughly, how do the number of states in each compare? How does each number compare to the # of distinct words in the Bible?
Of course, this is homework so I'm not asking for answers so much as a bit of a shove in the right direction. I'm pretty thoroughly stumped here.
I'm picturing the NFA as a series of letters that go down an ordered path in the same order that the letters show up in the Bible. Which not only seems really impractical but a little too stupid of an answer for it to be correct. What would an NFA of this even look like? Would it really be huge?
Thanks for any help.
According to this page, there are 788,258 words in the Bible but only 14,565 distinct words. Now, it is shown in [1] that a full Scrabble lexicon with 94,240 entries can be represented be a DFA with 19,853 states. So you can expect the DFA of the Bible to have roughly $19,853 \times \frac{14,565}{94,240} \approx 3068$ states.
[1] A. W. Appel et G. J. Jacobson, The world fastest scrabble program, Commun. ACM 31 (1988), 572–585.