Understanding Automata Theorem

141 Views Asked by At

This theorem is theorem 3.13 in Hopcroft, Motwani, and Ullman's "Introduction to Automata Theory, Languages, and Computation", 3rd international edition on page 111.

Theorem 3.13:

Let E be a regular expression with variables $L_1, L_2, \dots, L_m$. Form concrete regular expression C by replacing each occurrence of $L_i$ by the symbol $a_i$, for $i = 1,2,\dots,m$. Then for any languages $L_1, L_2, \dots, L_m$, every string $w$ in $L(E)$ can be written $w = w_1w_2...w_k$, where each $w_i$ is in one of the languages, say $L_{ji}$, and the string $a_{j1}a_{j2} \dots a_{jk}$ is in the language $L(C)$.

Less formally, we can construct $L(E)$ by starting with each string in $L(C)$, say $a_{j1}, a_{j2}, \dots a_{jk}$, and substituting for each of the $a_{ji}$'s any string from the corresponding language $L_{ji}$.

PROOF: The proof is a structural induction on the expression E.

BASIS: The basis cases are where E is epsilon, empty set, or a variable L. In the first two cases, there is nothing to prove, since the concrete expression $C$ is the same as $E$.

If $E$ is a variable $L$, then $L(E) = L$. The concrete expression $C$ is just $\mathbf{a}$ where $\mathbf{a}$ is the symbol corresponding to L. Thus, $L(C) = \{a\}$. If we substitute any string in $L$ for the symbol $a$ in this one string, we get the language $L$, which is also $L(E)$.

Questions: What if $L$ contains three elements, such as strings $b$, $bb$, and $bbb$? If we substitute any string in $L$ for the symbol $a$ in this one string, do we not get a subset of $L$? If $C = a$ and substitute the string b for $a$, then $C$ is a subset of $L$. (but not equal to L).

What is wrong with my reasoning?

1

There are 1 best solutions below

1
On

The key passage is, "substituting for each of the $a_{j_i}$'s any string from the corresponding language $L_{j_i}$." You have to consider all strings in $L(C)$ and all possible substitutions. If you only consider one string of $L_{j_i}$, as your example shows, you are not guaranteed the full $L(E)$.

While that "any" is admittedly a bit ambiguous, taken in the context of the formal statement it's meant to clarify, it's reasonably clear. No matter how you pick a string from each $L_{j_i}$, you get a string of $L(E)$. The same string of $L(E)$ may be obtained in more than one way.