equivalence of finite state and regular expression

1.2k Views Asked by At

Consider the following theorem:
Theorem: If L=L(A) for some DFA A, then there is a regular expression R such that L=L(R)

Proof: Let A's states be: {1,2,3,....n} for some integer n. Let $ R^k_{ij}$ be the name of regular expression whose language is the set of path from state i to state j in A, and it's intermediary nodes do not have number greater than k.

Basis: For k = 0, there are no intermediary nodes. So, when i != j, if the symbol for transition is a, then:
a) if there is no such symbol a, then: $ R^0_{ij} = \{\} $
b) if there is one then: $ R^k_{ij} = a$
c) if there are symbols $\ a_1, a_2, ... a_k$ then $ R^0_{ij} = a_1+a_2...+a_k$

Induction: Suppose there is a path from state i to j that goes through no higher state than k. There are two possible cases.

1) The path does not go through state k at all. In this case, the label of the path is in the language of: $ R^{(k-1)}_{ij}$
#so far so good- no confusion upto this point

The biggest part of confusion is here in this statement no. 2
2) The path goes through state k at least once. It may be from i to k, k to k multiple times, and from k to j OR it may be from i to k and k to j. So, the regular expression would be: $ R^{k-1}_{ik}.(R^{k-1}_{kk})^*.R^{k-1}_{kj}$ So, combining the regular expressions from 1 and 2 we get the regular expression from the automaton as: $$ R^k_{ij} = R^{k-1}_{ij} + R^{k-1}_{ik}.(R^{k-1}_{kk})^*.R^{k-1}_{kj}$$

The notation of $R^{k}_{ij}$ is that it should have no intermediary nodes greater than k, put another way, it may have less than *or equal to * k numbered states. So, in induction point number 2, we have the case where the intermediary nodes are allowed to touch k, though not pass higher than k. As it states, the path goes through k at least once, and yet, The regular expression is given as: $ R^{k-1}_{ik}.(R^{k-1}_{kk})^*.R^{k-1}_{kj}$, in my opinion it should rather be: $ R^{k}_{ik}.(R^{k}_{kk})^*.R^{k}_{kj}$. Where am I understanding this wrong, because, the former expression given in the proof, does not even touch k, where as my given expression would touch k.

Again, I could think it could be some print error given in the book, but all exercises have been based on this formula, so it should be accurate.

1

There are 1 best solutions below

13
On BEST ANSWER

Your suggested regular expression is correct, in the sense that it is equal to $R_{ij}^k$, but it's not very useful, because to construct $R_{kk}^k$ for example, you already need to know what $R_{kk}^k$ is!

The point of the proof is to induct on $k$, and to do this we cleverly define $R_{ij}^k$ in terms of expressions $R_{mn}^l$ where $l<k$, and we assume we already know how to construct such expressions. Otherwise we would have a circular definition.