I'm attempting to build a regular expression that will accept only strings of the form:
Begins with: /#
Ends with: #/
Contains the following in between /# and #/:
Any combination of {a, b, /, #} but not the combination #/
Basically, I need a regular expression that determines whether a string is properly comment-delimited. I've tried many expressions, but can't find one that quite works. I'm not sure how to allow all other combinations of a, b, /, # while disallowing #/. Any help putting me on the right track would be much appreciated.
Let me write $c$ for $/$ and $d$ for $\#$ to increase readability. Let $A = \{a, b, c, d\}$. The language you want can be expressed by the extended regular expression $cd(A^* - A^*dcA^*)dc$.
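As a sanity check, the set-difference expression can be read off directly as a predicate on strings. The following is only a minimal sketch in the original notation (`/` and `#` rather than $c$ and $d$); the function name `is_comment` is made up for illustration.

```python
def is_comment(s: str) -> bool:
    """Direct reading of cd(A* - A*dcA*)dc with c = '/' and d = '#'."""
    alphabet = set("ab/#")
    # The prefix "/#" and the suffix "#/" must not overlap, so length >= 4.
    if len(s) < 4 or not set(s) <= alphabet:
        return False
    if not (s.startswith("/#") and s.endswith("#/")):
        return False
    # The middle part is any word over A that does not contain "#/".
    middle = s[2:-2]
    return "#/" not in middle
```

Note that under this reading `/#/#/` is accepted (its middle part is just `/`), while `/#/` is rejected because the two delimiters would overlap.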
This language is accepted by the following DFA (initial state $1$, unique final state $5$):
$$\begin{array}{c|ccccc}
 & 1 & 2 & 3 & 4 & 5 \\ \hline
a,b & - & - & 3 & 3 & - \\
c & 2 & - & 3 & 5 & - \\
d & - & 3 & 4 & 4 & -
\end{array}$$
The transitions are $\xrightarrow{} 1 \xrightarrow{c} 2 \xrightarrow{d} 3 \xrightarrow{d} 4 \xrightarrow{c} 5 \xrightarrow{}$, $\quad 4 \xrightarrow{a,b} 3$, $\quad 3 \xrightarrow{a,b,c} 3$, $\quad 4 \xrightarrow{d} 4$. One can now convert this automaton to the regular expression
$$cd\bigl(a+b+c+dd^*(a+b)\bigr)^*d^*dc$$
Returning to the original notation, we finally get
$$/\#\bigl(a+b+/+\#\#^*(a+b)\bigr)^*\#^*\#/$$
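To check that the DFA, the final regular expression, and the original specification agree, one can compare them by brute force on all short strings. This is a hedged sketch, not part of the derivation: the names `DELTA`, `dfa_accepts`, `regex_accepts`, `spec_accepts` and `agree_up_to` are made up for illustration, and the pattern is my transcription of the expression into Python's `re` syntax (`+` becomes `|`).

```python
import re
from itertools import product

# Transition table of the DFA above, written in the original alphabet
# (c = '/', d = '#'). Missing entries mean "no transition".
DELTA = {
    (1, "/"): 2,
    (2, "#"): 3,
    (3, "a"): 3, (3, "b"): 3, (3, "/"): 3, (3, "#"): 4,
    (4, "a"): 3, (4, "b"): 3, (4, "#"): 4, (4, "/"): 5,
}

def dfa_accepts(s: str) -> bool:
    """Run the DFA from state 1; accept only if it ends in state 5."""
    state = 1
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state == 5

# Transcription of /#(a+b+/+##*(a+b))*#*#/ into Python regex syntax.
PATTERN = re.compile(r"/#(?:a|b|/|##*(?:a|b))*#*#/")

def regex_accepts(s: str) -> bool:
    return PATTERN.fullmatch(s) is not None

def spec_accepts(s: str) -> bool:
    """The specification: /# + (middle without '#/') + #/."""
    return (len(s) >= 4 and s.startswith("/#") and s.endswith("#/")
            and "#/" not in s[2:-2])

def agree_up_to(max_len: int) -> bool:
    """Compare all three acceptors on every string over {a, b, /, #}."""
    for n in range(max_len + 1):
        for chars in product("ab/#", repeat=n):
            s = "".join(chars)
            if not (dfa_accepts(s) == regex_accepts(s) == spec_accepts(s)):
                return False
    return True
```

Calling `agree_up_to(7)` exhaustively compares the three acceptors on every string of length at most 7. This is evidence of agreement on short inputs, not a proof of equivalence.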