Question: In certain programming languages, comments appear between delimiters such as /# and #/. Let C be the language of all valid delimited comment strings. A member of C must begin with /# and end with #/ but have no intervening #/. For simplicity, assume that the alphabet for C is ∑ = {a, b, /, #}.
Give a regular expression that generates C.
My solution: $/\# (a\cup b \cup / \cup \#)^* \#/$
My reasoning is that it has to begin and end with /# and #/, so what is in between is irrelevant.
Is my solution valid? Thanks.
Corrected version. Your regular expression allows $/\#a\color{red}{\#/}\#/$, which is explicitly disallowed by the definition: there can be no instance of $/\#$ between the delimiters, so the red $\#/$ is not allowed. We need to ensure that no $\#/$ occurs between the delimiters. To put it a little differently, we need to ensure that if a $\#$ occurs between the delimiters, it is immediately followed by an $a$, a $b$, another interior $\#$, or the righthand delimiter.
The regular expression
$$/\#(a\cup b\cup /\cup \#a\cup \#b)^*\#/$$
almost works, but it’s too restrictive: it doesn’t allow strings of the form $/\#\#^+\#/$, for instance, or strings of the form $/\#\#^+a\#/$, all of which should be acceptable. Two small modifications fix much of the problem:
$$/\#(a\cup b\cup /\cup \#^+a\cup \#^+b)^*\#/$$
almost works, but it still misses $/\#\#\#/$ and some other legitimate strings. Can you find one more simple fix to patch this hole?