Are regular expressions in mathematics related to PCRE or POSIX regexps?

104 Views Asked by At

I've recently come across a number of questions tagged with (regular-expressions), and talking about some type of regular expressions, here's an example of such question.

I know about regular expressions as implemented in perl (so called PCRE) and those defined by POSIX. But in the question linked above the answers look too different from that syntax I know. Are they still some (more general maybe) form of the same concept? How are PCRE/POSIX regexps related to those talked about in that question?

2

There are 2 best solutions below

0
On BEST ANSWER

Regular expressions in formal language theory represent the most basic functionality of commonly-used regexps like PCRE or POSIX regexps. They only have the following operations on symbols from the alphabet (see this link for more details):

  • Concatenation
  • Alternation | (union in the language of sets)
  • Kleene closure * ("repetition" in simple terms)

These operations may be grouped via parentheses. Sometimes the complement operation is also added (which inverts the match) to yield generalized regular expressions.

All the other operations commonly available in computer tools and programming languages are just syntactic sugar, e.g. a+ is the same as aa*, a? is the same as (a|). These are not included in regular expressions in theoretical computer science.

In the theory regular expressions are often represented by sets of characters with corresponding operations on them, and this is why the syntax may not resemble the one programmers are used to. E.g. instead of | one can see $\cup$, and instead of empty string one may see $\varepsilon$.

1
On

The Perl "regular expressions" are extended pattern matching doodahs. The original meaning of "regular expression" was something different and better defined. See https://en.wikipedia.org/wiki/Regular_language and https://en.wikipedia.org/wiki/Regular_expression#Formal_language_theory. Russ Cox's discussion is also quite a good read: http://swtch.com/~rsc/regexp/regexp1.html