Do lookahead and lookbehind in regex come from some type of formal languages?

120 Views Asked by At

Real-world regexs are extensions of regular expressions in formal languages.

Do lookahead and lookbehind come from some type of formal languages? (I suspected lookahead in LR(k) grammars, simply because of the same name, but I have difficulty to either confirm or unconfirm it. Not sure about lookbehind.)

Btw: I have been interested in which extensions in regexes come from which types of formal languages. Do you know somethings for me to read?

Thanks.

2

There are 2 best solutions below

4
On

No, they are pragmatic hacks based on features found to be useful by programmers and implementable by library maintainers.

Most aspects of programming libraries follow this paradigm. The theory is important, and it probably could be used better. But it's s very rare for someone to decide to omit a useful feature just because it is not part of a theoretical framework.

Just to be clear, lookahead assertions in regex libraries have nothing to do with parser lookaheads, except that both represent answers to the question "What comes next?" (but not in the same problem domain).

7
On

The theoretical framework for regular expression matching is about recognising complete strings in a language defined by a regular expression. What you generally need in practice is either (1) the ability to split up a long string into substrings that match one of several regular expressions or (2) the ability to search in a string for a substring that matches one regular expression.

In both cases (1) and (2), the actual problem is a bit different from the more general problem of recognising strings in the language defined by a regular expression. And in both cases there are some technical details that need to be tied down to make the problem well-specified. E.g., in case (2), there can be multiple matches: which one do we choose? It is also useful to give the user extra control over how the matching is done: e.g., by supplying some contextual constraints for the substring to be matched as in the "lookahead" and "lookbehind" constructs that you are asking about.

The theoretical framework is very important for our understanding of the practical problems and their solutions. I think you will find that the usual regex libraries follow the theory quite closely for their core matching algorithms with a few pragmatic additions to handle thinks like contextual constraints and counting constraints.