What is the difference between regex operations in math and regex in UNIX / Linux?

917 Views Asked by At

What is the difference between regular expression operations (union, concatenation, kleene star) and regular expression (implemented in UNIX and can be used together with the grep command)?

Are there such methods that are implemented in UNIX and can be used together with the grep command, that cannot be recreated using only the regular expression operations (union, concatenation, kleene star)

regular expression (implemented in UNIX and can be used together with the grep command) :

(.,*,^,$,\,\<,|>,[],+,?,&,\n))
3

There are 3 best solutions below

0
On

Regular Expressions is a formal language defined by induction with a finite set of rules. You have some info over here: http://mathworld.wolfram.com/RegularExpression.html

5
On

Regular Expressions in formal languages and automaton theory are the basis for regular expressions in UNIX. Some of the syntax is a bit different in grep than in the formal theory. For example, in grep, you can use $(a+b)^{2}$, while this is not acceptable syntax theoretically (though it would make sense to have it theoretically). Grep also has a larger set of symbols it allows in regular expression syntax.

Grep (at least some versions of it) use finite state automaton to handle the search.

I am not a *Nix expert, nor am I a formal languages expert. Someone may have more enlightening answers than I did. Hopefully this helps some, though.

1
On

The mathematical (theoretical CS) definition of regular is that which is recognized by a NFA or DFA. Equivalently it can be generated by a regular grammar.

Regexes are things that have popular implementations in Perl, Python, grep, etc. I won't call them regular expressions because for most (all?) of the modern implementations, they are not regular anymore.

The reason that regexes aren't regular anymore is the back reference feature. This makes it at least as powerful as a context-free grammar. Unlimited backreferences make it even more powerful than CFGs.

As an example written in Python:

import re
myre = re.compile( "(.*)-\\1" )
m = myre.match( "abcdefg-abcdefg" )
if m:
    print m.group(1)

The mathematical definition of regular simply does not allow this kind of power.