How does one come up with regular expressions?

603 Views Asked by At

A regular expression is a sequence of characters and operators (concatenation, union, and Kleene $\bigstar$) that defines a pattern.

Kleene $\bigstar$ operator: $S^\star=\{\epsilon\} \cup S \cup SS \cup SSS \cup ...$

For example the set of all binary integers can be defined as $\{0,1\}\{0,1\}^\star$, and many other regular expressions.

The set of 'ab' strings with an even number of $a$'s can be defined as $(\{b\}^\star \{a\}\{b\}^\star \{a\}\{b\}^\star)^\star$, and many other regular expressions.


I always have a lot of trouble coming up with a regular expression that englobes all the strings (sequence of characters) matching a description.

How does one think of regular expressions (preferably unambiguous) which correctly define a given pattern? What is the thought process?

1

There are 1 best solutions below

2
On

I'll go through "strings from $\{a,b\}$ with an even number of $a$'s".

An even number is just two plus another even number, or zero. This is a recursive formulation which is amenable to being formalised in some system - whether or not that system is regexes. ("Divisible by $2$" is not a formalisation that lends itself to regexes.)

What is a regex which matches strings with exactly two $a$'s, then? It could have $b$'s interspersed anywhere among them: so $(b^* a b^* a b^*)$.

Then a string with an even number of $a$'s is one with zero $a$'s, or with two $a$'s, or with two $a$'s and then another two $a$'s, or…

That is, $$\epsilon \ \vert \ (b^* a b^* a b^*) \ \vert \ (b^* a b^* a b^*)(b^* a b^* a b^*) \ \vert \ \dots$$

A quick moment of pattern-recognition matches this to $$(b^* a b^* a b^*)^*$$


Start with the simplest examples, and try and build them up into bigger ones.