Regular expression for a particular language

64 Views Asked by At

Several years ago I came across a paper that defined a regular expression (or collection of regular expressions?) for a specific language.

The language is the language of set partitions enumerated by the Stirling Numbers of the Second Kind. The words in this language are composed of symbols $\{a,b,c...\}$ such that $b$ does not occur before the first $a$, $c$ does not occur before the first $b$, etc.

I believe the regular expression was something along the lines of $a^*(a|b)^*b(a|b|c)^*c$

the list of words of length 4 are below:

aaaa
aaab
aaba
aabb
aabc
abaa
abab
abac
abba
abbb
abbc
abca
abcb
abcc
abcd

I've been trying to hunt down the paper for some time. I would appreciate a reference.

2

There are 2 best solutions below

0
On

I don't know the paper, but the regular expression you stated seems to be wrong. First, it allows a word to begin with $b$, since it starts with $a^\star$, not $aa^\star$. Second, it requires at least one $b$ and one $c$.

A correct regular expression for a $4$-symbol alphabet $\{a,b,c,d\}$ would be $$ a[ab]^\star(b[abc]^\star(d[abcd]^\star)?)? $$ you can avoid the nesting by writing this as $$ a[a]^\star\big | a[a]^\star b[ab]^\star \big| a[a]^\star b[ab]^\star c[abc]^\star \big| a[a]^\star b[ab]^\star c[abc]^\star d [abcd]^\star $$


Note that $[\alpha_1\ldots \alpha_n]$ means $(\alpha_1|\alpha_2|\ldots|\alpha_n)$ where $\alpha_1,\ldots,\alpha_n$ are symbols.

0
On

Think of it this way. Unless you're the empty string You must begin with $a$. The regex is $$ a $$

Then you cannot have another symbol until you have a $b$. $$ a a^*$$

And you can have a $b$ at some point. (Note: $b^?$ is a shorthand for $(b|)$.)

$$ a a^* b^?$$

The $b$ can be followed by an arbitratry quantity of $a$ or of $b$

$$ a a^* (b (a|b)^*)^?$$

And that can be followed by a $c$ and possibly some $a$, $b$ or $c$.

$$ a a^* ( b (a|b)^* ( c (a|b|c)^* )^?)^?$$

And so on... $$ a a^* ( b (a|b)^* ( c (a|b|c)^* ( d (a|b|c|d)^* )^? )^?)^?$$ $$ a a^* ( b (a|b)^* ( c (a|b|c)^* ( d (a|b|c|d)^* ( e (a|b|c|d|e)^* )^?)^? )^?)^?$$ $$\dots$$

If you want to allow the empty string, you can enclose everything in $(\dots)^?$ of course.

$$( a a^* ( b (a|b)^* ( c (a|b|c)^* ( d (a|b|c|d)^* ( e (a|b|c|d|e)^* )^?)^? )^?)^?)^?$$