Finding a general formula for a regular expression

809 Views Asked by At

I need to find the general formula for the regular expression $(S+T)^n$ where $S$ and $T$ are arbitrary regular expressions over a one-letter alphabet and $n$ is an arbitrary natural

The general formula for $(X+Y)^n$ where $X$ and $Y$ are arbitrary numbers is easy in that it just follows the formula $x^n + \binom{n}{1}x^{n-1}y + ... + \binom{n}{n-1}xy^{n-1} + y^n$.

However, with Regular Expressions order matters and that's where I'm stuck on how to write a understandable general formula.

1

There are 1 best solutions below

3
On

When the alphabet contains only one symbol, then order does not matter. Concatenation of strings and of languages is commutative. If $S,T$ are languages in a one-symbol alphabet then $ST = TS$, and because + is union there's no need for coefficients. In this case, the general formula has $n+1$ terms: $$ (S+T)^n = \sum_{i=0}^n S^iT^{n-i} $$ where the sum is repeated "+" (union, alternation).

When the alphabet contains more than one symbol, order matters. Concatenation of strings and of languages is not commutative, so the terms with the same number of $S$s and same number of $T$s can't be collected, as they are in general distinct things. Instead of $n+1$ terms, $(S+T)^n$ expands to the full $2^n$ terms, and $(S_0 + \cdots + S_{k-1})^n$ expands to $k^n$ terms.

For $k = 2$, $$ (S_0 + S_1)^n = \sum_{p\colon n\to 2} \prod_{i<n} S_{p(i)} $$ where

  • the sum is taken over all functions from $n=\{0,\dotsc, n-1\}$ to $2=\{0,1\}$, and
  • each product is concatenation, from left to right in order of increasing $i<n$.

With these conventions, the general formula for any $k$ can be stated: $$ (S_0 + \cdots + S_{k-1})^n = \sum_{p\colon n\to k} \prod_{i<n} S_{p(i)}. $$

A function $n\to k$ is essentially an $n$-tuple of $0,\dotsc, k-1$, and these can be ordered lexicographically.