Question:
The cardinality of an n-fold Cartesian product upon a language $L$ is simple: $|L|^{n}$. Is there a simple solution to the cardinality of language concatenation?
For example, is there a single formula that can calculate the cardinality of each of these cases for a given $n$?
- $\{\varepsilon, a\}^n$
- $\{a, b\}^n$
- $\{\varepsilon, a, b\}^3\{\varepsilon, d, e, f, g\}^8$
I have spent countless hours searching and cannot find anything telling me yes or no.
Attempts / Research:
By working through examples, I have noticed a few patterns. Let "nfc" = "n-fold cardinality", "cc" = "concatenation cardinality", and "cdr" = "concatenation duplicates removed."
- $\{a, b\}\{a, b\}$ = $\{aa, ab, ba, bb\}$ (nfc 4, cc 4, cdr 0)
- $\{a, aa\}\{a, aa\} = \{aa, aaa, aaaa\}$ (nfc 4, cc 3, cdr 1)
- $\{\varepsilon, a, aa\}\{\varepsilon, a, aa\} = \{\varepsilon, a, aa, aaa, aaaa\}$ (nfc 9, cc 5, cdr 4)
- $\{\varepsilon, a, b\}\{\varepsilon, a, b, c\} = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc\}$ (nfc 12, cc 10, cdr 2)
- $\{\varepsilon, a, b\}\{c, d, f\} = \{c, d, f, ac, ad, af, bc, bd, bf\}$ (nfc 9, cc 9, cdr 0)
Some observations:
- cc + cdr = nfc
- When the two languages share no common elements, cc = nfc
- When concatenating a language with itself, whether or not the elements are composed of each other makes a big difference in the cc / cdr values.
- Languages containing completely distinct elements have cc = nfc.
- Languages with elements that are concatenations of each other have cc < nfc.
In general the formula seems to follow the format $|L_1| * |L_2| - x$ where $x$ is unknown to me.
Is there a general solution?
I am not sure there is an easy answer for the concatenation product of two finite languages $F$ and $G$, although one could give sufficient conditions to have $|FG| = |F||G|$.
However, the cardinality of the successive powers of a finite language can be computed, thanks to this result:
The algorithm is better explained on an example. Let $F = \{a, ab, ba\}$. Observe that since $a(ba = (ab)a$, $|F| = 3$ but $|F^2| = 8$. Here is a way to compute $|F^n|$ for all $n$.
The language $F^*$ is regular and its minimal automaton is the following:
$\hskip 100pt$
By using McNaughton and Yamada's algorithm, one can obtain from this automaton an unambiguous regular expression, namely $$ F^* = (a^*ab + ba)^*a^* $$ The definition of an unambiguous regular expression is given below.
Setting $x_1 = a$, $x_2 = ab$ and $x_3 = ba$, one gets $$ F^* = (x_1^*x_2^{\ } + x_3^{\ })^*x_1^* $$ Thus $F^*$ can be written as a rational series in noncommutative variables $x_1, x_2, x_3$. Furthermore, since the expression is unambiguous, the coefficients of this series are $0$ or $1$. Replacing $x_1$, $x_2$ and $x_3$ by a single variable $t$ yields a rational series in one variable, namely $(t^*t + t)^*t^*$, which is the generating series of the sequence $(|F^n|)_{n \geqslant 0}$, since the coefficient of $t^n$ counts the number of words of $\{x_1, x_2, x_3\}^n$ which are in $F^*$ $$ (t^*t + t)^*t^* = \sum_{n \geqslant 0} |F^n|t^n $$ Since $t^* = 1/(1-t)$, one can compute the Taylor development of this series using $\tt{sagemaths}$:
$> \tt{var('t');} \quad \tt{taylor ((1 / ((1 - t) * (1 - (t * (1 + (1/(1-t))))))), t, 0, 9);}$
$6765 t^9 + 2584 t^8 + 987 t^7 + 377 t^6 + 144 t^5 + 55 t^4 + 21 t^3 + 8 t^2 + 3 t + 1$
Thus $|F| = 3$, $|F^2| = 8$, $|F^3| = 21$, $|F^4| = 55$, etc. One can verify that $|F^{n+1}| = 3|F^n| - |F^{n-1}|$ and that $|F^n| = F_{2n+1}$, where $F_n$ is the $n$-th Fibonacci number.