What is the cardinality of the result of concatenating two languages?

1.1k Views Asked by At

Question:

The cardinality of an n-fold Cartesian product upon a language $L$ is simple: $|L|^{n}$. Is there a simple solution to the cardinality of language concatenation?

For example, is there a single formula that can calculate the cardinality of each of these cases for a given $n$?

  • $\{\varepsilon, a\}^n$
  • $\{a, b\}^n$
  • $\{\varepsilon, a, b\}^3\{\varepsilon, d, e, f, g\}^8$

I have spent countless hours searching and cannot find anything telling me yes or no.


Attempts / Research:

By working through examples, I have noticed a few patterns. Let "nfc" = "n-fold cardinality", "cc" = "concatenation cardinality", and "cdr" = "concatenation duplicates removed."

  • $\{a, b\}\{a, b\}$ = $\{aa, ab, ba, bb\}$ (nfc 4, cc 4, cdr 0)
  • $\{a, aa\}\{a, aa\} = \{aa, aaa, aaaa\}$ (nfc 4, cc 3, cdr 1)
  • $\{\varepsilon, a, aa\}\{\varepsilon, a, aa\} = \{\varepsilon, a, aa, aaa, aaaa\}$ (nfc 9, cc 5, cdr 4)
  • $\{\varepsilon, a, b\}\{\varepsilon, a, b, c\} = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc\}$ (nfc 12, cc 10, cdr 2)
  • $\{\varepsilon, a, b\}\{c, d, f\} = \{c, d, f, ac, ad, af, bc, bd, bf\}$ (nfc 9, cc 9, cdr 0)

Some observations:

  • cc + cdr = nfc
  • When the two languages share no common elements, cc = nfc
  • When concatenating a language with itself, whether or not the elements are composed of each other makes a big difference in the cc / cdr values.
    • Languages containing completely distinct elements have cc = nfc.
    • Languages with elements that are concatenations of each other have cc < nfc.

In general the formula seems to follow the format $|L_1| * |L_2| - x$ where $x$ is unknown to me.

Is there a general solution?

1

There are 1 best solutions below

0
On BEST ANSWER

I am not sure there is an easy answer for the concatenation product of two finite languages $F$ and $G$, although one could give sufficient conditions to have $|FG| = |F||G|$.

However, the cardinality of the successive powers of a finite language can be computed, thanks to this result:

Theorem. For each finite language $F$, the generating series $\sum_{n \geqslant 0} |F^n| t^n$ is rational. In particular, the sequence $(|F^n|)_{n \geqslant 0}$ satisfies a linear recurrence.

The algorithm is better explained on an example. Let $F = \{a, ab, ba\}$. Observe that since $a(ba = (ab)a$, $|F| = 3$ but $|F^2| = 8$. Here is a way to compute $|F^n|$ for all $n$.

The language $F^*$ is regular and its minimal automaton is the following:

$\hskip 100pt$

By using McNaughton and Yamada's algorithm, one can obtain from this automaton an unambiguous regular expression, namely $$ F^* = (a^*ab + ba)^*a^* $$ The definition of an unambiguous regular expression is given below.

Setting $x_1 = a$, $x_2 = ab$ and $x_3 = ba$, one gets $$ F^* = (x_1^*x_2^{\ } + x_3^{\ })^*x_1^* $$ Thus $F^*$ can be written as a rational series in noncommutative variables $x_1, x_2, x_3$. Furthermore, since the expression is unambiguous, the coefficients of this series are $0$ or $1$. Replacing $x_1$, $x_2$ and $x_3$ by a single variable $t$ yields a rational series in one variable, namely $(t^*t + t)^*t^*$, which is the generating series of the sequence $(|F^n|)_{n \geqslant 0}$, since the coefficient of $t^n$ counts the number of words of $\{x_1, x_2, x_3\}^n$ which are in $F^*$ $$ (t^*t + t)^*t^* = \sum_{n \geqslant 0} |F^n|t^n $$ Since $t^* = 1/(1-t)$, one can compute the Taylor development of this series using $\tt{sagemaths}$:

$> \tt{var('t');} \quad \tt{taylor ((1 / ((1 - t) * (1 - (t * (1 + (1/(1-t))))))), t, 0, 9);}$

$6765 t^9 + 2584 t^8 + 987 t^7 + 377 t^6 + 144 t^5 + 55 t^4 + 21 t^3 + 8 t^2 + 3 t + 1$

Thus $|F| = 3$, $|F^2| = 8$, $|F^3| = 21$, $|F^4| = 55$, etc. One can verify that $|F^{n+1}| = 3|F^n| - |F^{n-1}|$ and that $|F^n| = F_{2n+1}$, where $F_n$ is the $n$-th Fibonacci number.

Definition of the unambiguous operations.

  1. The product $L_1L_2\dotsm L_n$ of $n$ nonempty languages is unambiguous if the decomposition of a word of $L_1L_2\dotsm L_n$ as a product of words of $L_1, L_2, \ldots, L_n$ is unique, that is, if an equality of the form $$ x_1x_2\cdots x_n = y_1y_2\cdots y_n $$ with $x_1, y_1 \in L_1$, $x_2, y_2 \in L_2, \ldots, x_n, y_n \in L_n$ implies $x_1= y_1, x_2 = y_2, \ldots, x_n = y_n$.
  2. The star of a language $L$ is unambiguous if this language is a code, that is, if an equality of the form $$ x_1x_2\cdots x_n = y_1y_2\cdots y_m $$ with $x_1, \ldots, x_n \in L$, $y_1, \ldots, y_m\in L$ implies $n = m$ and $x_1= y_1, x_2 = y_2, \ldots, x_n = y_n$.
  3. The union of $n$ languages is unambiguous if it is a disjoint union.