What is the cardinality of the result of concatenating two languages?

Question

What is the cardinality of the result of concatenating two languages?

1.1k Views Asked by Bumbble Comm At 11 May 2026 - 7:24

Question:

The cardinality of an n-fold Cartesian product upon a language $L$ is simple: $|L|^{n}$. Is there a simple solution to the cardinality of language concatenation?

For example, is there a single formula that can calculate the cardinality of each of these cases for a given $n$?

$\{\varepsilon, a\}^n$
$\{a, b\}^n$
$\{\varepsilon, a, b\}^3\{\varepsilon, d, e, f, g\}^8$

I have spent countless hours searching and cannot find anything telling me yes or no.

Attempts / Research:

By working through examples, I have noticed a few patterns. Let "nfc" = "n-fold cardinality", "cc" = "concatenation cardinality", and "cdr" = "concatenation duplicates removed."

$\{a, b\}\{a, b\}$ = $\{aa, ab, ba, bb\}$ (nfc 4, cc 4, cdr 0)
$\{a, aa\}\{a, aa\} = \{aa, aaa, aaaa\}$ (nfc 4, cc 3, cdr 1)
$\{\varepsilon, a, aa\}\{\varepsilon, a, aa\} = \{\varepsilon, a, aa, aaa, aaaa\}$ (nfc 9, cc 5, cdr 4)
$\{\varepsilon, a, b\}\{\varepsilon, a, b, c\} = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc\}$ (nfc 12, cc 10, cdr 2)
$\{\varepsilon, a, b\}\{c, d, f\} = \{c, d, f, ac, ad, af, bc, bd, bf\}$ (nfc 9, cc 9, cdr 0)

Some observations:

cc + cdr = nfc
When the two languages share no common elements, cc = nfc
When concatenating a language with itself, whether or not the elements are composed of each other makes a big difference in the cc / cdr values.
- Languages containing completely distinct elements have cc = nfc.
- Languages with elements that are concatenations of each other have cc < nfc.

In general the formula seems to follow the format $|L_1| * |L_2| - x$ where $x$ is unknown to me.

Is there a general solution?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

I am not sure there is an easy answer for the concatenation product of two finite languages $F$ and $G$, although one could give sufficient conditions to have $|FG| = |F||G|$.

However, the cardinality of the successive powers of a finite language can be computed, thanks to this result:

Theorem. For each finite language $F$, the generating series $\sum_{n \geqslant 0} |F^n| t^n$ is rational. In particular, the sequence $(|F^n|)_{n \geqslant 0}$ satisfies a linear recurrence.

The algorithm is better explained on an example. Let $F = \{a, ab, ba\}$. Observe that since $a(ba = (ab)a$, $|F| = 3$ but $|F^2| = 8$. Here is a way to compute $|F^n|$ for all $n$.

The language $F^*$ is regular and its minimal automaton is the following:

$\hskip 100pt$

By using McNaughton and Yamada's algorithm, one can obtain from this automaton an unambiguous regular expression, namely $$ F^* = (a^*ab + ba)^*a^* $$ The definition of an unambiguous regular expression is given below.

Setting $x_1 = a$, $x_2 = ab$ and $x_3 = ba$, one gets $$ F^* = (x_1^*x_2^{\ } + x_3^{\ })^*x_1^* $$ Thus $F^*$ can be written as a rational series in noncommutative variables $x_1, x_2, x_3$. Furthermore, since the expression is unambiguous, the coefficients of this series are $0$ or $1$. Replacing $x_1$, $x_2$ and $x_3$ by a single variable $t$ yields a rational series in one variable, namely $(t^*t + t)^*t^*$, which is the generating series of the sequence $(|F^n|)_{n \geqslant 0}$, since the coefficient of $t^n$ counts the number of words of $\{x_1, x_2, x_3\}^n$ which are in $F^*$ $$ (t^*t + t)^*t^* = \sum_{n \geqslant 0} |F^n|t^n $$ Since $t^* = 1/(1-t)$, one can compute the Taylor development of this series using $\tt{sagemaths}$:

$> \tt{var('t');} \quad \tt{taylor ((1 / ((1 - t) * (1 - (t * (1 + (1/(1-t))))))), t, 0, 9);}$

$6765 t^9 + 2584 t^8 + 987 t^7 + 377 t^6 + 144 t^5 + 55 t^4 + 21 t^3 + 8 t^2 + 3 t + 1$

Thus $|F| = 3$, $|F^2| = 8$, $|F^3| = 21$, $|F^4| = 55$, etc. One can verify that $|F^{n+1}| = 3|F^n| - |F^{n-1}|$ and that $|F^n| = F_{2n+1}$, where $F_n$ is the $n$-th Fibonacci number.

Definition of the unambiguous operations.

The product $L_1L_2\dotsm L_n$ of $n$ nonempty languages is unambiguous if the decomposition of a word of $L_1L_2\dotsm L_n$ as a product of words of $L_1, L_2, \ldots, L_n$ is unique, that is, if an equality of the form $$ x_1x_2\cdots x_n = y_1y_2\cdots y_n $$ with $x_1, y_1 \in L_1$, $x_2, y_2 \in L_2, \ldots, x_n, y_n \in L_n$ implies $x_1= y_1, x_2 = y_2, \ldots, x_n = y_n$.

The star of a language $L$ is unambiguous if this language is a code, that is, if an equality of the form $$ x_1x_2\cdots x_n = y_1y_2\cdots y_m $$ with $x_1, \ldots, x_n \in L$, $y_1, \ldots, y_m\in L$ implies $n = m$ and $x_1= y_1, x_2 = y_2, \ldots, x_n = y_n$.

The union of $n$ languages is unambiguous if it is a disjoint union.

What is the cardinality of the result of concatenating two languages?

There are 1 best solutions below

Related Questions in COMBINATORICS

Related Questions in FORMAL-LANGUAGES

Related Questions in AUTOMATA

Related Questions in REGULAR-LANGUAGE

Trending Questions

Popular # Hahtags

Popular Questions