Suppose we have unambiguous context-free language, how can we calculate the number of words that can be represented by $n$ terminals?

48 Views Asked by At

I am new to formal language theory, apology if this question seems obvious. Here are some definitions from formal language theory:

Definition. Let $\mathcal{V}$ be a set of variables (usually denoted by upper case letters), and $\mathcal{T}$ a set of terminals (usually denoted by lowercase letters). A context-free grammar consists of a finite set of production rules of the form $$ V \rightarrow w_1\left|w_2\right| \cdots \mid w_n $$ where $V \in \mathcal{V}$, each $w_i \in(\mathcal{V} \cup \mathcal{T})^*$, and the | symbol stands for exclusive 'or'. We nominate one variable to be the starting variable.

A context-free grammar produces a language in the following way. Start at the nominated starting variable, and perform substitutions according to the production rules, until the word consists only of terminals.

Using DSV method we can convert the grammar into a system of equations by replacing:

  • the empty word $\epsilon$ with the integer 1 ,
  • each terminal with the formal variable $z$,
  • each variable $V$ with a function $V(z)$,
  • the or | with addition + ,
  • concatenation with multiplication,
  • the production arrow with $=$.

I am trying to understand the following example.

Suppose we have terminals $\{ a,a^{-1}, t, t^{-1} \}$.

Here are some grammar: $$ \begin{aligned} & S \rightarrow \epsilon|A| T, \quad A \rightarrow a^{-3}\left|a^{-1}\right| a \mid a^3, \\ & T \rightarrow a t^2 U t^{-2}\left|a^{-1} t^2 U t^{-2}\right| a t a^2 t^{-1}\left|a t a^3 t^{-1}\right| a^{-1} t a^2 t^{-1} \mid a^{-1} t a^3 t^{-1}, \\ & U \rightarrow t U t^{-1}|T| a^{-3}\left|a^{-2}\right| a^2 \mid a^3 \end{aligned} $$ The grammar becomes the following system of equations. $$ \begin{aligned} & S(z)=1+A(z)+T(z), \quad A(z)=2\left(z+z^3\right), \\ & T(z)=2 z^5 U(z)+2 z^5+2 z^6, \quad U(z)=z^2 U(z)+T(z)+2\left(z^2+z^3\right) . \end{aligned} $$ Solving these yields the following rational expression: $$ S(z)=\frac{1+2 z-z^2-2 z^5-2 z^6+2 z^7-2 z^8}{1-z^2-2 z^5} . $$

Question: How do we calculate the number of words that can be represented by $n$ terminals using the equation? Is it related to the root of the equation?

Any insight would be really appreciated.

1

There are 1 best solutions below

0
On

The function $S(z)$ is a generating function, which means that its coefficients when expanded into a power series are the answer to your question. That is, when you write $$ S(z) = \sum_{n=0}^\infty s_nz^n, $$ then $s_n$ is the number of words with $n$ terminals.

You can extract $s_n$ from $S(z)$ by repeatedly differentiating, then substituting 0 for $z$, then dividing by $n!$. $$ s_n= \frac1{n!}S^{(n)}(0). $$ There is also the Cauchy integral formula which allows you to recover $s_n$: $$ s_n=\frac{n!}{2\pi i}\oint_C \frac{S(z)}{z^{n+1}}\,dz $$ Here, $C$ is a small circle in the complex plane centered at the origin, oriented counterclockwise, which is small enough such that none of the singularities of $S(z)$ are in the interior of $C$.

Occasionally, if $S(z)$ is nice enough, you can get a closed expression for $s_n$. For example, with $k$ a positive integer, let $$ F(z)= \frac1{(r-z)^k}\tag1 $$ Then, writing $F(z)=\sum f_n z^n$, you can show $$ f_n= \binom{n+k-1}{k-1}r^{-(n+k)}\tag2 $$ In your case, you can find the partial fraction decomposition of $S(z)$, which will write $S(z)$ as a linear combination of functions like in $(1)$, and then you get a formula for $s_n$ as the corresponding linear combination of terms like in $(2)$. However, this requires finding the roots of the denominator, $1-z^2-2z^5$, of $S(z)$, so this closed form may involve powers of irrational or complex numbers.