Model BNF Formally in mathematical terms

115 Views Asked by At

Usually, a language is formally defined as some subset $L$ of the Kleene-closure $\Sigma^*$ of some "alphabet" (i.e. finite set) $\Sigma$, where by Kleene closure we mean the infinite union $$\Sigma^* = \bigcup_{k=0}^\infty \Sigma^k$$ where $\Sigma^0 := \{\epsilon\}$, and $\Sigma^k$ for $k\geqslant 1$ denotes the Cartesian product of sets. In view of this definition, it is more convenient to model strings of length $k$ (i.e. $k$-tuples) here as functions $f\colon\{1,\dots,k\}\to \Sigma$ so that $(a_1, \dots, a_n)$ becomes $(f(1), \dots, f(n))$ as opposed to the usual definition with nesting of pairs (this avoids discomfort about tuples of length zero or one, so that $\epsilon$ is just $f\colon\varnothing\to\varnothing$, and also tuples of length one are distinguished structurally as functions and cannot be confused with members of the alphabet; i.e. $\Sigma^1 \neq \Sigma$).

Coming from a mathematical background to the subject of formal languages, I am used to the comfortable notion that the machinery underlying everything is built on the ZF(C) axioms. So far, these definitions are satisfactory and nicely describe the idea of languages in a mathematical framework.

But then when Backus–Naur form is used, the mathematical rigour starts slacking a bit, I can't seem to find any books/resources which formally define how BNF's construct languages as sets. I'd appreciate some guidance as to how BNFs can be studied in a formal setting which continues in the spirit of the definitions above.

1

There are 1 best solutions below

0
On

For simplicity, assume each non-terminal occurs exactly once as the left hand side of a BNF rule. (This just means that all the rules are joined together with alternation.) Let $N$ be the set of non-terminals and identify the terminals with the alphabet $\Sigma$. Define a set function $g:\mathcal P(\Sigma^*)^N\to\mathcal P(\Sigma^*)^N$ by setting $g(S)(n)$ to the following interpretation of the right hand side of the rule for the non-terminal, $n$. Interpret each terminal (i.e. letter) as a singleton set of that letter. Interpret juxtaposition as language concatenation. Interpret alternation as union. Finally, interpret the non-terminal $m$ via as $S(m)$. Let $s\in N$ be the starting non-terminal. The language recognized by the BNF is the least (pre-)fixed point of $g$ evaluated at $s$ where the ordering is the pointwise subset ordering.

This is just the straightforward set-theoretic rendition of a collection of mutually inductively defined sets. Sure enough, if we use the BNF: $s ::= 0 \mid 1\ s$ with alphabet $\{0,1\}$, the least (pre-)fixed point property of the resulting language is essentially (the set-theoretic variant of) mathematical induction (of natural numbers). Call the resulting language $L$. The least pre-fixed point property states that for any set $S$ such that $\{0\}\cup(\{1\}\frown S)\subseteq S$ where $\frown$ is language concatenation, $L\subseteq S$.