Are programming languages generally CFLs or deterministic CFLs?

Question

44 Views Asked by Bumbble Comm At 28 Mar 2026 - 4:56

In compiler books, such as the Dragon Book by Aho, Sethi, and Ullman, I often see context-free languages and LR parsers mentioned.

LR parsers are for parsing deterministic CFLs, while CFLs in general may need nondeterminism to be recognized.

Are programming languages generally CFLs or deterministic CFLs?

Or, which aspects of programming languages are modeled as CFLs, and which as deterministic CFLs?

Thanks.

**Bumbble Comm** · Answer 1 · 2020-06-14 20:51:32

All practical programming languages are deterministic, even Perl. They have to be deterministic because every valid program (in a programming language) must have a single interpretation. Otherwise, the programmer cannot know what program they are writing, which is hard to fit into the concept of "practical". However, that determinism might not be reflected in any formal presentation of the language's grammar. It might be the case that the only 100% accurate description of the precise syntactic analysis for a given construct is the code in the compiler which does that analysis.

Most practical programming languages come with documentation of their syntax as a context-free grammar. However, in almost all (or maybe all) cases, that grammar does not precisely describe the programming language. Rather, it describes a superset of the language which contains, but is not limited to, valid programs. The approximate grammar might allow invalid programs, or it might permit incorrect syntactic analysis of valid programs, because writing a precise description within the constraints of a context-free grammar is impossible or impractical. Instead, additional constraints and disambiguation rules are typically communicated in some human language and implemented in some computer program (a compiler, for example). The description and the implementation do not always correspond, unfortunately, but this is the real world and not a Platonic ideal, so imperfections are to be expected.
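As a small illustration of the "grammar describes a superset" point, here is a sketch using Python itself. It assumes CPython, where (as far as I know) `ast.parse` stops after the grammar-level parse while `compile` goes on to apply the extra rules that the grammar does not encode. The sample program is my own, not something from the question.

```python
import ast

source = "return 1\n"  # a 'return' statement outside any function

# Step 1: the grammar-level parse accepts this text and builds a tree.
tree = ast.parse(source)
grammar_accepts = isinstance(tree.body[0], ast.Return)

# Step 2: full compilation applies a constraint that lives outside the grammar.
message = None
try:
    compile(source, "<example>", "exec")
    compiler_accepts = True
except SyntaxError as err:
    compiler_accepts = False
    message = err.msg  # the compiler's own wording for the extra rule

print("grammar accepts: ", grammar_accepts)
print("compiler accepts:", compiler_accepts, "-", message)
```

The text is in the language of the published grammar but is not a valid program, which is the gap between the two that the paragraph above describes.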

Behind all that theory is the fact that most (almost all) programming languages are not context-free. They are commonly described as "almost context-free" -- which is not a meaningful concept, since context-freeness is not a fractional property -- which basically means that "not too much" of the language description is written only in a human-readable language. There are many different aspects of languages which cannot be expressed as a CFG; a partial catalogue might include:

- Macro or template facilities, which effectively extend the grammar of the language on the fly. The C preprocessor, variants of which are applied to a great number of languages, not all of them strictly in the "C family", is particularly undisciplined, since its effects cannot be formally described either syntactically or semantically. Generic templates, dependent types and hygienic macros are more disciplined, but still context-sensitive (at least).
- Name lookup rules which require coordination between use and declaration of identifiers. Such rules are context-sensitive by definition, since they require particular substrings in valid programs to match each other; thus, the same string (the name of a variable) will be parsed differently in a context in which a declaration of that variable is present from a context in which a declaration of that variable is absent. It's tempting to dismiss these rules as "semantic", but that's clearly not the case; variable scoping rules, for example, can be precisely specified at compile time without any reference to program execution. In many languages, the correct parse for a program cannot be determined without knowing the grammatical category of a binding; the most common example is that names of types and names of variables must be used in different ways.
- Visual nesting rules, such as those used by Python and Haskell where block structure is shown by indentation, also require comparison of distinct subsequences (in this case, the number of space characters at the beginning of two or more lines).
- Contextual disambiguation rules. Perhaps the most infamous such rule is Perl's indirect object call syntax, where the disambiguation between two possible parses may depend on whether or not particular symbols are present in the module symbol table when the expression is parsed. But C++ -- whose name resolution rules are extraordinarily complicated, although deterministic -- also includes cases where the same sequence of tokens will be parsed in different ways depending on how a particular template is expanded.
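To make the name lookup item concrete, here is a rough sketch (in Python, for brevity) of the classic C case where `a * b;` is a declaration if `a` names a type and a multiplication otherwise. The function `classify_statement` and the `type_names` set are hypothetical stand-ins for a real parser and its symbol table; no actual compiler is structured this simply.

```python
def classify_statement(tokens, type_names):
    """Classify the token sequence NAME '*' NAME ';' as a C parser would have to.

    `type_names` is a simplified, hypothetical symbol table: the set of
    identifiers currently declared as type names (for example via typedef).
    """
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("this sketch only handles: NAME * NAME ;")
    first, _, second, _ = tokens
    if first in type_names:
        # 'a' is a type, so this declares 'b' as a pointer to 'a'.
        return ("declaration", f"{second} is a pointer to {first}")
    # Otherwise both names are values and '*' is multiplication.
    return ("expression", f"multiply {first} by {second}")


tokens = ["a", "*", "b", ";"]
as_declaration = classify_statement(tokens, type_names={"a"})
as_expression = classify_statement(tokens, type_names=set())
print(as_declaration)
print(as_expression)
```

The token sequence is identical in both calls; only the surrounding declarations differ, and that is information a context-free grammar has no way to consult.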
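For the visual nesting item, a common workaround is to do the width comparison before the grammar ever sees the input: a pre-pass keeps a stack of indentation widths and emits explicit block-open and block-close tokens, after which the remaining structure can be handled by an ordinary context-free grammar. The sketch below is a minimal version of that idea, similar in spirit to the INDENT/DEDENT scheme described in the Python language reference. The function name `indent_tokens` and the token shapes are my own assumptions, and it only handles spaces.

```python
def indent_tokens(lines):
    """Return a list of INDENT / DEDENT / ("LINE", text) tokens.

    Assumes indentation uses spaces only; blank lines are skipped.
    """
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new block.
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Close blocks until we are back at a matching width.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                raise IndentationError(f"inconsistent dedent: {line!r}")
        tokens.append(("LINE", line.strip()))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


program = ["if x:", "    y = 1", "    if z:", "        w = 2", "done = True"]
tokens = indent_tokens(program)
print(tokens)
```

The comparison of leading-space counts across lines happens entirely in the stack logic, which is the part that cannot be written as context-free productions over raw characters.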
