Recursive Definition of Normal Form with explicit substitution


Context

I assume a simply typed lambda calculus, probably written with de Bruijn indices. With $\to_\beta$ I denote the $\beta$-reduction relation.

Also, my question will ultimately concern this $\lambda$-calculus with explicit substitution (originally called the $\lambda\sigma$-calculus), where substitution is introduced at the term level via closures $M[s]$, with $\lambda$-term $M$ and substitution $s$.

A few definitions are needed.

Definition: Normal Form

A $\lambda$-term $M$ is in normal form if there is no $\lambda$-term $N$ such that $M\to_\beta N$.

Besides this definition, there is another definition I'm ultimately interested in: defining normal form recursively:

Definition II: Neutral and Normal Form

Mutually defining neutral and normal forms:

Neutral Form:

  • Every variable is in neutral form.
  • If $M$ is in neutral form and $N$ is in normal form, then $(M\,N)$ is in neutral form.

Normal Form:

  • If $M$ is in neutral form, then $M$ is in normal form.
  • If $M$ is in normal form, then $\lambda M$ is in normal form.
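For concreteness, the two mutually recursive predicates can be sketched as code. This is only a transcription of Definition II, assuming a hypothetical de Bruijn AST with tuples `("var", n)`, `("lam", body)`, `("app", m, n)`:

```python
# Sketch of Definition II on a hypothetical de Bruijn AST:
# ("var", n), ("lam", body), ("app", m, n).

def is_neutral(t):
    """Variables are neutral; an application is neutral when its head is
    neutral and its argument is normal. Abstractions are never neutral."""
    if t[0] == "var":
        return True
    if t[0] == "app":
        return is_neutral(t[1]) and is_normal(t[2])
    return False

def is_normal(t):
    """Neutral terms are normal; an abstraction is normal when its body is."""
    if t[0] == "lam":
        return is_normal(t[1])
    return is_neutral(t)
```

For instance, $\lambda.(0\,0)$ comes out normal, while $(\lambda.0)\,1$ does not, since its head is an abstraction and the application is therefore not neutral.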

Question

So far, this deals with the classical simply typed lambda calculus, where substitution is defined as a (meta-)function over terms. First: does this definition of normal forms make sense?

Then: once substitution is moved to the term level, how can normal forms be defined recursively, as above?

Own Ideas

My first attempt was to add the following case to the definition of a normal form:

  • If every term in $s$ is in neutral form and $t$ is in normal form, then $t[s]$ is in normal form.

Since neutral forms don't allow abstractions, even when dealing with closures like $(x\,y)[s]$, this should not introduce new $\beta$-redexes. But somehow, I'm unsure. Does this still capture what normal form means? After searching for information, I found nothing like this in the literature.
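For what it's worth, the proposed clause can be spelled out as code. This is only a transcription of the idea above, with closures encoded as hypothetical `("clos", body, [s0, s1, ...])` tuples on top of the de Bruijn AST `("var", n)`, `("lam", body)`, `("app", m, n)`; it is not a claim that this is the right notion of normal form for $\lambda\sigma$:

```python
# Sketch of the proposed extra clause for closures t[s]:
# a closure is normal when every term in s is neutral and the body is normal.

def is_neutral(t):
    if t[0] == "var":
        return True
    if t[0] == "app":
        return is_neutral(t[1]) and is_normal(t[2])
    return False  # abstractions (and closures, in this sketch) are not neutral

def is_normal(t):
    if t[0] == "lam":
        return is_normal(t[1])
    if t[0] == "clos":
        # proposed clause: every term in s neutral, body t normal
        return all(is_neutral(s) for s in t[2]) and is_normal(t[1])
    return is_neutral(t)
```

Under this sketch, $(x\,y)[s]$ with $s$ a list of variables is accepted, while a closure whose substitution contains an abstraction is rejected.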


Best Answer

Does the recursive definition make sense?

I'm guessing this question, in more formal terms, is "Are the two definitions equivalent?". To answer it, you can try to show that for every term $t$,

  1. if it has no more redexes, it is in II-normal form (short for "normal form by definition II")
  2. if it has at least one redex, then it is not in II-normal form* and it's also not neutral.

There we go:

  • If $t$ is a variable, it contains no more redexes and therefore we need to show (1). Trivially, a variable is in II-normal form.
  • If $t$ is a lambda $\lambda M$, then
    • if $M$ has no more redexes, then $\lambda M$ also has no more redexes, therefore we need to show (1). By the inductive hypothesis, $M$ is in II-normal form. Then by definition, $\lambda M$ is also in II-normal form.
    • if $M$ contains at least one more redex, then $\lambda M$ also contains at least one, and therefore we need to show (2). We easily see that $\lambda M$ is not neutral. It remains to show that it is also not II-normal. According to the definition, $\lambda M$ can only be in II-normal form if (a) $M$ is in II-normal form or (b) $\lambda M$ is neutral. Regarding (a), by the induction hypothesis $M$ is not II-normal. Regarding (b), $\lambda M$ can't be neutral, because by definition lambda abstractions are not neutral. Point (2) has been shown.
  • If $t$ is an application $M\,N$, we separate cases again:
    • If $M$ and $N$ contain no more redexes, it is still possible for $M\,N$ to be a redex. It is a redex if $M$ is a lambda abstraction $\lambda M'$, i.e., if $t$ is of the form $(\lambda M')\, N$. We need to show that such a term is not in II-normal form. We will assume the opposite and arrive at a contradiction. Since $t$ is an application, the only way it can be in II-normal form is if it is neutral, and, by definition, $(\lambda M')\,N$ is neutral only if $\lambda M'$ is neutral and $N$ is II-normal. However, lambdas are not neutral. We have arrived at a contradiction. Therefore, $t = (\lambda M')\, N$ is not in II-normal form.
    • If $N$ or $M$ contain at least one redex, then $M\,N$ also does, so we need to show (2). Again we will assume $M\,N$ is in II-normal form and reach a contradiction. Since $M\,N$ is in normal form, it must be the case that it is also neutral (by definition). This means that $M$ must be neutral and $N$ must be II-normal. From the induction hypothesis, if $N$ has redexes, then it is not in II-normal form and if $M$ has redexes then it is not neutral. Contradiction, hence $M\,N$ is not in II-normal form.

This proves that Definition II is equivalent to "is $\beta$-irreducible".
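The equivalence can also be checked mechanically on small terms. A sketch, using the same hypothetical de Bruijn encoding `("var", n)`, `("lam", b)`, `("app", m, n)` and enumerating terms up to a small depth:

```python
# Executable check of "no beta-redex iff II-normal" on small de Bruijn terms.

def has_redex(t):
    """True iff t contains a beta-redex (an abstraction in head position)."""
    if t[0] == "var":
        return False
    if t[0] == "lam":
        return has_redex(t[1])
    return t[1][0] == "lam" or has_redex(t[1]) or has_redex(t[2])

def is_neutral(t):
    if t[0] == "var":
        return True
    return t[0] == "app" and is_neutral(t[1]) and is_normal(t[2])

def is_normal(t):
    if t[0] == "lam":
        return is_normal(t[1])
    return is_neutral(t)

def terms(depth):
    """Enumerate de Bruijn terms up to a small depth (duplicates are fine)."""
    if depth == 0:
        yield ("var", 0)
        return
    for t in terms(depth - 1):
        yield t
        yield ("lam", t)
        for u in terms(depth - 1):
            yield ("app", t, u)

# every enumerated term is II-normal exactly when it has no redex
assert all(is_normal(t) != has_redex(t) for t in terms(2))
```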

How to adapt Definition II to also account for substitution terms $t[s]$?

By the first definition, it is clear that $t[s]$ is in normal form when both $t$ and the term(s) in $s$ are. You need to extend the above proof with one more inductive case for $t = M[N]$. How would you do that? How would you change Definition II so that the proof still holds?


*This is the contrapositive of saying, "if it is in II-normal form, it has no more redexes"

Second Answer

Different recursive definitions - cast in the guise of abstract machines - abound, such as the Krivine Machine or CEK Machine, which are suitable for normal order reduction, or the SECD Machine, I believe, for applicative order reduction.

You forgot a key definition: head-normal form. All pure λ-terms are either of the form $xY⋯Z$ headed by a variable $x$, where $Y$, ..., $Z$ are other λ-terms, or are a λ-abstraction $λx·A$, where $A$ is some other λ-term, or else are headed by a β-redex: $(λx·A)BY⋯Z$. The first two are in head-normal form. In the first case, normal form is reached when $Y$, ..., $Z$ are each reduced to normal form. In the second case, the same holds true when $A$ is reduced to normal form, with the additional qualifier that an η-reduction $λx·Ax → A$ also occurs if $x ∉ ∂A$, where $∂A$ will be used here and below to denote the set of variables that occur freely in a λ-term $A$. So, we can define $(Ax)/x = A$, if $x ∉ ∂A$, and $A/x = λx·A$ otherwise.

Only in the last case do you not have a head-normal form, with the reduction $$(λx·A)BY⋯Z → (x=B,A)Y⋯Z,$$ where $x=B,A$ denotes the result of substituting all free occurrences of the variable $x$ in the λ-term $A$ by the λ-term $B$. Recursively, denoting the normal form of a λ-term $A$ by $[A]$, we have: $$ [xY⋯Z] = x[Y]⋯[Z],\quad [λx·A] = [A]/x,\quad [(λx·A)BY⋯Z] = [(x=B,A)Y⋯Z], $$ where $$(Ax)/x = A\quad (x ∉ ∂A),\quad A/x = λx·A\quad\mbox{else}.$$ If not using the η-rule, then $A/x = λx·A$ unconditionally.
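The recursion for $[A]$ can be sketched as a normal-order normalizer. To sidestep α-conversion, this hypothetical version uses de Bruijn indices (`("var", n)`, `("lam", b)`, `("app", m, n)`) rather than named variables, and it omits the η-step of $A/x$ for brevity:

```python
# Sketch of normal-order normalization (leftmost-outermost) on de Bruijn terms.

def shift(t, d, c=0):
    """Shift free indices >= c by d."""
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= c else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, c + 1))
    return ("app", shift(t[1], d, c), shift(t[2], d, c))

def subst(t, n, s):
    """Substitute s for index n in t (the (x=B,A) operation)."""
    if t[0] == "var":
        return s if t[1] == n else t
    if t[0] == "lam":
        return ("lam", subst(t[1], n + 1, shift(s, 1)))
    return ("app", subst(t[1], n, s), subst(t[2], n, s))

def beta(body, arg):
    """Contract (lam body) arg."""
    return shift(subst(body, 0, shift(arg, 1)), -1)

def whnf(t):
    """Weak head normal form: contract head redexes only."""
    while t[0] == "app":
        head = whnf(t[1])
        if head[0] == "lam":
            t = beta(head[1], t[2])
        else:
            return ("app", head, t[2])
    return t

def normalize(t):
    """[xY...Z] = x[Y]...[Z]; [lam A] = lam [A]; head redexes first."""
    t = whnf(t)
    if t[0] == "lam":
        return ("lam", normalize(t[1]))
    if t[0] == "app":
        return ("app", normalize(t[1]), normalize(t[2]))
    return t
```

Because arguments are only normalized after the head has been dealt with, discarded arguments are never evaluated; e.g. $S\,K\,Ω$ normalizes to $λz·z$ even though $Ω$ has no normal form, matching the example further below.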

For impure λ-terms, being headed by a constant $c$, or by a function $f$ with fewer terms following it than its arity, also counts as head-normal. Otherwise, if a function $f$ has arity $n$ and is followed by at least $n$ sub-terms in the term $fA_0⋯A_{n-1}Y⋯Z$, then it is a $δ$-redex and the $δ$-rule applies to it, with the reduction $$fA_0⋯A_{n-1}Y⋯Z → f\left(A_0,⋯,A_{n-1}\right)Y⋯Z.$$

For substitution, the following equations can be used $$ (x=A,x) = A,\quad (x=B,A) = A\quad (x ∉ ∂A),\\ (x=B,λy·A) = λy·(x=B,A)\quad (x ∈ ∂λy·A,\quad y ∉ ∂B), $$ with name conflicts handled by the α-rule, which can be stated in this context in the following form $$[(λx·A)Y⋯Z] = [(λy·(x=y,A))Y⋯Z]\quad (y ∉ ∂A).$$ This works fine even in the trivial case $y = x$, since $(x = x, A) = A$ already follows, by inductive argument over the structure of terms $A$, by the previous clauses in the definition for substitution, regardless of whether $x ∈ ∂A$ or $x ∉ ∂A$. Finally, for the free variable sets, we have the following inductive definition $$∂x = \{x\},\quad ∂AB = ∂A ∪ ∂B,\quad ∂λx·A = ∂A \setminus \{x\}.$$
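The substitution equations, the free-variable sets $∂$, and the α-rule can be sketched together. This assumes hypothetical named terms `("var", x)`, `("lam", x, body)`, `("app", m, n)`, with fresh names drawn from a counter:

```python
# Sketch of capture-avoiding substitution (x=B,A) with free-variable sets ∂.
import itertools

_fresh = map("v{}".format, itertools.count())  # hypothetical fresh-name supply

def fv(t):
    """The set ∂A of free variables: ∂x={x}, ∂AB=∂A∪∂B, ∂λx·A=∂A\\{x}."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return fv(t[1]) | fv(t[2])
    return fv(t[2]) - {t[1]}  # ("lam", x, body)

def subst(x, b, t):
    """(x=B,A): replace free occurrences of x in t by b, applying the
    alpha-rule to rename bound variables that would capture."""
    if x not in fv(t):
        return t                  # (x=B,A) = A  when x not in ∂A
    if t[0] == "var":
        return b                  # (x=A,x) = A
    if t[0] == "app":
        return ("app", subst(x, b, t[1]), subst(x, b, t[2]))
    y, body = t[1], t[2]
    if y in fv(b):                # name conflict: alpha-rename y first
        z = next(_fresh)
        body, y = subst(y, ("var", z), body), z
    return ("lam", y, subst(x, b, body))
```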

Example: Let $K = λx·λy·x$, $Ω = (λx·xx)(λx·xx)$. Then, it follows that $K$ is in normal form, since $$\begin{align} [K] &= [λx·λy·x]\\ &= [λy·x]/x\\ &= ([x]/y)/x\\ &= (x/y)/x\\ &= (λy·x)/x\\ &= λx·λy·x\\ &= K \end{align}$$

On the other hand $Ω$ has no normal form, since it is a β-redex that has the reduction $Ω → Ω$.

Also, let $α = (y = Ω, x = K, z)$ and $β = (y = Ω, x = K, xy)$. Then $$\begin{align} α &= (y = Ω, x = K, z)\\ &= (y = Ω, z)\\ &= z. \end{align}$$ Thus $$\begin{align} [K α β] &= [(λx·λy·x) α β]\\ &= [(x=α,λy·x) β]\\ &= [λy·(x=α,x) β]\\ &= [y=β,x=α,x]\\ &= [y=β,α]\\ &= [y=β,z]\\ &= [z]\\ &= z. \end{align}$$

Similarly, for the substitution $γ = (y = Ω, x = K, (x z)(y z))$, we have $$\begin{align} γ &= (y = Ω, x = K, x) α β\\ &= (y = Ω, K) α β\\ &= K α β. \end{align}$$ Therefore, $$[γ] = [K α β] = z.$$

Example: Expanding on the previous example, define $S = λx·λy·λz·xz(yz)$. Then $$\begin{align} [S K Ω] &= [(λx·λy·λz·xz(yz)) K Ω]\\ &= [(x=K,λy·λz·xz(yz)) Ω]\\ &= [λy·(x=K,λz·xz(yz)) Ω]\\ &= [y=Ω,x=K,λz·xz(yz)]\\ &= [λz·γ]\\ &= [γ]/z\\ &= z/z\\ &= λz·z. \end{align}$$

All of this captures Normal Order reduction, as opposed to Applicative Order reduction, the distinction between the two described here: Normal Order versus Applicative Order.

Since substitution sequences invariably arise, it's natural to try and generalize them with "environments" ( = substitution sequences) in place of substitutions. For normal order reduction, it's not really worth the trouble, though. Instead, it's more fruitful to just treat $λx·A = A/x$ generically, and generalize $A/x$: $$ x/x = I,\quad a/x = K a,\quad (x x)/x = D,\\ (a x)/x = a,\quad (x b)/x = T b,\quad (u x)/x = W (u/x),\quad (x v)/x = U (v/x),\\ (a v)/x = B a (v/x),\quad (u b)/x = C (u/x) b,\quad (u v)/x = S (u/x) (v/x), $$ where $$x ∉ ∂a,\quad x ∉ ∂b,\quad x ∈ ∂u\quad (u ≠ x),\quad x ∈ ∂v\quad (v ≠ x).$$ This is to be applied from the bottom up as a way to compile out all of the λ-abstractions. It introduces several combinator functions with the redexes and reduction rules $$ I x → x,\quad K x y → x,\quad D x → x x,\\ T x y → y x,\quad W x y → x y y,\quad U x y → y (x y),\\ B x y z → x (y z),\quad C x y z → x z y,\quad S x y z → x z (y z). $$ Then, the rules simplify to $$ [xY⋯Z] = x[Y]⋯[Z],\quad [RY⋯Z] = [CY⋯Z]\quad (R → C),\quad [cY⋯Z] = c[Y]⋯[Z], $$ where $x$ denotes a variable, $Y$, ..., $Z$ λ-terms (now compiled into terms involving combinators), $R$ any of the redexes listed above with the corresponding reduction $R → C$, and $c$ any combinator followed by fewer terms $Y$, ..., $Z$ than its arity.
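The bracket-abstraction table $A/x$ above can be sketched directly, assuming terms `("var", x)` / `("app", m, n)` with the combinators as plain strings, and assuming the λ-abstractions have already been compiled away bottom-up:

```python
# Sketch of the bracket abstraction A/x from the table above.

def fv(t):
    """Free variables; combinator strings are closed."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return fv(t[1]) | fv(t[2])
    return set()

def abstract(x, t):
    """Compile out x, returning A/x."""
    if t == ("var", x):
        return "I"                                           # x/x = I
    if x not in fv(t):
        return ("app", "K", t)                               # a/x = K a
    m, n = t[1], t[2]                                        # t is an application
    mx, nx = x in fv(m), x in fv(n)
    if m == ("var", x) and n == ("var", x):
        return "D"                                           # (x x)/x = D
    if not mx and n == ("var", x):
        return m                                             # (a x)/x = a
    if m == ("var", x) and not nx:
        return ("app", "T", n)                               # (x b)/x = T b
    if mx and n == ("var", x):
        return ("app", "W", abstract(x, m))                  # (u x)/x = W (u/x)
    if m == ("var", x) and nx:
        return ("app", "U", abstract(x, n))                  # (x v)/x = U (v/x)
    if not mx:
        return ("app", ("app", "B", m), abstract(x, n))      # (a v)/x = B a (v/x)
    if not nx:
        return ("app", ("app", "C", abstract(x, m)), n)      # (u b)/x = C (u/x) b
    return ("app", ("app", "S", abstract(x, m)), abstract(x, n))  # (u v)/x
```

Note the order of the clauses matters: the more specific rules (the first seven) must be tried before falling through to $B$, $C$, $S$.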

That's essentially the reduction method used by Combo, which I put up on GitHub, except it optimally shares subterms and never reduces any term more than once, and it can also apply the η-rule with the extra clause $[A] = [Az]/z$ for a term $A$ in normal form, not already known to be in normal form relative to the η-rule, where $z$ denotes a fresh variable. In particular, this handles all the cases of head-normal form terms that are headed by combinators that are short on arity; e.g. $$[SKx] = [SKxz]/z = [Kz(xz)]/z = [z]/z = z/z = I.$$

For Applicative Order, the definition listed by Werner Kluge in "The Organization of Reduction, Data Flow, and Control Flow Systems" (1992), on p. 117 seems OK. Up to change in notation (assuming I haven't lost anything in translation), it's just: $$ [xY⋯Z] = x[Y]⋯[Z],\quad [λx·A] = λx·[A],\quad [(λx·A)BY⋯Z] = [(x=[B],A)[Y]⋯[Z]]. $$ Since substitution environments now contain only normal order terms, then it makes more sense to formalize substitution sequences as environments.

An environment can now be either the empty environment $I$ or an updated environment $E,x←A$, where $E$ is an environment, $x$ a variable, and $A$ a λ-term. Its action, as a substitution function on variables, is defined inductively by $$I(x) = x,\quad (E,x←A)(x) = A,\quad (E,x←A)(y) = E(y)\quad (y ≠ x).$$
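This action can be sketched directly as higher-order functions, with the empty environment $I$ as the identity substitution (terms assumed to be tuples like `("var", x)`):

```python
# Sketch of environments as substitution functions on variables.

def empty(x):
    return ("var", x)                            # I(x) = x

def update(env, x, a):
    """The updated environment E,x←A as a new lookup function:
    (E,x←A)(x) = A, and (E,x←A)(y) = E(y) for y != x."""
    return lambda y: a if y == x else env(y)
```

A later update for the same variable shadows the earlier one, which is exactly what the $E,x←A,x←E(x) = E$ identity mentioned below relies on.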

Then, the following should capture the essence of Kluge's definition: $$\begin{align} S_0:& [A] = [I, A],\\ S_1:& [E,x] = E(x),\\ S_2:& [E,λx·A] = [(E,x←y),A]/y,\\ S_{3a}:& [E,(λx·A)B] = [(E,x←[E,B]),A],\\ S_{3b}:& [E,CB] = \{[E,C],[E,B]\},\\ S_{4a}:& \{ λx·A, B \} = [(I,x←B),A],\\ S_{4b}:& \{ C, B \} = C B, \end{align}$$ where in $S_2$, $y$ is a "fresh" variable - i.e. a variable $y ≠ x$ otherwise unused in $E$ or $A$; and where in $S_{3b}$ and $S_{4b}$, the term $C$ is not a λ-abstraction. The inclusion of the η-rule in $S_2$ is an addition that was not part of Kluge's original.

This can be mechanized by maintaining the following assertions: $$s(E,A,D) = e([E,A],D),\quad t(E,A,D) = f(E,[E,A],D),$$ the second set of functions used to preserve an environment $E$ so as to avoid duplicating it as much as possible in $S_{3a}$ and $S_{3b}$. To this end, it is worth pointing out that $$E,x←A,x←E(x) = E,$$ reverses the effect of the update $E,x←A$.

The "dump" $D$ is used as a stack to handle recursion and holds the least that is necessary to maintain the assertions.

The assertion can be met with the following rule for initialization and finalization $$[A] = s(I,A,[]),\quad e(N,[]) = N,$$ where $[]$ denotes the "empty" dump.

The $S$ rules can be handled with the following definitions for the $s$ and $t$ functions: $$\begin{align} s(E,x,D) &= e(E(x),D),& t(E,x,D) &= f(E,E(x),D),\\ s(E,λx·A,D) &= s((E,x←y),A,1yD),& t(E,λx·A,D) &= t((E,x←y),A,2xE(x)1yD),\\ s(E,AB,D) &= t(E,B,3AD),& t(E,AB,D) &= t(E,B,4AD), \end{align}$$ where $y$ denotes a "fresh" variable and the stacking onto the dump $D$ is denoted by concatenation on the left.

For the $f$ function we then define: $$\begin{align} f(E,A,1yD) &= f(E,A/y,D),\\ f(E,A,2xND) &= f((E,x←N),A,D),\\ f(E,A,3(λx·B)D) &= s((E,x←A),B,D),\\ f(E,A,4(λx·B)D) &= t((E,x←A),B,2xE(x)D),\\ f(E,A,3CD) &= s(E,C,5AD),\\ f(E,A,4CD) &= t(E,C,5AD),\\ f(E,λx·A,5ND) &= s((I,x←N),A,6ED),\\ f(E,C,5ND) &= f(E,CN,D), \end{align}$$ where the (normal form) λ-term $C$ is not a λ-abstraction.

For the $e$ function, we define: $$\begin{align} e(A,1yD) &= e(A/y,D),\\ e(λx·A,5ND) &= s((I,x←N),A,D),\\ e(C,5ND) &= e(CN,D),\\ e(N,6ED) &= f(E,N,D), \end{align}$$ where the term $C$ (again, already in normal form) is not a λ-abstraction.

Only in the cases $f(E,A,5ND)$ and $e(N,6ED)$ are environments saved and restored. This corresponds to "breakpoints" in call-frames. Together, this should be enough to maintain the inductive assertions for the $s$ and $e$ functions and for the $f$ and $t$ functions, with respect to the $S$ rules, though the η-rule may have to be more carefully checked since I just improvised that in.