What is variable substitution best thought of categorically? A natural transformation?

106 Views Asked by At

Here is an attempted proof in the category $\textbf{Ass}$ where objects are assertions (in a kind of ordered-and or CNF form - essentially a list of assertions) and morphisms are "proofs" between two.

Now I didn't give a detailed proof, but a human-level proof in this example, by just saying that $(\Bbb{Z}, +)$ is an abelian group. Now I want to take the "axiom" $(\Bbb{Z}, +)$ forms an abelian group and transfer it over to a proof that if $x \in \Bbb{Z}$ then $x + x \in \Bbb{Z}$. So to use that axiom, clearly a variable substitution is required.

Visual proof that x + x in ZZ

But, in general, how should we treat variable substitution? Here it looks either like a morphism of two morphisms (forming a commuting square), but I could also imagine this as a natural transformation between two functors (or "diagrams") from the category $\bullet \to *$ into $\textbf{Ass}$. So in general, what approach will become more fruitful?

1

There are 1 best solutions below

3
On BEST ANSWER

A Heyting category is a category which can interpret constructive first-order logic with equality. A Boolean category is a Heyting category whose internal first-order logic is classical.

To do this, a Heyting category must have all finite limits. Furthermore, for each object $A$, the partially ordered set $Sub(A)$ must be a Heyting algebra (a Boolean algebra in the case of a Boolean category). Given an arrow $f : B \to A$, we want the pullback map $f^{-1} : Sub(A) \to Sub(B)$ to be a Heyting algebra homomorphism. We also want $f^{-1}$ to have left adjoint $\exists_f : Sub(B) \to Sub(A)$ and right adjoint $\forall_f : Sub(B) \to Sub(A)$, and we want these to “play nice” by satisfying the Beck-Chevalley condition. All these requirements together are the definition of a Heyting (and Boolean) category.

Heyting categories can naturally interpret constructive many-sorted first-order logic with equality. Consider a multi-sorted first-order vocabulary $V$ and a Heyting category $C$. To interpret $V$ in $C$, we need to assign, for each sort $s$, an object $s^C \in C$. It is helpful to then define, for a list of sorts $(s_1, \ldots, s_n)$, the object $(s_1, \ldots, s_n)^C = s_1^C \times \cdots \times s_n^C$. We also need to assign, for each function symbol $f$ of type $(s_1, \ldots, s_n) \to s$, an arrow $f^C : (s_1, \ldots, s_n)^C \to s_C$. Finally, we must assign to each relation symbol $R$ of type $(s_1, \ldots, s_n)$ a subobject $R^C \in Sub((s_1, \ldots, s_n)^C)$. We call $C$ equipped with this structure a Heyting category $V$-structure.

Note: a special case is when $C$ is the category of sets, which is a Heyting category (and, if you accept excluded middle, a Boolean category). To equip $Set$ with a Heyting category $V$-structure is exactly to produce a $V$-structure in the ordinary sense. Keep this special case in mind as we work our way through the properties of Heyting category $V$-structures; ordinary $V$-structures are a nice special case.

We can then construct terms. A term $t$ of type $s$ in a context $(x_1, \ldots, x^n)$ of type $(s_1, \ldots, s_n)$ is interpreted as an arrow $t^C : (s_1, \ldots, s_n)^C \to s$ using the following recursive definition. We define $x_i^C = \pi_i$, the $i$th projection map. And given a function symbol $f$ of type $(r_1, \ldots, r_m) \to s$ and terms $t_1, \ldots, t_m$ of types $r_1, \ldots, r_m$ respectively, we define $f(t_1, \ldots, t_m)^C = f^C \circ (t_1^C, \ldots, t_m^C)$. Note that formally, the interpretation of a term depends on both the term itself and its context, but we suppress the context in the notation $t^C$.

A proposition $P$ in a context of type $(s_1, \ldots, s_n)$ is interpreted as a subobject $P^C \in Sub((s_1, \ldots, s_n)^C)$ using the following recursive definition. Given terms $t_1, t_2$ of type $s$, we define $(t_1 = t_2)^C$ to be the equaliser of $t_1^C$ and $t_2^C$. Given a relation symbol $R$ of type $(r_1, \ldots, r_m)$ and terms $t_1, \ldots, t_m$ of the appropriate type, we define $R(t_1, \ldots, t_m)^C$ to be $(t_1^C, \ldots, t_m^C)^{-1}(R^C)$. We have $\top^C = \top$, $\bot^C = \bot$, and $(P \square Q)^C = P^C \square Q^C$ for $\square \in \{\land, \lor, \implies\}$ using the Heyting algebra structure of $Sub((s_1, \ldots, s_n)^C)$. And given a proposition $P$ in context $(x_1, \ldots, x_n, y)$ of type $(s_1, \ldots, s_n, s)$, we have $(\square y (P))^C = \square_{(\pi_1, \ldots, \pi^n)} (P^C)$ for $\square \in \{\exists, \forall\}$.

Based on these definitions, we can prove the variable substitution theorem. Suppose we have a proposition $P$ in context $(x_1, \ldots, x_n)$ of type $(s_1, \ldots, s_n)$, and suppose we have terms $t_1, \ldots, t_n$ of the appropriate types, where these terms occur in context $(y_1, \ldots, y_m)$. Then we have $P[x_1 \mapsto t_1, \ldots, x_n \mapsto t_n]^C = (t_1^C, \ldots, t_n^C)^{-1}(P^C)$. Here, $P[x_1 \mapsto t_1, \ldots, x_n \mapsto t_n]$ is the proposition obtained by substituting $t_i$ for $x_i$ in $P$ for all $i$, and it will be a term in context $(y_1, \ldots, y_m)$. This demonstrates that in a Heyting category, substitution is done by pullback.

Given a set $T$ of sentences and a sentence $\phi$, we say $T \vdash \phi$ iff there is some sequence $P_1, \ldots, P_n \in T$ such that we can constructively prove $P_1 \land cdots \land P_n \implies \phi$. We say $T \vdash_{classical} \phi$ if we can do the same thing with a classical proof.

Given a sentence $\phi$ (which is a proposition in the empty context), we say $C \models \phi$ when $\phi^C = \top$. Given a set $T$ of sentences, we say $C \models T$ when $\forall P \in T (C \models P)$. The soundness theorem states that if $T \vdash \phi$ and $C \models T$, then $C \models \phi$. Moreover, if $C$ is Boolean, $C \models T$, and $T \vdash_{classical} \phi$, then $C \models \phi$. Obviously, a special case of this soundness theorem is the soundness theorem for ordinary models.

We also have a completeness theorem (which, unlike the classical completeness theorem for ordinary models, does not require any form of the axiom of choice to prove). Given a theory $T$, we can construct the syntactic category for $T$ as follows.

The objects consist of pairs $((s_1, \ldots, s_n), (x_1, \ldots, x_n) \mapsto P(x_1, \ldots, x_n))$, where $s_1, \ldots, s_n$ are sorts, $(x_1, \ldots, x_n)$ is a context of type $(s_1, \ldots, x_n)$, and $P$ is a proposition in context $x_1, \ldots, x_n$. We consider objects up to renaming of variables. That is, we consider $((s_1, \ldots, s_n), (x_1, \ldots, x_n) \mapsto P(x_1, \ldots, x_n))$ to be equal to $((s_1, \ldots, s_n), (y_1, \ldots, y_n) \mapsto P(y_1, \ldots, y_n))$. So we would not distinguish between, for example, $(s, (x) \mapsto x = x)$ and $(s, (y) \mapsto y = y)$. These would be considered the same object.

We might intuitively think of the object $((s_1, \ldots, s_n), (x_1, \ldots, x_n) \mapsto P(x_1, \ldots, x_n))$ as representing the set $\{(x_1, \ldots, x_n) \in s_1 \times \cdots \times s_n \mid P(x_1, \ldots, x_n)\}$.

The arrows between objects represent definable functions. Formally, consider two objects $((s_1, \ldots, s_n), (x_1, \ldots, x_n) \mapsto P(x_1, \ldots, x_n))$ and $((r_1, \ldots, r_m), (y_1, \ldots, y_m) \mapsto Q(y_1, \ldots, y_m))$, where WLOG the variables $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_m)$ are distinct. Then an arrow consists of a proposition $(x_1, \ldots, x_n, y_1, \ldots, y_m) \mapsto R(x_1, \ldots, x_n, y_1, \ldots, y_m)$ such that

$$T \vdash \forall x_1 \cdots \forall x_n (P(x_1, \ldots, x_n) \implies \exists y_1 \cdots \exists y_m (Q(y_1, \ldots, y_m) \land R(x_1, \ldots, x_n, y_1, \ldots, y_m) \land \forall z_1 \cdots \forall z_m (Q(z_1, \ldots, z_m) \land R(x_1, \ldots, x_n, z_1, \ldots, z_m) \implies z_1 = y_1 \land \cdots \land z_m = y_m)))$$

and we consider $R_1$ and $R_2$ to be the same arrow when we can prove

$$T \vdash \forall x_1 \cdots \forall x_n \forall y_1 \cdots \forall y_m (P(x_1, \ldots, x_n) \land Q(y_1, \ldots, y_m) \land R_1(x_1, \ldots, x_n, y_1, \ldots, y_m) \implies R_2(x_1, \ldots, x_n, y_1, \ldots, y_n))$$

Again, we also consider the $R$s up to variable renaming.

Thus, if we continue out set analogy, $R$ defines a function $\{(x_1, \ldots, x_n) \in s_1 \times \cdots \times s_n mid P(x_1, \ldots, x_n)\} \to \{(y_1, \ldots, y_m) \mid Q(y_1, \ldots, y_m)\}$, and we consider two $R$s to be equal iff they define the same function.

This category is a Heyting category. If we replace $\vdash$ with $\vdash_{classical}$, it is a Boolean category.

Moreover, there is a canonical $V$-structure in the syntactic category $C$. A sort $s$ corresponds to $s^C = (s, (x) \mapsto \top)$. Then we have $(s_1, \ldots, s_n)^C = (s_1, \ldots, s_n, (x_1, \ldots, x_n) \mapsto \top)$. Given function symbol $f$ of type $(s_1, \ldots, s_n) \to s$, we have $f^C = (x_1, \ldots, x_n, y) \mapsto (y = f(x_1, \ldots, x_n))$. And given a relation symbol $P$ of type $(s_1, \ldots, s_n)$, we have $P^C = (s_1, \ldots, s_n, (x_1, \ldots, x_n) \mapsto P(x_1, \ldots, x_n))$, with the obvious inclusion $i = (x_1, \ldots, x_n, y_1, \ldots, y_n) \mapsto x_1 = y_1 \land \cdots \land x_n = y_n$.

It follows by induction that for any term $t$ of type $s$ in context $(x_1, \ldots, x_n)$ of type $(s_1, \ldots, s_n)$, we have $t^C = (x_1, \ldots, x_n, y) \mapsto (y = t)$. And it follows that given a proposition $P$ in context $(x_1, \ldots, x_n)$ of type $(s_1, \ldots, s_n)$, we have $P^C = ((s_1, \ldots, s_n), (x_1, \ldots, x_n) \mapsto P)$.

Then we see that $C \models T$. In fact, for all $\phi$, we see that $C \models \phi$ if and only if $T \vdash \phi$. If we instead constructed $C$ using $\vdash_{classical}$, we have $C \models \phi$ if and only if $T \vdash_{classical} \phi$.

We can view $C$ as your "category of assumptions".