Differential in lambda calculus

Introduction and Notation

I'm experimenting with $\lambda$-calculus and I'm wondering about the exact semantics of common operators in calculus, specifically the differential operator.
Suppose we augment lambda calculus to formalize variable substitution for the definition of the differential.

$$ \mathrm{d} = (\lambda \space x \space y. \sum_{i=1}^{arity(x)} \partial_ix \bigoplus y \space \mathrm{d} \space y_i) $$

Here $ y_i $ is the $i$-th entry of the tuple $ y $, assuming $ x $ is a function and $ y = (\mathrm{arg}_1, \mathrm{arg}_2, \ldots) $ is the tuple of its arguments. The operator $\bigoplus$ unpacks a tuple into curried application, i.e. it feeds the elements to a lambda function in the form in which it takes them: $$ f \bigoplus (y_1, y_2, y_3) = f\space y_1 \space y_2 \space y_3 $$
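As a sanity check, the operator above can be sketched in Python with SymPy. This is only my encoding of the idea, not standard SymPy: each differential $\mathrm{d}\,y_i$ is modeled as a fresh formal symbol named `d_<name>`, and the arity of $x$ is taken to be the length of the argument tuple.

```python
import sympy as sp

def d(f, args):
    """Total differential: sum_i (partial_i f)(args) * d(args_i).

    `f` is a Python callable taking len(args) arguments (its arity),
    `args` is a tuple of sympy Symbols, and each differential d(args_i)
    is modeled as a fresh formal symbol `d_<name>` (my own convention).
    """
    expr = f(*args)  # f "concatenated" with the argument tuple: f ⊕ (y_1, ..., y_n)
    return sum(sp.diff(expr, a) * sp.Symbol(f"d_{a.name}") for a in args)
```

For example, `d(sp.sin, (sp.Symbol("x"),))` yields `cos(x)*d_x`, mirroring the scalar reduction below.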

Scalar Case

Assume $ \sin $ is a lambda function of arity 1.
The following application exemplifies the usage:

$$ \mathrm{d} \space \sin \space x = (\lambda \space x \space y. \sum_{i=1}^{arity(x)} \partial_ix \bigoplus y \space \mathrm{d} \space y_i) \sin \space x $$ $$ \rightarrow_\beta \mathrm{d} \space \sin \space x = (\lambda \space y. \sum_{i=1}^1 \partial_i \sin \bigoplus y \space \mathrm{d} \space y_i) \space x $$ $$ \rightarrow_\beta \mathrm{d} \space \sin \space x = (\lambda \space y. \cos \space y \space \mathrm{d} \space y) \space x $$ $$ \rightarrow_\beta \mathrm{d} \space \sin \space x = \cos \space x \space \mathrm{d} \space x $$

As expected.
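The final line of this reduction can be reproduced symbolically with SymPy, again representing $\mathrm{d}\,x$ as a formal symbol `dx` (my notation, not a SymPy built-in):

```python
import sympy as sp

# Verify the scalar reduction: d sin x ->* cos x dx.
# `dx` is a formal symbol standing in for the differential d x.
x, dx = sp.symbols("x dx")
total = sp.diff(sp.sin(x), x) * dx  # (partial_1 sin) applied to x, times d x
```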

Scalar to Matrix Case

Assume $ R : \mathbb{R} \rightarrow \mathbb{R}^{2\times 2} $: a function that returns a matrix.

$$ R = \left(\lambda \space x . \begin{bmatrix} \cos x & -\sin x \\ \sin x & \cos x \end{bmatrix}\right) $$ $$ \mathrm{d} \space R \space x = (\lambda \space x \space y. \sum_{i=1}^{arity(x)} \partial_ix \bigoplus y \space \mathrm{d} \space y_i) R \space x $$ $$ \rightarrow_\beta \mathrm{d} \space R \space x = \partial_1 R \space x \space \mathrm{d} \space x $$

Which is fine since we can simply state that

$$ \partial_1 R = \left(\lambda \space x . \begin{bmatrix} -\sin x & -\cos x \\ \cos x & -\sin x \end{bmatrix}\right) $$
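This entrywise reading of $\partial_1 R$ can be checked with SymPy, whose `Matrix.diff` differentiates each entry of a matrix:

```python
import sympy as sp

# Entrywise derivative of the rotation matrix R(x); Matrix.diff
# differentiates each entry, which should match partial_1 R above.
x = sp.Symbol("x")
R = sp.Matrix([[sp.cos(x), -sp.sin(x)],
               [sp.sin(x),  sp.cos(x)]])
dR = R.diff(x)
```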

Vector to Matrix Case

Assume $ R : \mathbb{R}^2 \rightarrow \mathbb{R}^{2\times 2} $: a function that returns a matrix.

$$ R = \left(\lambda \space x \space y . \begin{bmatrix} \cos (x+y) & -\sin (x+y) \\ \sin (x+y) & \cos (x+y) \end{bmatrix}\right) $$ $$ \mathrm{d} \space R \space (x, y) = (\lambda \space x \space y. \sum_{i=1}^{arity(x)} \partial_ix \bigoplus y \space \mathrm{d} \space y_i) R \space (x, y) $$ $$ \rightarrow_\beta \mathrm{d} \space R \space (x, y) = \partial_1 R \space x \space y \space \mathrm{d} \space x + \partial_2 R \space x \space y \space \mathrm{d} \space y $$

This should also be well-defined; however, I don't get a matrix like I would from the expression $ \frac{\partial R}{\partial X} $. Why is that? Why does it not become a rank-3 tensor?
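One way to see the discrepancy concretely is with SymPy: the full derivative of a matrix-valued function with respect to a vector (`derive_by_array`) is indeed a rank-3 array, while the total differential contracts that array with the differentials $(\mathrm{d}x, \mathrm{d}y)$ and so comes out as a matrix again. A minimal sketch, with `dx`, `dy` as formal symbols for the differentials:

```python
import sympy as sp

# Full derivative of R w.r.t. (x, y) is a rank-3 array (2 inputs, 2x2 outputs),
# while the total differential contracts it with (dx, dy), giving a 2x2 matrix.
x, y, dx, dy = sp.symbols("x y dx dy")
R = sp.Matrix([[sp.cos(x + y), -sp.sin(x + y)],
               [sp.sin(x + y),  sp.cos(x + y)]])
J = sp.derive_by_array(R, (x, y))  # rank-3 array
print(J.shape)                     # (2, 2, 2)
dR = R.diff(x) * dx + R.diff(y) * dy  # contraction with the differentials
```

So the total differential $\mathrm{d}\,R$ never "sees" the rank-3 object: each slice $\partial_i R$ is scaled by the scalar $\mathrm{d}\,y_i$ and summed.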

Actual problem

The problem I'm having is when I try to differentiate with respect to a vector. I think it stems in part from my confusion about the definition of the $ \partial $ operator.

On Wikipedia it is written that, for $$ y: \mathbb{R}^m \rightarrow \mathbb{R}^n \wedge x \in \mathbb{R}^m, $$ $$ \frac{\partial y}{\partial x} = \left[\frac{\partial y_j}{\partial x_i}\right], $$ where $ i = 1, \ldots, m $ indexes the rows and $ j = 1, \ldots, n $ the columns of the matrix.
Wouldn't it be semantically appropriate to instead write this as:

$$ T(\nabla \space T \space y) $$ where $ T $ is the transpose operator and $$ \nabla = \left(\lambda x . \begin{bmatrix}\partial_1 x \\ \vdots \\ \partial_{arity(x)} x\end{bmatrix}\right) $$
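For reference, the quoted $\left[\frac{\partial y_j}{\partial x_i}\right]$ layout can be produced with SymPy. Note that SymPy's `Matrix.jacobian` uses the transposed (row-per-output) layout $\left[\frac{\partial y_i}{\partial x_j}\right]$, so the layout quoted above is its transpose. The example function here is an arbitrary illustration:

```python
import sympy as sp

# Illustrative y: R^2 -> R^3 (an arbitrary example function).
x1, x2 = sp.symbols("x1 x2")
yvec = sp.Matrix([x1 * x2, sp.sin(x1), x1 + x2])

# Matrix.jacobian: 3x2, rows indexed by outputs j, columns by inputs i.
J = yvec.jacobian([x1, x2])

# The layout quoted from Wikipedia ([d y_j / d x_i], i rows, j columns)
# is the transpose: a 2x3 matrix.
J_wiki = J.T
```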

Question

What is the definition of $ \partial $? It makes sense to me that we can write $ \frac{\mathrm{d}}{\mathrm{d} x} $, because the division makes sense on both sides of the equation, but I don't see it for the $ \partial $ in $ \frac{\partial}{\partial x} $. All I see is an operator that acts with respect to a certain variable of a function (hence I write $ \partial_i $). This is where my confusion in matrix calculus comes from.
In addition to that, how would the truly general $ \mathrm{d} $ be defined for all mathematical objects (vector-valued functions, multi-variable functions, and matrix-returning functions)?