How many different forms of the chain rule are there?


The only examples I can think of so far are:
1. If $y=[f(x)]^n$, then $\frac{dy}{dx}=n[f(x)]^{n-1}f'(x)$
2. If $y=f[g(x)]$, then $\frac{dy}{dx}=f'[g(x)]g'(x)$
3. $\frac{dy}{dx}=\frac{dy}{dz}\cdot\frac{dz}{dx}$
4. $\frac{dy}{dx}=\frac{1}{\frac{dx}{dy}}$

Am I missing any, or do these 4 equivalent statements form the basis of the chain rule?
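(To see that forms 2 and 3 agree, one can check a concrete case symbolically; here is a minimal sketch using sympy, where the particular choices $f = \sin$ and $g(x) = x^2 + 1$ are arbitrary examples:)

```python
import sympy as sp

x = sp.symbols('x')

g = x**2 + 1            # inner function g(x), an arbitrary choice
f_of_g = sp.sin(g)      # the composition f(g(x)) with f = sin

# Form 2: f'(g(x)) * g'(x)
form2 = sp.cos(g) * sp.diff(g, x)

# Form 3: differentiate y = f(g(x)) directly with respect to x
form3 = sp.diff(f_of_g, x)

assert sp.simplify(form2 - form3) == 0
print(form3)  # 2*x*cos(x**2 + 1)
```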

There are 2 solutions below.

1 is not the chain rule; it's a particular application of the chain rule (with outer function $t \mapsto t^n$). 2 and 3 are the classic presentations of the chain rule. 4 is the inverse function rule, which is something else, though it can be derived from the chain rule by differentiating both sides of $x(y(x)) = x$.

---

You can get other forms of the chain rule when you start to deal with functions of multiple variables (possibly to multiple variables as well!). For example, if you have functions $f, u, v : \mathbb{R}^2 \rightarrow \mathbb{R}$ and consider $g(x, y) = f(u(x, y), v(x, y))$, you get another chain rule: $$\frac{\partial g}{\partial x} = \frac{\partial f}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial f}{\partial v} \frac{\partial v}{\partial x},$$ where $\frac{\partial g}{\partial x}$ is the partial derivative of $g$ with respect to $x$ (if you haven't seen this yet, don't worry; I'm just showing you that there are more chain rules out there).
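This multivariable form can be checked symbolically on a small example; here is a sketch using sympy, where the specific $f$, $u$, and $v$ below are hypothetical choices (any smooth functions would do):

```python
import sympy as sp

x, y, u_, v_ = sp.symbols('x y u v')

# Hypothetical concrete choices
f = u_**2 + sp.sin(v_)   # f(u, v)
u = x*y                  # u(x, y)
v = x + y**2             # v(x, y)

# g(x, y) = f(u(x, y), v(x, y))
g = f.subs({u_: u, v_: v})

# Multivariable chain rule: dg/dx = (df/du)(du/dx) + (df/dv)(dv/dx)
chain = (sp.diff(f, u_).subs({u_: u, v_: v}) * sp.diff(u, x)
         + sp.diff(f, v_).subs({u_: u, v_: v}) * sp.diff(v, x))

assert sp.simplify(sp.diff(g, x) - chain) == 0
```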

This is not a comprehensive list! You can continue adding more variables (or attempt to write it out in general). Going beyond finite dimensions, you can consider generalisations of the derivative such as the Fréchet derivative on Banach spaces, which comes with yet another chain rule.

There is, however, a simple idea that connects all of the chain rules; one that is often missed in the landslide of notation. To understand it, you need to understand derivatives from potentially a different angle, but you'll also want some knowledge of Linear Algebra.

The idea behind derivatives is linearisation: that given a function $f : X \rightarrow Y$ and a point $x_0 \in X$, the function $f$ is locally "close" to another function which generalises the idea of a tangent.

When $X$ and $Y$ are $\mathbb{R}$, the tangent function takes the form $L(x) = mx + b$, i.e. a straight line. We interpret $m$, the slope of this line, to be the derivative at the given point.

More generally, we look for a function $L : X \rightarrow Y$ of the form $L(x) = M(x) + b$, where $M$ is a "linear" map between $X$ and $Y$ (this has a specific meaning, one that you'll encounter when studying Linear Algebra), and $b \in Y$. Again, we interpret this linear map $M$ to be the derivative at the given point. When $X = \mathbb{R}^m$ and $Y = \mathbb{R}^n$, this linear map $M$ can be represented by its standard $n \times m$ matrix, with rows corresponding to the $n$ output components and columns to the $m$ input variables.
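To make the matrix picture concrete, here is a sketch computing this matrix (the Jacobian) for a hypothetical map $f : \mathbb{R}^2 \rightarrow \mathbb{R}^3$ using sympy:

```python
import sympy as sp

x, y = sp.symbols('x y')

# A hypothetical map f : R^2 -> R^3
f = sp.Matrix([x*y, x + y, sp.exp(x)])

# The linear map M at each point is represented by the 3 x 2 Jacobian:
# rows correspond to output components, columns to input variables
J = f.jacobian([x, y])
print(J)  # Matrix([[y, x], [1, 1], [exp(x), 0]])
```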

(In the case of $\mathbb{R}$, such functions take the form $M(x) = mx$, for some $m \in \mathbb{R}$.)

In summary, rather than thinking of the derivative as a number, we think of it as a linear function instead.

So what does the chain rule say? Suppose we have the composition $g \circ f : X \rightarrow Z$ of two differentiable functions $f : X \rightarrow Y$ and $g : Y \rightarrow Z$, and consider a point $x_0 \in X$. Let $y_0 = f(x_0) \in Y$. From $f$ at $x_0$, we get a linear map $M : X \rightarrow Y$ (again, think a matrix). From $g$ at $y_0$, we get a linear map $N : Y \rightarrow Z$. The chain rule states that $g \circ f$ is differentiable at $x_0$ with linear map $N \circ M$ (or, when $N$ and $M$ are matrices, the matrix product $NM$).

In short, the derivative of the composition of two functions is the composition of the derivatives. That's the chain rule, in a nutshell. Neat, huh?
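This composition picture can be verified on a small example. The following sketch uses sympy with hypothetical maps $f : \mathbb{R}^2 \rightarrow \mathbb{R}^2$ and $g : \mathbb{R}^2 \rightarrow \mathbb{R}$, checking that the Jacobian of $g \circ f$ equals the product of the Jacobians:

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')

# Hypothetical maps: f : R^2 -> R^2 and g : R^2 -> R
f = sp.Matrix([x + y, x*y])
g = sp.Matrix([u**2 + v])

# The composition g o f, and its Jacobian computed directly
comp = g.subs({u: f[0], v: f[1]})
J_comp = comp.jacobian([x, y])

# The chain rule: J(g o f) at x0 = (J_g evaluated at f(x0)) * (J_f at x0)
J_f = f.jacobian([x, y])
J_g_at_f = g.jacobian([u, v]).subs({u: f[0], v: f[1]})

assert (J_comp - J_g_at_f * J_f).applyfunc(sp.simplify) == sp.zeros(1, 2)
```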