Question about multivariate chain rule with more than two nested functions

For the multivariate chain rule as explained here:

Given that $z$ is a function of $x_1,\dots,x_n$ and each $x_j$ is a function of $t_1,\dots,t_m$:

$\frac{\partial z}{\partial t_i} = \frac{\partial z}{\partial x_1}\frac{\partial x_1}{\partial t_i}+ \dots +\frac{\partial z}{\partial x_n}\frac{\partial x_n}{\partial t_i}$

I'm wondering if the above can be extended. If $z$ is a function of $x_1,\dots,x_N$, each $x_a$ is a function of $t_1,\dots,t_M$, each $t_b$ is a function of $u_1, \dots, u_L$, and each $u_c$ is a function of $v_1,\dots,v_K$, is it true that

$$\frac{\partial z}{\partial v_i} = \sum_{a=1}^{N}\sum_{b=1}^{M}\sum_{c=1}^{L}\frac{\partial z}{\partial x_a}\frac{\partial x_a}{\partial t_b}\frac{\partial t_b}{\partial u_c}\frac{\partial u_c}{\partial v_i}$$

and can this be extended even further if each $v$ is a function of another set of variables and I want the partial derivative of $z$ with respect to a variable in that set? Is this valid? I couldn't find any reference that explicitly states how to take partial derivatives of multiple nested functions.

There is 1 best solution below

I think that the multivariable chain rule is easier to understand if you work with the Jacobian matrices of the functions involved. If $U\subseteq\mathbb R^n$ and $f:U\to\mathbb R^m$ is differentiable at $\mathbf p\in U$, then the Jacobian matrix $J_f(\mathbf p)$ of $f$ at $\mathbf p$ is just the $m\times n$ matrix of partial derivatives of $f$ at $\mathbf p$. Treating the coordinates $y_i$ of $f(x_1,\dots,x_n)$ as individual scalar functions of the $x_j$, another notation that you might see for this matrix is $${\partial(y_1,\dots,y_m)\over\partial(x_1,\dots,x_n)} = \begin{bmatrix}{\partial y_1\over\partial x_1}&\cdots&{\partial y_1\over\partial x_n}\\\vdots&\ddots&\vdots\\{\partial y_m\over\partial x_1}&\cdots&{\partial y_m\over\partial x_n}\end{bmatrix}.$$

In terms of Jacobians, the chain rule says that if $f$ is differentiable at $\mathbf p$ and $g$ is differentiable at $f(\mathbf p)$, then $$J_{g\circ f}(\mathbf p) = J_g(f(\mathbf p))J_f(\mathbf p).$$ That is, the Jacobian of a composition of functions is the product of their Jacobians.
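If you want to convince yourself numerically, here is a minimal sketch that compares $J_{g\circ f}(\mathbf p)$ with $J_g(f(\mathbf p))J_f(\mathbf p)$. The functions `f` and `g`, the point `p`, and the helpers `jacobian` and `matmul` are all made up for illustration, and the Jacobians are only approximated by central differences, so the two sides agree up to a small numerical error rather than exactly.

```python
import math

def f(p):
    # Hypothetical example map f: R^2 -> R^3
    x, y = p
    return [x * y, math.sin(x), x + y * y]

def g(q):
    # Hypothetical example map g: R^3 -> R^2
    a, b, c = q
    return [a * b + c, math.exp(a) * c]

def jacobian(func, p, h=1e-6):
    """Central-difference approximation; entry [i][j] is d func_i / d p_j."""
    m, n = len(func(p)), len(p)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def matmul(A, B):
    # Plain matrix product of nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

p = [0.7, -0.3]
lhs = jacobian(lambda s: g(f(s)), p)             # Jacobian of the composition at p
rhs = matmul(jacobian(g, f(p)), jacobian(f, p))  # J_g(f(p)) times J_f(p)

max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

Note the order of the factors: the Jacobian of the outer function $g$ is evaluated at $f(\mathbf p)$ and goes on the left, so a $2\times3$ matrix times a $3\times2$ matrix gives the $2\times2$ Jacobian of $g\circ f$.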

Applying this to your question and suppressing the points at which each of the Jacobians is evaluated, we have $$\begin{bmatrix}{\partial z\over\partial v_1}&\cdots&{\partial z\over\partial v_K}\end{bmatrix} = \begin{bmatrix}{\partial z\over\partial x_1}&\cdots&{\partial z\over\partial x_N}\end{bmatrix} {\partial(x_1,\dots,x_N)\over\partial(t_1,\dots,t_M)} {\partial(t_1,\dots,t_M)\over\partial(u_1,\dots,u_L)} {\partial(u_1,\dots,u_L)\over\partial(v_1,\dots,v_K)}.$$ Therefore by the rules of matrix multiplication, $${\partial z\over\partial v_i} = {\partial z\over\partial x_j}{\partial x_j\over\partial t_k}{\partial t_k\over\partial u_l}{\partial u_l\over\partial v_i}$$ (with an implicit summation over the repeated indices $j$, $k$ and $l$, while $i$ stays fixed), just as you suspected. The same pattern continues for any number of nested layers: each additional layer contributes one more Jacobian factor and one more summed index. Notice how, using the fraction notation, the numerators and denominators of the Jacobians formally “cancel,” just as with Leibniz’s notation for derivatives.
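The triple sum from the question can also be checked directly. The sketch below uses invented layer functions with $N=2$, $M=3$, $L=2$, $K=2$ (the names `u_of_v`, `t_of_u`, `x_of_t`, `z_of_x` and the helper `jacobian` are hypothetical), approximates each Jacobian by central differences at the appropriate point, evaluates the sum over $a$, $b$, $c$ term by term, and compares it with differentiating the full composition with respect to $v$.

```python
import math

# Hypothetical layers: v (K=2) -> u (L=2) -> t (M=3) -> x (N=2) -> z (scalar)
def u_of_v(v):
    return [v[0] * v[1], v[0] + math.sin(v[1])]

def t_of_u(u):
    return [u[0] + u[1], u[0] * u[1], math.cos(u[0])]

def x_of_t(t):
    return [t[0] * t[2], t[1] + t[2] ** 2]

def z_of_x(x):
    # Returned as a one-element list so every layer has the same shape convention
    return [x[0] ** 2 + x[0] * x[1]]

def jacobian(func, p, h=1e-6):
    """Central-difference approximation; entry [i][j] is d func_i / d p_j."""
    m, n = len(func(p)), len(p)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

N, M, L, K = 2, 3, 2, 2
v = [0.5, 1.2]
u = u_of_v(v)
t = t_of_u(u)
x = x_of_t(t)

# Each Jacobian is evaluated at the image of v under the preceding layers
Jz = jacobian(z_of_x, x)  # 1 x N
Jx = jacobian(x_of_t, t)  # N x M
Jt = jacobian(t_of_u, u)  # M x L
Ju = jacobian(u_of_v, v)  # L x K

# dz/dv_i as the explicit sum over a, b, c
by_sum = [sum(Jz[0][a] * Jx[a][b] * Jt[b][c] * Ju[c][i]
              for a in range(N) for b in range(M) for c in range(L))
          for i in range(K)]

# dz/dv_i obtained by differentiating the whole composition at once
direct = jacobian(lambda s: z_of_x(x_of_t(t_of_u(u_of_v(s)))), v)[0]

for i in range(K):
    print(f"dz/dv_{i + 1}: sum = {by_sum[i]:.6f}, direct = {direct[i]:.6f}")
```

The two columns should agree to within the finite-difference error, which is what the formula predicts.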