How to show Jacobian of a composite function is the product of Jacobians?

Question

3.9k Views Asked by user494522 At 25 Mar 2026 - 8:20

Let $f: \mathbb{R}^m \rightarrow \mathbb{R}^n$ and $g : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be two vector-valued functions. We want to show that

$$J_{f\circ g}(a)=J_f(g(a))J_g(a)$$

where $J$ is the Jacobian and $a$ is a point in $\mathbb{R}^n$.
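As a sanity check (not a proof), the identity can be tested numerically. The following is a minimal sketch in plain Python, assuming $n = 2$ and $m = 3$; the maps `f` and `g`, the point `a`, and the helpers `jacobian` and `matmul` are illustrative choices, not part of the question.

```python
import math

def g(a):
    # Hypothetical example map g: R^2 -> R^3
    x, y = a
    return [x * y, math.sin(x), x + y * y]

def f(u):
    # Hypothetical example map f: R^3 -> R^2
    p, q, r = u
    return [p * q + r, math.exp(p) - r * q]

def jacobian(func, point, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at point."""
    rows, cols = len(func(point)), len(point)
    J = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(rows):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

a = [0.7, -1.3]
lhs = jacobian(lambda t: f(g(t)), a)              # J_{f o g}(a), a 2x2 matrix
rhs = matmul(jacobian(f, g(a)), jacobian(g, a))   # J_f(g(a)) J_g(a), (2x3)(3x2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # should be tiny, limited only by finite-difference error
```

Note how the shapes line up: $J_f(g(a))$ is $n \times m$ and $J_g(a)$ is $m \times n$, so the product is $n \times n$, matching $J_{f\circ g}(a)$.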

The Wikipedia chain rule page, in its "Higher dimensions" section, has the following:

Let $D_a(g)$ denote the total derivative of $g$ at $a$ and $D_{g(a)}(f)$ denote the total derivative of $f$ at $g(a)$. These two derivatives are linear transformations. The chain rule for total derivatives says that their composite is the total derivative of $f \circ g$ at $a$, that is:

$$ D_a(f \circ g) = D_{g(a)}(f) \circ D_a(g) \tag{1} $$

It then concludes from the fact that the total derivative is represented by the Jacobian matrix: since the total derivatives are linear transformations, the composite of the two total derivatives becomes the product of their matrices.

My questions:

1- Why is $(1)$ true? How can it be proved?

2- Is there an alternative proof that starts from scratch and shows $J_{f\circ g}(a)=J_f(g(a))J_g(a)$?

Bumbble Comm On 13 Jun 2019 - 8:17

The quickest and cleanest proof I've seen is in Loomis and Sternberg's book Advanced Calculus (Chapter 3, Theorem 6.2). To prepare for the proof, they begin in Section 3.5, titled "Infinitesimals", where they collect a number of rules about properties of $\mathcal{o}$ and $\mathcal{O}$ functions (little-oh and big-oh) defined on normed vector spaces (this is where the real work goes in, using $\epsilon$'s and $\delta$'s, but they're all extremely easy to prove). This is done in Theorem $5.1$. Once they establish the "working rules" for manipulating little-oh and big-oh functions, the actual proof of the chain rule is only about $6$ lines.

I know you didn't ask for a reference, but I think the proof provided there is so elegant that it is better for you to read it than for me to try to explain it hurriedly. All the necessary prep work has been done systematically, unlike most proofs of the chain rule (as in Spivak or Munkres), where the estimates seem to be pulled out at random, which hides the essence of the proof.

By the way, my remarks about the "quickness/cleanness" of their proof do not in any way imply a lack of rigor. On the contrary, this book is extremely rigorous in the earlier chapters.

Bumbble Comm On 13 Jun 2019 - 8:34

Proof along the lines of your second question: assuming the Jacobians of all three functions exist, we only need to show that the two sides are equal.

Let $A_{(i)}$ denote the $i$-th row of any matrix $A$ and $A^{(j)}$ its $j$-th column.

I will show that $J_{f \circ g}(a)_{ij} = J_f(g(a))_{(i)}J_g(a)^{(j)}$. That is $J_{f \circ g}(a) = J_f(g(a))J_g(a)$.

$J_{f \circ g}(a)_{ij} = \frac{\partial (f\circ g)_i(a)}{\partial a_j}$ by definition.

We have $J_{f \circ g}(a) = \begin{bmatrix} \nabla (f\circ g)_1(a)^T \\ \nabla (f\circ g)_2(a)^T \\ \vdots \\ \nabla (f\circ g)_n(a)^T \end{bmatrix}$,

where $ \nabla (f\circ g)_i(a)^T = \begin{bmatrix} \frac{\partial (f\circ g)_i(a)}{\partial a_1} & \frac{\partial (f\circ g)_i(a)}{\partial a_2} & \dots & \frac{\partial (f\circ g)_i(a)}{\partial a_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_i(g(a))}{\partial a_1} & \frac{\partial f_i(g(a))}{\partial a_2} & \dots & \frac{\partial f_i(g(a))}{\partial a_n} \end{bmatrix}$ $(1)$

$J_f(x) = \begin{bmatrix} \nabla f_1(x)^T \\ \nabla f_2(x)^T \\ \vdots \\ \nabla f_n(x)^T \end{bmatrix}$,

where $J_f(x)_{(i)} = \nabla f_i(x)^T = \begin{bmatrix} \frac{\partial f_i(x)}{\partial x_1} & \frac{\partial f_i(x)}{\partial x_2} & \dots & \frac{\partial f_i(x)}{\partial x_m} \end{bmatrix}$ $(2)$

$J_g(a) = \begin{bmatrix} \nabla g_1(a)^T \\ \nabla g_2(a)^T \\ \vdots \\ \nabla g_m(a)^T \end{bmatrix}$,

where $J_g(a)^{(j)} = \begin{bmatrix} \frac{\partial g_1(a)}{\partial a_j} \\ \frac{\partial g_2(a)}{\partial a_j} \\ \vdots \\ \frac{\partial g_m(a)}{\partial a_j} \end{bmatrix}$ $(3)$

The components $f_i$ and $g_k$ are maps $\mathbb{R}^m \rightarrow \mathbb{R}$ and $\mathbb{R}^n \rightarrow \mathbb{R}$ respectively. Assuming $f_i$ is differentiable at $g(a)$ and each $g_k$ is differentiable at $a$, the total differential of $f_i(g(a))$ is:

$$ df_i(g(a)) \overset{(2)}= \sum_{j = 1}^n \sum_{k = 1}^m \frac{\partial f_i(x)}{\partial x_k}\Big|_{x = g(a)}\frac{\partial g_k(a)}{\partial a_j}\,da_j. \qquad (4)$$

Reading off the coefficient of $da_j$ gives

$J_{f \circ g}(a)_{ij} = \frac{\partial (f\circ g)_i(a)}{\partial a_j} \overset{(1)}= \frac{\partial f_i(g(a))}{\partial a_j} \overset{(4)}= \sum_{k = 1}^m \frac{\partial f_i(x)}{\partial x_k}\Big|_{x = g(a)}\frac{\partial g_k(a)}{\partial a_j} \overset{(2),(3)}= J_f(g(a))_{(i)}J_g(a)^{(j)}$.
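To see the entrywise formula in action, here is a small worked example. The maps below are chosen purely for illustration (with $m = n = 2$) and do not come from the question: take $g(a_1,a_2) = (a_1 a_2,\ a_1 + a_2)$ and $f(x_1,x_2) = (x_1^2,\ x_1 x_2)$, so that $(f\circ g)(a) = (a_1^2 a_2^2,\ a_1^2 a_2 + a_1 a_2^2)$.

$$ J_f(g(a)) = \begin{bmatrix} 2a_1a_2 & 0 \\ a_1 + a_2 & a_1 a_2 \end{bmatrix}, \qquad J_g(a) = \begin{bmatrix} a_2 & a_1 \\ 1 & 1 \end{bmatrix}, \qquad J_f(g(a))\,J_g(a) = \begin{bmatrix} 2a_1a_2^2 & 2a_1^2a_2 \\ 2a_1a_2 + a_2^2 & a_1^2 + 2a_1a_2 \end{bmatrix}. $$

Differentiating $(f\circ g)(a)$ directly gives the same matrix. For example, the $(2,1)$ entry is the sum over $k$ of one row of $J_f(g(a))$ against one column of $J_g(a)$:

$$ \frac{\partial (f\circ g)_2}{\partial a_1} = \underbrace{(a_1 + a_2)}_{\partial f_2/\partial x_1}\cdot\underbrace{a_2}_{\partial g_1/\partial a_1} + \underbrace{a_1a_2}_{\partial f_2/\partial x_2}\cdot\underbrace{1}_{\partial g_2/\partial a_1} = 2a_1a_2 + a_2^2. $$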

**Bumbble Comm** · Accepted Answer

One idea is as follows. If $f$ and $g$ are differentiable, then by definition there exists a best linear approximation:

$$ g(x+h)=g(x)+dg[x](h)+o(\|h\|) $$

where $dg[x](h)$ denotes the action of the linear map $dg[x]$ on the vector $h$.

The notation $o(\|h\|)$ stands for a remainder term satisfying $$\lim\limits_{h\to 0} \frac{o(\|h\|)}{\|h\|}=\mathbf{0}_{\mathbb{R}^m}$$ Its interpretation is that, for any constant $C > 0$, the remainder tends to the zero vector faster than $C\|h\|$ tends to zero, hence for small $h$ our linear approximation $dg[x].h$ is not "perturbed" by this remainder.
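A one-dimensional illustration may help here. The map $g(x) = x^2$ is a hypothetical example, not one of the functions in the question:

$$ g(x+h) = x^2 + 2xh + h^2 = g(x) + \underbrace{2x\,h}_{dg[x](h)} + \underbrace{h^2}_{o(|h|)}, \qquad \frac{|h^2|}{|h|} = |h| \xrightarrow[h\to 0]{} 0. $$

The linear part is $h \mapsto 2x\,h$, and the leftover $h^2$ vanishes faster than any multiple of $|h|$.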

We can write the same thing for $f$: $$ f(y+k)=f(y)+df[y].k+o(\|k\|) $$

Now observe that:

\begin{align} f(g(x+h)) &=f(\overbrace{g(x)}^y+\overbrace{dg[x].h+o(\|h\|)}^k) \\ &=f(g(x))+df[g(x)](dg[x].h+o(\|h\|))+o(\|dg[x].h+o(\|h\|)\|) \end{align}

Now from the definition of $o(.)$, it is not too hard to see that: $$ o(\|dg[x].h+o(\|h\|)\|)=o(\|h\|) $$ and (by linearity) \begin{align} df[g(x)](dg[x].h+o(\|h\|)) &=df[g(x)](dg[x](h))+df[g(x)](o(\|h\|)) \\ &=df[g(x)](dg[x](h))+o(\|h\|) \end{align}
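For readers who want the two "not too hard to see" claims spelled out, here is one possible sketch. The names $r(h)$ and $s(k)$ for the two remainders are introduced only for this note, and $\|\cdot\|$ applied to a linear map denotes its operator norm. Write $k = dg[x].h + r(h)$ with $\|r(h)\|/\|h\| \to 0$. For $h$ small enough, $\|r(h)\| \le \|h\|$, so

$$ \|k\| \le \|dg[x]\|\,\|h\| + \|r(h)\| \le \big(\|dg[x]\| + 1\big)\|h\|, $$

and in particular $k \to 0$ as $h \to 0$. If $s(k)$ is the remainder of $f$ at $y = g(x)$, then for $k \neq 0$ (and $s(k) = 0$ when $k = 0$)

$$ \frac{\|s(k)\|}{\|h\|} = \frac{\|s(k)\|}{\|k\|}\cdot\frac{\|k\|}{\|h\|} \le \big(\|dg[x]\| + 1\big)\frac{\|s(k)\|}{\|k\|} \xrightarrow[h\to 0]{} 0, $$

which is the first claim. For the second, boundedness of the linear map $df[g(x)]$ gives

$$ \frac{\|df[g(x)](r(h))\|}{\|h\|} \le \|df[g(x)]\|\,\frac{\|r(h)\|}{\|h\|} \xrightarrow[h\to 0]{} 0. $$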

We finally get $$ f(g(x+h))=f(g(x))+df[g(x)](dg[x](h))+o(\|h\|) $$ and, by identification (the best linear approximation is unique), $$d(f\circ g)[x]=df[g(x)]\circ dg[x]$$ which is the expected result.
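The final expansion can also be observed numerically. The sketch below, in plain Python, uses hypothetical maps `g` and `f` with hand-computed Jacobians `dg` and `df` (all names and the test point are assumptions made for illustration). It measures the remainder $f(g(x+h)) - f(g(x)) - df[g(x)](dg[x](h))$ relative to $\|h\|$ as $h$ shrinks along a fixed direction.

```python
import math

def g(x):
    # Hypothetical example map g: R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[0] + x[1] ** 2]

def f(u):
    # Hypothetical example map f: R^3 -> R^2
    return [u[0] * u[1] + u[2], math.exp(u[0]) - u[2] * u[1]]

def dg(x):
    # Jacobian of g at x, computed by hand (3x2)
    return [[x[1], x[0]], [math.cos(x[0]), 0.0], [1.0, 2 * x[1]]]

def df(u):
    # Jacobian of f at u, computed by hand (2x3)
    p, q, r = u
    return [[q, p, 1.0], [math.exp(p), -r, -q]]

def apply(M, v):
    """Apply the matrix M (list of rows) to the vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

x = [0.7, -1.3]
direction = [0.6, 0.8]          # unit vector along which h shrinks
ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = [t * d for d in direction]
    xh = [x[0] + h[0], x[1] + h[1]]
    linear = apply(df(g(x)), apply(dg(x), h))   # df[g(x)](dg[x](h))
    fx, fxh = f(g(x)), f(g(xh))
    remainder = [fxh[i] - fx[i] - linear[i] for i in range(2)]
    ratios.append(norm(remainder) / norm(h))

print(ratios)  # the ratio ||remainder|| / ||h|| should shrink with h
```

A shrinking ratio is what $o(\|h\|)$ predicts. It is evidence consistent with the result, not a substitute for the argument above.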
