Chain rule applied to two vector functions

Question

Asked by Bumbble Comm At 06 Apr 2026 - 7:28

I am trying to get comfortable with gradients of vector functions. I constructed the following example. Could you please check whether my reasoning is correct? I know it is quite basic, but I have no one else to ask.

Let

$f:\mathbb{R}^n \rightarrow \mathbb{R}^m$

$g:\mathbb{R}^m \rightarrow \mathbb{R}^k$

$z=f(g(x))$

$g(x)=y$

I want to take the gradient of $z$ with respect to $x$, and this is how I formulated it:

\begin{align} \nabla_xz = \frac{\partial \mathbf{z}}{\partial\mathbf{x}} = \nabla_yz \cdot \nabla_xy \end{align}

where

\begin{align} \nabla_yz = \mathbf{J}_1 \in\mathbb{R}^{m\times n} \text{ and } \nabla_xy = \mathbf{J}_2 \in\mathbb{R}^{m\times k} \end{align}

with $\mathbf{J}_1$ and $\mathbf{J}_2$ being the respective Jacobians. Then I could reformulate $\nabla_xz$ as

\begin{align} \nabla_xz = {\mathbf{J}_1}^T \mathbf{J}_2 \in \mathbb{R}^{n\times k}. \end{align}

Is this formulation correct? If not, please show me where I went wrong.

**Bumbble Comm** · Answer 1 · 2022-02-09 04:36:57

Your definitions of $f$ and $g$ lead to an invalid composition.

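For $f(g(x))$ to make sense, the codomain of $g$ must match the domain of $f$. As stated, $g$ outputs vectors in $\mathbb{R}^k$ while $f$ accepts inputs from $\mathbb{R}^n$, so the composition is defined only if $k=n$.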

Instead, define $g:\Bbb R^k\to \Bbb R^m$ and $f:\Bbb R^m\to\Bbb R^n$, so that $f\circ g:\Bbb R^k\to\Bbb R^n$.

Then $\vec x\in\Bbb R^k$, $\vec y=\vec g(\vec x)\in \Bbb R^m$, and $\vec z=\vec f(\vec y)\in\Bbb R^n$.

The dimension of a matrix is read rows$\times$columns. The number of rows of the Jacobian equals the dimension of the "numerator" vector, and the number of columns equals the dimension of the "denominator" vector.

For example: $\dfrac{\partial\vec y}{\partial\vec x}=\begin{bmatrix}\dfrac{\partial y_1}{\partial x_1}&\cdots&\dfrac{\partial y_1}{\partial x_k}\\\vdots&\ddots&\vdots\\\dfrac{\partial y_m}{\partial x_1}&\cdots&\dfrac{\partial y_m}{\partial x_k}\end{bmatrix}\in\Bbb R^{m\times k}$
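To make the row and column convention concrete, here is a minimal sketch that approximates a Jacobian by central differences and reads off its shape. The map `g` below (with $k=2$, $m=3$), the helper name `numerical_jacobian`, and the step size `h` are hypothetical choices made for illustration; they are not part of the question.

```python
def g(x):
    """Hypothetical map g: R^2 -> R^3 (k = 2 inputs, m = 3 outputs)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d func_i / d x_j."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


x0 = [1.0, 2.0]
J = numerical_jacobian(g, x0)
shape = (len(J), len(J[0]))  # rows = outputs (m), columns = inputs (k)
print(shape)
```

With this layout, each row collects the partial derivatives of one output component, which is the same arrangement as the matrix written above.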

Thus we have:

$$\begin{gathered}\nabla_{\vec x}\vec y\in\Bbb R^{m\times k}\\\nabla_{\vec y}\vec z\in\Bbb R^{n\times m}\\\nabla_{\vec x}\vec z \in\Bbb R^{n\times k}\end{gathered}$$

When multiplying matrices $\mathrm A\in\Bbb R^{n\times m}$ and $\mathrm B\in\Bbb R^{m\times k}$, the result is $\mathrm {AB}\in\Bbb R^{n\times k}$. So we require the order of multiplication to be: $$\begin{aligned}\nabla_{\vec x} \vec z &=(\nabla_{\vec y} \vec z)~(\nabla_{\vec x}\vec y)\\[2ex]& =\sum_{i=1}^m\begin{bmatrix}\dfrac{\partial z_1}{\partial y_i}\dfrac{\partial y_i}{\partial x_1}&\cdots&\dfrac{\partial z_1}{\partial y_i}\dfrac{\partial y_i}{\partial x_k}\\\vdots&\ddots&\vdots\\\dfrac{\partial z_n}{\partial y_i}\dfrac{\partial y_i}{\partial x_1}&\cdots&\dfrac{\partial z_n}{\partial y_i}\dfrac{\partial y_i}{\partial x_k}\end{bmatrix}\end{aligned}$$
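The multiplication order can be checked numerically. The sketch below is only an illustration under stated assumptions: the maps `g` ($\Bbb R^2\to\Bbb R^3$) and `f` ($\Bbb R^3\to\Bbb R^2$), the helpers `numerical_jacobian` and `matmul`, and the evaluation point are all hypothetical choices, and the Jacobians are approximated by central differences rather than computed exactly. It compares the product of the two Jacobians with the Jacobian of the composition computed directly.

```python
def g(x):
    """Hypothetical inner map g: R^2 -> R^3 (k = 2, m = 3)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


def f(y):
    """Hypothetical outer map f: R^3 -> R^2 (m = 3, n = 2)."""
    y1, y2, y3 = y
    return [y1 * y2 + y3, y1 - y3 ** 2]


def compose(x):
    """The composition z = f(g(x)), a map R^2 -> R^2."""
    return f(g(x))


def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d func_i / d x_j."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def matmul(a, b):
    """Plain matrix product of nested lists: (n x m) times (m x k) -> (n x k)."""
    return [
        [sum(a[i][l] * b[l][j] for l in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


x0 = [1.0, 2.0]
J_g = numerical_jacobian(g, x0)           # m x k = 3 x 2
J_f = numerical_jacobian(f, g(x0))        # n x m = 2 x 3, evaluated at y = g(x0)
J_chain = matmul(J_f, J_g)                # n x k = 2 x 2
J_direct = numerical_jacobian(compose, x0)

max_diff = max(
    abs(J_chain[i][j] - J_direct[i][j]) for i in range(2) for j in range(2)
)
print(max_diff < 1e-6)
```

Note that the outer Jacobian is evaluated at $\vec y=\vec g(\vec x)$, not at $\vec x$. Swapping the factors would give a $3\times 3$ product here, which cannot equal the $2\times 2$ Jacobian of the composition.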

That is all.

$\blacksquare$
