Trying to understand vector Jacobian product with higher order derivatives


I am trying to understand, in mathematical terms, how derivatives are computed by automatic differentiation tools like PyTorch; I am focusing on the vector-Jacobian product here. I started with a simple example:

\begin{align*} f\colon \mathbb{R}^N & \longrightarrow \mathbb{R}^N\\ x&\longmapsto [x_1^2, \dots, x_N^2]^T, \end{align*} Then, the Jacobian of $f$ is

$$J = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_N}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial f_N}{\partial x_1} & \cdots & \dfrac{\partial f_N}{\partial x_N} \end{bmatrix} = 2\begin{bmatrix} x_1 & 0 & 0 & \cdots & 0 \\ 0 & x_2 & 0 & \cdots & 0 \\ 0 & 0 & x_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & x_N \end{bmatrix}_{N\times N}$$ and the vector-Jacobian product is $v^TJ = 2[x_1, x_2, \dots, x_N]$, where $v = [1, 1, \dots, 1]^T$. Now I am trying to understand how $v^TJ$ is computed when
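As a sanity check (my own sketch, not part of the question), the one-dimensional case above can be reproduced in PyTorch by passing $v$ to `backward`, which computes exactly $v^TJ$:

```python
import torch

# f(x) = [x_1^2, ..., x_N^2]; the VJP with v = ones should be 2 * x.
N = 4
x = torch.arange(1.0, N + 1, requires_grad=True)  # x = [1, 2, 3, 4]
y = x ** 2
v = torch.ones(N)
y.backward(v)       # computes v^T J and stores it in x.grad
print(x.grad)       # tensor([2., 4., 6., 8.]) == 2 * x
```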

\begin{align*} f\colon \mathbb{R}^{N\times N} & \longrightarrow \mathbb{R}^{N\times N}\\ \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1N} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2N} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & x_{N3} & \cdots & x_{NN} \end{bmatrix}&\longmapsto \begin{bmatrix} x_{11}^2 & x_{12}^2 & x_{13}^2 & \cdots & x_{1N}^2 \\ x_{21}^2 & x_{22}^2 & x_{23}^2 & \cdots & x_{2N}^2 \\ x_{31}^2 & x_{32}^2 & x_{33}^2 & \cdots & x_{3N}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{N1}^2 & x_{N2}^2 & x_{N3}^2 & \cdots & x_{NN}^2 \end{bmatrix}. \end{align*} A practical example is given here. Could someone please explain, in mathematical terms, how $v^TJ$ is computed in this case, where $v$ is a ``vector'' of all ones?


There is 1 answer below.


In the language of PyTorch, you can flatten your input and output tensors and think of the map as $f\colon \mathbb{R}^{N^2} \to\mathbb{R}^{N^2}$.

Then your Jacobian is still a giant matrix in $\mathbb{R}^{N^2\times N^2}$. Just as in your vector example, the entries of the matrix do not interact, so the Jacobian is a diagonal matrix, just like your example above.
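This can be verified numerically (an illustrative sketch, not from the original answer) with `torch.autograd.functional.jacobian` on the flattened map:

```python
import torch

# Check that the Jacobian of the flattened elementwise-square map is diagonal.
N = 3
x = torch.rand(N * N)                        # flattened N x N input
J = torch.autograd.functional.jacobian(lambda t: t ** 2, x)
print(J.shape)                               # torch.Size([9, 9])
print(torch.allclose(J, torch.diag(2 * x)))  # True: J = diag(2 x)
```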

Then, given a vector $v=[v_{11},v_{12},\dots,v_{N,N-1},v_{N,N}]$ of $N^2$ entries, multiplying it by the diagonal Jacobian is just element-wise multiplication: $$ v^TJ = 2[x_{11}v_{11},x_{12}v_{12},\dots,x_{N,N-1}v_{N,N-1},x_{N,N}v_{N,N}]. $$ You can then easily reshape this vector-Jacobian product into an $N\times N$ matrix.