Confusion about multivariable chain rule


Let $F: U \subset \mathbb{R}^n \to \mathbb{R}^m$ and $G: V \subset \mathbb{R}^m \to \mathbb{R}^p$ be functions that are differentiable on their domains, with $F(U) \subset V$ so that $G \circ F$ is defined.

It is well known that the chain rule says:

$$D(G \circ F)(a) = DG(F(a)) \circ DF(a)$$

However, to calculate this, we can do:

$$DG(F(a)) \circ DF(a) = JG(F(a)) \cdot JF(a)$$

where $JG(F(a))$ and $JF(a)$ denote the Jacobian matrices of $G$ and $F$, evaluated at $F(a)$ and $a$ respectively.

Can someone explain to me why the last equality is true?


There is 1 best solution below


The basic intuition is that the derivative of a function at a point is its best linear approximation near that point. For vector-valued functions, this approximation is a linear transformation, so we can represent the derivative by a matrix once bases are chosen. It turns out that in a particularly nice choice of bases (the standard bases of $\mathbb R^n$ and $\mathbb R^m$) the matrix of the best linear approximation is exactly the Jacobian.
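To make "best linear approximation" concrete, here is the standard definition written out for the $F$ of the question. The increment $h$, the remainder $r$, and the standard basis vectors $e_j$ are notation introduced here for illustration.

$$F(a+h) = F(a) + DF(a)\,h + r(h), \qquad \lim_{h \to 0} \frac{\|r(h)\|}{\|h\|} = 0$$

Applying $DF(a)$ to a basis vector recovers a partial derivative:

$$DF(a)\,e_j = \lim_{t \to 0} \frac{F(a + t e_j) - F(a)}{t} = \frac{\partial F}{\partial x_j}(a)$$

So the $j$-th column of the matrix of $DF(a)$ in the standard bases is the vector of partial derivatives with respect to $x_j$, that is, $(JF(a))_{ij} = \dfrac{\partial F_i}{\partial x_j}(a)$.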

The key insight now is that composing linear maps corresponds exactly to multiplying their matrices.
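Writing this out in coordinates may help. As a sketch, take linear maps $S: \mathbb R^n \to \mathbb R^m$ and $T: \mathbb R^m \to \mathbb R^p$ whose matrices in the standard bases are $A$ (of size $m \times n$) and $B$ (of size $p \times m$). These names are only for illustration and do not come from the question. For $x \in \mathbb R^n$, the $i$-th component of $T(S(x))$ is

$$\big(T(S(x))\big)_i = \sum_{j=1}^{m} B_{ij}\,(Ax)_j = \sum_{j=1}^{m} B_{ij} \sum_{k=1}^{n} A_{jk}\,x_k = \sum_{k=1}^{n} \Big(\sum_{j=1}^{m} B_{ij} A_{jk}\Big) x_k = \big((BA)\,x\big)_i$$

so the matrix of $T \circ S$ is the product $BA$. Taking $S = DF(a)$ and $T = DG(F(a))$, whose matrices are $JF(a)$ and $JG(F(a))$, gives $DG(F(a)) \circ DF(a) = JG(F(a)) \cdot JF(a)$.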

This can be seen in a fairly down to earth way.

For two functions $f,g : \mathbb R \to \mathbb R$, the derivatives $f^{\prime}(x)$ and $g^{\prime}(x)$ at a point are linear transformations of $\mathbb R$, and a linear map from $\mathbb R$ to $\mathbb R$ is just multiplication by a scalar. To see them as genuinely linear, imagine the point $(x,f(x))$ as the origin: near it, the graph of $f$ looks like the line through that origin with slope $f^{\prime}(x)$. The chain rule $(f \circ g)^{\prime}(x) = f^{\prime}(g(x)) \cdot g^{\prime}(x)$ then says that composing "multiply by $g^{\prime}(x)$" with "multiply by $f^{\prime}(g(x))$" is again scalar multiplication, by the product of the two scalars.

For $f,g: \mathbb R^2 \to \mathbb R^2$, we do a similar thing, but now the derivatives are no longer scalar multiplications. They are linear maps of $\mathbb R^2$, each given by a $2 \times 2$ matrix, which is exactly the Jacobian. In a similar manner, if you go through the algebra, you will see that the composition of two linear maps given by matrices is given by the product of those matrices.
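A numerical sanity check can make this tangible. The sketch below is not part of the original answer: the maps `F` and `G` are hypothetical examples chosen for illustration, and the Jacobians are approximated by central finite differences rather than computed symbolically. It compares the Jacobian of $G \circ F$ at a point with the product of the two Jacobians.

```python
import math

def F(x):
    # Hypothetical example map F: R^2 -> R^2
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def G(y):
    # Hypothetical example map G: R^2 -> R^2
    return [math.exp(y[0]) + y[1], y[0] * y[1]]

def compose(x):
    # The composition G o F
    return G(F(x))

def jacobian(func, a, h=1e-6):
    """Approximate the Jacobian of func at a by central differences.

    Entry [i][j] approximates d(func_i)/d(x_j) at a.
    """
    n = len(a)
    cols = []
    for j in range(n):
        a_plus, a_minus = list(a), list(a)
        a_plus[j] += h
        a_minus[j] -= h
        f_plus, f_minus = func(a_plus), func(a_minus)
        # Column j holds the partial derivatives with respect to x_j
        cols.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    m = len(cols[0])
    # Transpose the list of columns into a list of rows
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def matmul(A, B):
    # Plain matrix product of A (p x m) and B (m x n)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

a = [0.5, -1.2]
lhs = jacobian(compose, a)                        # J(G o F)(a)
rhs = matmul(jacobian(G, F(a)), jacobian(F, a))   # JG(F(a)) . JF(a)

# Largest entrywise difference between the two sides
max_diff = max(abs(l - r)
               for row_l, row_r in zip(lhs, rhs)
               for l, r in zip(row_l, row_r))
print(max_diff)
```

The printed difference should be tiny, limited only by finite-difference and rounding error. Note that the order of the factors matters: the Jacobian of the outer function $G$, evaluated at $F(a)$, goes on the left.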