Derivative for chain of different multivariate functions


I have a composition of two multivariate functions $f(g(\pmb{y}))$ with $f:\mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^m \to \mathbb{R}^n$, where $m > n$. I am looking for the derivative of this function w.r.t. $\pmb{y} \in \mathbb{R}^m$. The chain rule states: $$ \frac{\partial f(g(\pmb{y}))}{\partial \pmb{y}} = \frac{\partial f(g(\pmb{y}))}{\partial g(\pmb{y})} \frac{\partial g(\pmb{y})}{\partial \pmb{y}} $$ This means that the first factor is the gradient $$ \frac{\partial f(g(\pmb{y}))}{\partial g(\pmb{y})} = \nabla f \in \mathbb{R}^n $$ and the second factor is the Jacobian matrix $$ \frac{\partial g(\pmb{y})}{\partial \pmb{y}} = J_g \in \mathbb{R}^{n \times m} $$ This results in $$\frac{\partial f}{\partial \pmb{y}} = \nabla f \cdot J_g$$

Assuming that the chain rule is applicable in the way presented here, how can the result now be calculated?

For the matrix multiplication to be well defined, I would need something like $\nabla f^T J_g$, as I expect the resulting derivative to be an element of $\mathbb{R}^m$. However, I am doubtful that this is a correct/allowed operation here.

I welcome any kind of feedback and input. Thank you.

Best answer

Your interpretation is correct. I have always found the gradient somewhat misleading, and your question is exactly about that.

The derivative of a map $f : \mathbb R^n \to \mathbb R$ at a point $x$ is a linear form $L: \mathbb R^n \to \mathbb R$. Perhaps to avoid frightening students, one avoids speaking of linear forms and replaces $L$ with the unique vector $\nabla f(x)$ such that $L(y)=\nabla f(x)^T \cdot y$ for all $y \in \mathbb R^n$. This vector is the gradient.
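Written in terms of linear maps, the chain rule says that the derivative of $f\circ g$ at $\pmb{y}$ is the derivative of $g$ at $\pmb{y}$ followed by the derivative of $f$ at $g(\pmb{y})$. A short derivation of how this turns into a matrix product (here $\pmb{h}\in\mathbb{R}^m$ is an arbitrary direction, a symbol introduced only for this sketch):

```latex
\begin{aligned}
D(f\circ g)(\pmb{y})[\pmb{h}]
  &= Df\big(g(\pmb{y})\big)\Big[\,Dg(\pmb{y})[\pmb{h}]\,\Big] \\
  &= \nabla f\big(g(\pmb{y})\big)^T \big(J_g(\pmb{y})\,\pmb{h}\big) \\
  &= \Big(\underbrace{\nabla f\big(g(\pmb{y})\big)^T}_{1\times n}\;
          \underbrace{J_g(\pmb{y})}_{n\times m}\Big)\,\pmb{h},
  \qquad \pmb{h}\in\mathbb{R}^m .
\end{aligned}
```

So the matrix of the linear form $D(f\circ g)(\pmb{y})$ is the $1\times m$ row $\nabla f^T J_g$, where the gradient is evaluated at $g(\pmb{y})$ and the Jacobian at $\pmb{y}$.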

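This leads exactly to your interpretation of the chain rule: the derivative of $f \circ g$ at $\pmb{y}$ is represented by the $1 \times m$ row $\nabla f^T J_g$, and its transpose $J_g^T \nabla f \in \mathbb{R}^m$ is the gradient of the composite map.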
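A quick numerical check of this, as a minimal pure-Python sketch. The maps `f` and `g` below (with $n = 2$, $m = 3$) are hypothetical examples chosen only for illustration, and the helper names are likewise made up. The sketch forms $\nabla f^T J_g$ at one point and compares it with a central finite-difference approximation of the derivative of $f(g(\pmb{y}))$.

```python
# Hypothetical example maps, not taken from the question.
def g(y):
    """Inner map g: R^3 -> R^2 (m = 3, n = 2)."""
    return [y[0] * y[1], y[1] + y[2] ** 2]


def f(u):
    """Outer map f: R^2 -> R."""
    return u[0] ** 2 + 3 * u[1]


def grad_f(u):
    """Gradient of f at u, a vector in R^n."""
    return [2 * u[0], 3.0]


def jac_g(y):
    """Jacobian of g at y, an n x m matrix stored as a list of rows."""
    return [[y[1], y[0], 0.0],
            [0.0, 1.0, 2 * y[2]]]


def row_times_matrix(v, J):
    """Compute v^T J for v in R^n and J in R^{n x m}; the result has length m."""
    n, m = len(J), len(J[0])
    return [sum(v[i] * J[i][j] for i in range(n)) for j in range(m)]


def finite_diff_grad(h, y, eps=1e-6):
    """Central-difference approximation of the derivative of a scalar map h at y."""
    out = []
    for j in range(len(y)):
        yp = list(y)
        yp[j] += eps
        ym = list(y)
        ym[j] -= eps
        out.append((h(yp) - h(ym)) / (2 * eps))
    return out


y = [1.0, 2.0, 0.5]
# Chain rule: gradient of f evaluated at g(y), treated as a row, times J_g(y).
chain = row_times_matrix(grad_f(g(y)), jac_g(y))
numeric = finite_diff_grad(lambda z: f(g(z)), y)
print(chain)    # [8.0, 7.0, 3.0]
print(numeric)  # approximately the same values
```

The result has length $m = 3$, as expected, and the finite-difference values agree with it up to floating-point error.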