Nature of terms in chain rule of differentiation?


Assume I have a function $f(g(x))$ that I wish to differentiate with respect to $x$ (so I'm looking for $\frac{d}{dx}f(g(x))$). Assume further that the functions have the following structure:

$$f:\mathbb{R}^d\rightarrow\mathbb{R}$$ $$g:\mathbb{R}\rightarrow\mathbb{R}^d$$

where $d$ is some integer $d>1$. With this structure, we expect $\frac{d}{dx}f(g(x))$ to be a scalar, since $f(g(x))$ is a scalar and $x \in \mathbb{R}$ is also a scalar. Now the chain rule of differentiation states that

$$\big(f(g(x))\big)'=f'(g(x))\,g'(x)$$

Here is my question: Considering the setup above, the first factor of the chain rule, $f'(g(x))$, obviously cannot be a scalar: $g'(x)=\frac{d}{dx}g(x)$ must be a vector (differentiating the vector $g(x)\in\mathbb{R}^d$ with respect to the scalar $x$ gives a vector), and the product of the two factors must be a scalar. As a consequence, $f'(g(x))$ should be a vector as well, so that we can form an inner product. This, in turn, means that the first factor is not differentiated with respect to $x$, because that would only yield a scalar (both $f(g(x))\in\mathbb{R}$ and $x$ are scalars).

I suppose the first factor must be differentiated with respect to $g(x)$ instead, since otherwise it would not yield a vector. So I suppose the first factor is $f'(g(x))=\frac{d}{dg(x)}f(g(x))$. Is this correct? If so, how can I intuit this?


There are 2 best solutions below

1
On BEST ANSWER

Yes, of course, that's what the prime in $f'(g(x))$ indicates: the derivative of $f$ is taken with respect to its argument $g$, not $x$. Perhaps this is another case where Leibniz notation is less ambiguous than prime notation.

Well, what do you mean by intuit? You want to differentiate a real-valued quantity of several variables. The way to think about this is in terms of differentials. Here the ratio $\mathrm df/\mathrm dg$ stands for a vector, usually called the gradient of $f$ with respect to $g$. The total differential of $f$ tells us the infinitesimal change in $f$ when the vector $g$ is changed infinitesimally, that is, when all of its components are given infinitesimal changes.
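
To make the differential picture concrete, it may help to write it out in components. The component names $g_1,\dots,g_d$ are notation introduced here for illustration and do not appear in the question. The total differential of $f$ is

$$\mathrm df=\sum_{i=1}^{d}\frac{\partial f}{\partial g_i}\,\mathrm dg_i$$

and, since each $g_i$ depends only on $x$, dividing by $\mathrm dx$ gives

$$\frac{\mathrm d}{\mathrm dx}f(g(x))=\sum_{i=1}^{d}\frac{\partial f}{\partial g_i}(g(x))\,\frac{\mathrm dg_i}{\mathrm dx}(x)=\nabla f(g(x))\cdot g'(x)$$

So the gradient collects the $d$ partial derivatives of $f$, the vector $g'(x)$ collects the $d$ rates of change of the components, and their inner product is the scalar the question expects.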

0
On

Note that if $F:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$ then $F'(x)$ is an $m \times n$ matrix.

So, in this example, $f'(g(x))$ is a $1 \times d$ matrix (row vector), and $g'(x)$ is a $d \times 1$ matrix (column vector). So the product $$ \underbrace{f'(g(x))}_{1\times d} \underbrace{g'(x)}_{d \times 1} $$ is defined and yields a scalar.
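
As a sanity check on the shapes, here is a minimal numerical sketch for $d=2$. The particular functions $g(x)=(x^2,\sin x)$ and $f(u,v)=uv+v^2$ are assumptions chosen only for illustration; they are not from the question. The sketch forms the row vector $f'(g(x))$ and the column vector $g'(x)$, takes their product, and compares the result with a central finite difference of $x\mapsto f(g(x))$.

```python
import math

# Hypothetical example functions (not from the question), with d = 2.

def g(x):
    """g: R -> R^2, returns the vector (x^2, sin x)."""
    return [x ** 2, math.sin(x)]

def g_prime(x):
    """g'(x): the d x 1 column vector of componentwise derivatives."""
    return [2 * x, math.cos(x)]

def f(u):
    """f: R^2 -> R, f(u, v) = u*v + v^2."""
    return u[0] * u[1] + u[1] ** 2

def grad_f(u):
    """f'(u): the 1 x d row vector of partial derivatives of f."""
    return [u[1], u[0] + 2 * u[1]]

def chain_rule_derivative(x):
    """(1 x d) times (d x 1): inner product of f'(g(x)) and g'(x)."""
    row = grad_f(g(x))
    col = g_prime(x)
    return sum(r * c for r, c in zip(row, col))

def finite_difference(x, h=1e-6):
    """Central difference approximation of d/dx f(g(x))."""
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x0 = 0.7
print("chain rule:        ", chain_rule_derivative(x0))
print("finite difference: ", finite_difference(x0))
```

The two printed values should agree up to the finite-difference error. Note that `grad_f` is evaluated at the point `g(x)`, which is exactly the sense in which $f$ is differentiated with respect to its own argument rather than with respect to $x$.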