Given $g:R^n \rightarrow R^k$ and $h:R^k \rightarrow R$, we have $f(x) = h(g(x))$.
Using the chain rule, we can differentiate $f(x)$ to get
$f'(x) = \nabla^Th(g(x))g'(x)$
My question is why do we take the transpose of the gradient of $h$? Is it just to make sure the result is a scalar, since $f(x)$ is in $R$?
If so, does it mean that every time we do vector differentiation, we need to check that the output dimensions match and take a transpose if necessary (i.e., there is no hard-and-fast rule about when to transpose)?
First, the result isn't a scalar, it's a (row) vector. Second, the notation is sloppy. The usual formula for the chain rule is $$D(h\circ g)(x) = Dh(g(x))Dg(x)$$ where the product on the RHS is the matrix product. In your case $\nabla^T h(g(x))$ (i.e., $Dh(g(x))$) is a row vector. See https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant.
EDIT: example with $n = 3$, $k = 2$: $$ \pmatrix{\partial_1 f&\partial_2 f&\partial_3 f} = \pmatrix{\partial_1 h&\partial_2 h} \pmatrix{\partial_1 g_1&\partial_2 g_1&\partial_3 g_1\cr\partial_1 g_2&\partial_2 g_2&\partial_3 g_2}. $$
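If it helps, here is a quick numeric sanity check of $D(h\circ g)(x) = Dh(g(x))Dg(x)$ for $n = 3$, $k = 2$. The particular $g$ and $h$ below are just made-up examples (not from the question); the point is that the matrix product of the $1\times 2$ row vector $Dh(g(x))$ with the $2\times 3$ Jacobian $Dg(x)$ agrees with the finite-difference gradient of $f = h\circ g$, and both are $1\times 3$ row vectors.

```python
import numpy as np

def g(x):                      # g : R^3 -> R^2 (example choice)
    return np.array([x[0] * x[1], np.sin(x[2])])

def h(y):                      # h : R^2 -> R (example choice)
    return y[0] ** 2 + 3.0 * y[1]

def grad_h(y):                 # Dh(y): a 1x2 row vector
    return np.array([[2.0 * y[0], 3.0]])

def jac_g(x):                  # Dg(x): a 2x3 Jacobian matrix
    return np.array([[x[1], x[0], 0.0],
                     [0.0, 0.0, np.cos(x[2])]])

x = np.array([1.0, 2.0, 0.5])

# RHS of the chain rule: the matrix product Dh(g(x)) Dg(x), shape (1, 3).
rhs = grad_h(g(x)) @ jac_g(x)

# LHS: central finite differences of f = h o g, also shape (1, 3).
f = lambda x: h(g(x))
eps = 1e-6
lhs = np.array([[(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(3)]])

print(rhs.shape)                          # (1, 3): a row vector, not a scalar
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```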