Using the chain rule for gradients with different function mappings

Question

3.8k Views Asked by Bumbble Comm At 04 May 2026 - 8:01

Consider $x \in \mathbf R$, a function $\theta : \mathbf R \to \mathbf R^n$, and a function $f : \mathbf R^n \to \mathbf R$. That is, $f$ is a function of $\theta$, and $\theta$ is a function of $x$.

You may assume that $\theta$ is differentiable in $x$, and $f$ differentiable in $\theta$.

I am trying to evaluate $\nabla_x f$, but am worried that my intuition is incorrect. I am wondering if it is correct to say that, using the chain rule, $$\nabla_x f = (\nabla_{\theta} f)^T \nabla_x \theta.$$ Is this valid?

There are 3 best solutions below

**Bumbble Comm** · Answer 1 · 2018-07-16 19:13:54

$\newcommand{\bbR}{\mathbb{R}}$It doesn't really make sense to talk about differentiating $f$ in both $x$ and $\theta$. Note that $\theta(x)$ is a single-variable function so $\nabla_x\theta$ doesn't make sense either.

Define a new function $g \colon \bbR \to \bbR$ given by $g(x) = f(\theta(x))$. Then by the chain rule, $$g'(x_0) = \left.\nabla_\theta(f)\right|_{\theta(x_0)} ^\top \theta'(x_0).$$ Spelled out completely, $$g'(x_0) = \left.\frac{\partial f}{\partial \theta_1}\right|_{\theta(x_0)} \left.\frac{d\theta_1}{dx}\right|_{x_0} + \cdots + \left.\frac{\partial f}{\partial \theta_n} \right|_{\theta(x_0)}\left.\frac{d\theta_n}{dx}\right|_{x_0}$$ Every partial derivative of $f$ is evaluated at the full point $\theta(x_0) \in \bbR^n$, and every derivative of $\theta_i$ at $x_0$.
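A quick numerical sanity check of this formula can be sketched in Python. The particular $\theta$ and $f$ below are hypothetical choices made only for illustration (they do not come from the question), and the finite-difference step `h` is likewise an assumption.

```python
import math

def theta(x):
    # Hypothetical curve theta: R -> R^3 (chosen for illustration only)
    return [x, x ** 2, math.sin(x)]

def theta_prime(x):
    # Componentwise ordinary derivatives d(theta_i)/dx
    return [1.0, 2.0 * x, math.cos(x)]

def f(t):
    # Hypothetical scalar function f: R^3 -> R
    return t[0] * t[1] + t[2] ** 2

def grad_f(t):
    # Gradient of f with respect to theta, evaluated at the point t
    return [t[1], t[0], 2.0 * t[2]]

def g_prime_chain(x0):
    # Chain rule: dot product of grad f at theta(x0) with theta'(x0)
    g = grad_f(theta(x0))
    tp = theta_prime(x0)
    return sum(gi * ti for gi, ti in zip(g, tp))

def g_prime_numeric(x0, h=1e-6):
    # Central finite difference of g(x) = f(theta(x))
    return (f(theta(x0 + h)) - f(theta(x0 - h))) / (2.0 * h)

x0 = 0.7
print(g_prime_chain(x0), g_prime_numeric(x0))
```

For this choice $g(x) = x^3 + \sin^2 x$, so both values should agree with $3x_0^2 + \sin(2x_0)$ up to finite-difference error.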

**Bumbble Comm** · Answer 2 · 2018-07-16 19:29:06

I believe you have the right idea, but your notation is a little confusing (e.g., $\nabla$ is conventionally reserved for gradients of multivariable functions, not ordinary derivatives of a function of one variable).

Let's be a bit more careful. Define $g:\mathbb{R}\rightarrow\mathbb{R}$ by $$ g(x)\equiv f(\theta(x))\equiv f(\theta_{1}(x),\ldots,\theta_{n}(x)). $$ What you are looking for is $g^{\prime}$, the derivative of $g$. Apply the chain rule to get $$ g^{\prime}(x)=\theta_{1}^{\prime}(x)f_{\theta_{1}}(\theta(x))+\cdots+\theta_{n}^{\prime}(x)f_{\theta_{n}}(\theta(x)). $$ Or, more succinctly, $$ g^{\prime}(x)=\left[\nabla_{\theta}f(\theta(x))\right]^{\intercal}\theta^{\prime}(x). $$ Omitting the arguments, this looks like your expression $(\nabla_{\theta}f)^{\intercal}\theta^{\prime}$.
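As a small worked example (the specific functions are chosen here for illustration and are not from the question), take $n = 2$, $\theta(x) = (\cos x, \sin x)$ and $f(\theta) = \theta_1^2 + \theta_2^2$. Then $$ \nabla_{\theta}f(\theta(x)) = \begin{bmatrix} 2\cos x \\ 2\sin x \end{bmatrix}, \qquad \theta^{\prime}(x) = \begin{bmatrix} -\sin x \\ \cos x \end{bmatrix}, $$ so $$ g^{\prime}(x) = (2\cos x)(-\sin x) + (2\sin x)(\cos x) = 0. $$ This agrees with differentiating directly, since $g(x) = \cos^2 x + \sin^2 x = 1$ is constant.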

**Bumbble Comm** · Answer 3 · 2018-07-16 19:44:27

Correction: $$\dfrac{df}{dx}= (\nabla_{\theta} f)^T \nabla_x \theta.$$ We have $$df=\dfrac{\partial f}{\partial \theta_1}d\theta_1+\cdots+\dfrac{\partial f}{\partial \theta_n}d\theta_n$$ or $${df\over dx}=\dfrac{\partial f}{\partial \theta_1}{d\theta_1\over dx}+\cdots+\dfrac{\partial f}{\partial \theta_n}{d\theta_n\over dx}.$$ On the other hand, reading $\nabla_x\theta$ as the column vector of componentwise derivatives, $$\nabla_x\theta=\left[{d\theta_1\over dx}\quad\cdots\quad{d\theta_n\over dx}\right]^T,$$ from which the relation we wanted to prove follows immediately.
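For the product above to be a scalar, $(\nabla_\theta f)^T$ has to be read as a $1 \times n$ row and the derivative of $\theta$ as an $n \times 1$ column. The shape bookkeeping can be sketched in plain Python; the helper `matmul` and the example $\theta(x) = (x, x^2)$, $f(\theta) = \theta_1\theta_2$ are hypothetical, chosen only to make the shapes concrete.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (p x q) times (q x r) -> (p x r)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

x0 = 2.0
# Hypothetical example: theta(x) = (x, x^2), f(theta) = theta_1 * theta_2
theta = [x0, x0 ** 2]
grad_f_row = [[theta[1], theta[0]]]   # (grad_theta f)^T, shape 1 x 2
dtheta_col = [[1.0], [2.0 * x0]]      # d(theta)/dx, shape 2 x 1

df_dx = matmul(grad_f_row, dtheta_col)  # shape 1 x 1, i.e. a scalar
print(df_dx)
```

Here $f(\theta(x)) = x^3$, so the single entry equals $3x_0^2$. Multiplying in the other order (column times row) would instead give an $n \times n$ matrix, which is not the derivative being sought.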
