Using the chain rule for gradients with different function mappings


Consider $x \in \mathbf R$, a function $\theta : \mathbf R \to \mathbf R^n$, and a function $f : \mathbf R^n \to \mathbf R$. That is, $f$ is a function of $\theta$, and $\theta$ is a function of $x$.

You may assume that $\theta$ is differentiable in $x$, and $f$ differentiable in $\theta$.

I am trying to evaluate $\nabla_x f$, but am worried that my intuition is incorrect. I am wondering if it is correct to say that, using the chain rule, $$\nabla_x f = (\nabla_{\theta} f)^T \nabla_x \theta.$$ Is this valid?


There are 3 best solutions below


$\newcommand{\bbR}{\mathbb{R}}$It doesn't really make sense to talk about differentiating $f$ in both $x$ and $\theta$. Note that $\theta(x)$ is a single-variable function so $\nabla_x\theta$ doesn't make sense either.

Define a new function $g \colon \bbR \to \bbR$ given by $g(x) = f(\theta(x))$. Then by the chain rule, $$g'(x_0) = \left.\nabla_\theta(f)\right|_{\theta(x_0)} ^\top \theta'(x_0).$$ Spelled out completely, $$g'(x_0) = \left.\frac{\partial f}{\partial \theta_1}\right|_{\theta(x_0)} \left.\frac{d\theta_1}{dx}\right|_{x_0} + \cdots + \left.\frac{\partial f}{\partial \theta_n} \right|_{\theta(x_0)}\left.\frac{d\theta_n}{dx}\right|_{x_0}.$$
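As a quick sanity check of the formula above, here is a small worked example. The particular choices of $\theta$ and $f$ are hypothetical and picked only for illustration. Take $n = 2$, $\theta(x) = (x^2, \sin x)$ and $f(\theta_1, \theta_2) = \theta_1 \theta_2$, so that $g(x) = x^2 \sin x$. Then $$\frac{\partial f}{\partial \theta_1} = \theta_2, \qquad \frac{\partial f}{\partial \theta_2} = \theta_1, \qquad \theta'(x) = (2x, \cos x),$$ and the chain rule gives $$g'(x_0) = \sin(x_0) \cdot 2x_0 + x_0^2 \cdot \cos(x_0),$$ which agrees with differentiating $x^2 \sin x$ directly by the product rule.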


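I believe you have the right idea, and your formula is technically correct, but the notation is a little confusing: $\nabla$ is conventionally reserved for gradients of functions of several variables, whereas $\theta$ depends on the single variable $x$, so its derivative is better written as an ordinary derivative $\theta^{\prime}$.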

Let's be a bit more careful. Define $g:\mathbb{R}\rightarrow\mathbb{R}$ by $$ g(x)\equiv f(\theta(x))\equiv f(\theta_{1}(x),\ldots,\theta_{n}(x)). $$ What you are looking for is $g^{\prime}$, the derivative of $g$. Apply the chain rule to get $$ g^{\prime}(x)=\theta_{1}^{\prime}(x)f_{\theta_{1}}(\theta(x))+\cdots+\theta_{n}^{\prime}(x)f_{\theta_{n}}(\theta(x)). $$ Or, more succinctly, $$ g^{\prime}(x)=\left[\nabla_{\theta}f(\theta(x))\right]^{\intercal}\theta^{\prime}(x). $$ Omitting the arguments, this looks like your expression $(\nabla_{\theta}f)^{\intercal}\theta^{\prime}$.
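If it helps to see the identity numerically, here is a minimal sketch in plain Python. The functions `theta` and `f` are hypothetical examples chosen for illustration (they are not part of the question), and the finite-difference step `h` is an arbitrary assumption. The sketch compares the chain-rule value $\left[\nabla_{\theta}f(\theta(x))\right]^{\intercal}\theta^{\prime}(x)$ with a central-difference estimate of $g^{\prime}(x)$.

```python
import math

# Hypothetical example functions, chosen only for illustration.

def theta(x):
    # theta: R -> R^3
    return [math.sin(x), x * x, math.exp(x)]

def theta_prime(x):
    # Componentwise ordinary derivative of theta
    return [math.cos(x), 2.0 * x, math.exp(x)]

def f(t):
    # f: R^3 -> R
    return t[0] * t[1] + t[2] ** 2

def grad_f(t):
    # Gradient of f with respect to its three arguments
    return [t[1], t[0], 2.0 * t[2]]

def g_prime_chain(x):
    # Chain rule: dot product of grad f (evaluated at theta(x)) with theta'(x)
    grad = grad_f(theta(x))
    dtheta = theta_prime(x)
    return sum(gi * di for gi, di in zip(grad, dtheta))

def g_prime_numeric(x, h=1e-6):
    # Central finite-difference estimate of g'(x), where g(x) = f(theta(x))
    return (f(theta(x + h)) - f(theta(x - h))) / (2.0 * h)

x0 = 0.7
chain_value = g_prime_chain(x0)
numeric_value = g_prime_numeric(x0)
print(chain_value, numeric_value)
```

The two printed numbers should agree up to the finite-difference error, which is consistent with the formula but of course does not prove it.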


Correction: $$\dfrac{df}{dx}= (\nabla_{\theta} f)^T \nabla_x \theta.$$ We have $$df=\dfrac{\partial f}{\partial \theta_1}d\theta_1+\cdots+\dfrac{\partial f}{\partial \theta_n}d\theta_n,$$ or $${df\over dx}=\dfrac{\partial f}{\partial \theta_1}{d\theta_1\over dx}+\cdots+\dfrac{\partial f}{\partial \theta_n}{d\theta_n\over dx}.$$ On the other hand, $$\nabla_x\theta=\left[{d\theta_1\over dx}\quad\cdots\quad{d\theta_n\over dx}\right]^T,$$ from which the relation we wanted to prove follows immediately.