Abuse of notation in the chain rule


I have a function $f: \mathbb{R}^p \to \mathbb{R}^n$. Now define functions $x_i : \mathbb{R}^p \to \mathbb{R}$ for $i = 1, \ldots, p$, and from them the function $\phi : (u_1,\ldots, u_p) \mapsto (x_1(u_1,\ldots,u_p), \ldots, x_p(u_1,\ldots,u_p))$.

Then my book gives the partial derivative of $f \circ \phi$ with respect to $u_j$ as

$$\frac{\partial f\circ \phi}{\partial u_j} = \sum_{i = 1}^p \frac{\partial x_i}{\partial u_j} \frac{\partial f}{\partial x_i}$$

But it doesn't mean anything to take a partial derivative with respect to a function! So $\frac{\partial f}{\partial x_i}$ doesn't make sense, since $x_i$ is a function; we can't differentiate with respect to a function. For example, $\frac{\partial (x^2+y^2)}{\partial (xy)}$ doesn't mean anything, right?

So I guess this is an abuse of notation and that the right formula is

$$\frac{\partial f\circ \phi}{\partial u_j} = \sum_{i = 1}^p \frac{\partial x_i}{\partial u_j} \frac{\partial f}{\partial a_i}$$

where the $a_i$ are independent variables and not functions!

Am I correct?

Thank you!

There are 2 answers below.

Best answer

Yes, it is just a mild abuse of notation. Consider the 1-dimensional case. Suppose $f(x)$ and $x(t)$ are both differentiable. Then $(f\circ x)'(t)=f'(x(t))\,x'(t)$. Here the letter $x$ is doing double duty: it names the variable of $f$ and also a function of $t$.

Ideally one would use a different letter for the inner function and say that if $f(x)$ and $u(t)$ are differentiable, then $(f\circ u)'(t)=f'(u(t))\,u'(t)$. But it is no big deal.
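
To make this concrete, here is a small worked example. The functions $f(x) = x^2$ and $u(t) = \sin t$ are chosen purely for illustration and do not come from the question. The derivative $f'$ is computed first as a function of its own argument, and only afterwards evaluated at $u(t)$:

$$f'(x) = 2x, \qquad u'(t) = \cos t, \qquad (f\circ u)'(t) = f'(u(t))\,u'(t) = 2\sin t\,\cos t.$$

Differentiating $(f\circ u)(t) = \sin^2 t$ directly gives the same result, so at no point is anything differentiated "with respect to a function".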

Second answer

Too long for a comment:

We could also use the notation $D_i f$ for the $i$th partial derivative of $f$, that is, the derivative with respect to the $i$th argument, whatever it is called. Then this chain rule formula could be written as

$$ D_j F(u) = \sum_{i=1}^p D_i f(x(u)) \, D_j x_i(u), $$

where $u = (u_1,\ldots,u_p)$, $x(u) = (x_1(u),\ldots, x_p(u))$, and $F(u) = f(x(u))$.
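
If it helps, the $D_i$ form of the formula can be checked numerically. The sketch below is only an illustration under my own assumptions: the functions `f` and `x` are made-up examples with $p = 2$ and scalar-valued $f$, the helper `partial` is a plain central-difference estimate rather than an exact derivative, and indices are 0-based as usual in Python. Note that `partial(f, i, ...)` differentiates with respect to the $i$th slot of `f` and is then evaluated at the point $x(u)$, which is exactly what $D_i f(x(u))$ means.

```python
import math

def partial(g, i, point, h=1e-6):
    """Central-difference estimate of D_i g at `point` (i is 0-based)."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (g(up) - g(down)) / (2 * h)

# Hypothetical example functions, chosen only for illustration.
def f(a):
    # f(a_1, a_2) = a_1^2 + a_2^2; the slots are called a_i, not x_i
    return a[0] ** 2 + a[1] ** 2

def x(u):
    # x(u) = (x_1(u), x_2(u)) = (u_1 * u_2, u_1 + sin(u_2))
    return [u[0] * u[1], u[0] + math.sin(u[1])]

def F(u):
    # F = f composed with x
    return f(x(u))

def chain_rule(j, u):
    """Right-hand side: sum over i of D_i f(x(u)) * D_j x_i(u)."""
    xu = x(u)
    return sum(
        partial(f, i, xu) * partial(lambda v, i=i: x(v)[i], j, u)
        for i in range(len(xu))
    )

u0 = [0.7, -1.3]
for j in range(2):
    lhs = partial(F, j, u0)   # D_j F(u), differentiating the composite directly
    rhs = chain_rule(j, u0)   # the chain rule sum
    print(j, lhs, rhs)
```

The two printed columns should agree up to finite-difference error. Nothing in the code ever differentiates "with respect to `x`" as a function: `f` is differentiated in its own slots, and `x` enters only as the point of evaluation.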