How to explain this quirk of the chain rule?


Suppose I have a function $f = f(y, \phi(y,x))$ and I want to calculate $\frac{\partial f}{\partial y}$. Using the chain rule, I get

\begin{equation} \frac{\partial f}{\partial y} = \frac{\partial f}{\partial y} + \frac{\partial f}{\partial \phi}\frac{\partial \phi}{\partial y} \end{equation}

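but obviously the $\frac{\partial f}{\partial y}$ represents different things on each side of the equality: on the left it is the derivative of the composite expression with respect to $y$, while on the right it is the derivative of $f$ with respect to its first argument only. How do I explain this? I'm guessing it is a notational issue.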


Edit: Just to give some context for why this troubles me. Here $x_i$ refers to the $i$-th component of the vector $\mathbf{x}$ in Euclidean space. In an acoustics textbook, the Lighthill stress tensor $T_{ij}$ is involved in the following identity:

\begin{equation} \frac{\partial}{\partial x_i} \frac{T_{ij}(\mathbf{y},t-|\mathbf{x}-\mathbf{y}|/c)}{|\mathbf{x}-\mathbf{y}|} = \frac{\frac{\partial T_{ij}}{\partial y_i}}{|\mathbf{x}-\mathbf{y}|} - \frac{\partial}{\partial y_i} \frac{T_{ij}(\mathbf{y},t-|\mathbf{x}-\mathbf{y}|/c)}{|\mathbf{x}-\mathbf{y}|} \end{equation}

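This can only be resolved if the numerator in the term $\frac{\frac{\partial T_{ij}}{\partial y_i}}{|\mathbf{x}-\mathbf{y}|}$ is given a different interpretation, namely as the derivative of $T_{ij}$ with respect to its first argument only, with the second (time) argument held fixed. Just try showing this: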

Let $t-|\mathbf{x}-\mathbf{y}|/c = \phi(t,\mathbf{x}, \mathbf{y})$

\begin{array}{lcl} \frac{\partial}{\partial x_i} \frac{T_{ij}(\mathbf{y},\phi)}{|\mathbf{x}-\mathbf{y}|} & = & \frac{1}{|\mathbf{x}-\mathbf{y}|} \frac{\partial}{\partial x_i}T_{ij}(\mathbf{y},\phi) + T_{ij}(\mathbf{y},\phi) \frac{\partial}{\partial x_i} \frac{1}{|\mathbf{x}-\mathbf{y}|} \\ & = & \frac{1}{|\mathbf{x}-\mathbf{y}|} (\frac{\partial T_{ij}}{\partial \phi}\frac{\partial \phi}{\partial x_i}) + T_{ij}(\mathbf{y},\phi) \frac{\partial}{\partial x_i} \frac{1}{|\mathbf{x}-\mathbf{y}|}\\ & = & -\frac{1}{|\mathbf{x}-\mathbf{y}|} (\frac{\partial T_{ij}}{\partial \phi}\frac{\partial \phi}{\partial y_i}) + T_{ij}(\mathbf{y},\phi) \frac{\partial}{\partial x_i} \frac{1}{|\mathbf{x}-\mathbf{y}|} \end{array}

\begin{array}{lcl} \frac{\partial}{\partial y_i} \frac{T_{ij}(\mathbf{y},\phi)}{|\mathbf{x}-\mathbf{y}|} & = & \frac{1}{|\mathbf{x}-\mathbf{y}|} \frac{\partial}{\partial y_i}T_{ij}(\mathbf{y},\phi) + T_{ij}(\mathbf{y},\phi) \frac{\partial}{\partial y_i} \frac{1}{|\mathbf{x}-\mathbf{y}|} \\ & = & \frac{1}{|\mathbf{x}-\mathbf{y}|} ( \frac{\partial}{\partial y_i}T_{ij} +\frac{\partial T_{ij}}{\partial \phi}\frac{\partial \phi}{\partial y_i}) - T_{ij}(\mathbf{y},\phi) \frac{\partial}{\partial x_i} \frac{1}{|\mathbf{x}-\mathbf{y}|} \end{array}

Adding up the last line from each expression gives the result.
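The identity can also be checked numerically. The following is only a rough sketch under stated assumptions: the scalar function `T` is a hypothetical stand-in for a single component $T_{ij}$, the values of `C` (sound speed) and `T_OBS` (the time $t$) are arbitrary, and derivatives are approximated by central differences. It compares the two readings of $\partial T_{ij}/\partial y_i$: with the time argument frozen (slot derivative) and with the time argument allowed to vary through $|\mathbf{x}-\mathbf{y}|$ (total derivative).

```python
import math

C = 2.0       # hypothetical sound speed c
T_OBS = 0.5   # hypothetical observer time t

def T(y, tau):
    # Hypothetical scalar stand-in for one component T_ij(y, tau).
    return math.sin(y[0] + 2.0 * y[1] - y[2]) * math.cos(tau) + y[0] * tau

def dist(x, y):
    # Euclidean distance |x - y|.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def G(x, y):
    # The quantity being differentiated: T(y, t - |x - y|/c) / |x - y|.
    r = dist(x, y)
    return T(y, T_OBS - r / C) / r

def bump(v, i, h):
    # Copy of v with component i shifted by h.
    w = list(v)
    w[i] += h
    return w

def central(func, h=1e-5):
    # Central-difference derivative of func at 0.
    return (func(h) - func(-h)) / (2.0 * h)

x = [1.0, 0.2, -0.4]
y = [0.3, 0.8, 0.5]
i = 0
r = dist(x, y)
tau = T_OBS - r / C

dG_dx = central(lambda h: G(bump(x, i, h), y))
dG_dy = central(lambda h: G(x, bump(y, i, h)))

# Reading 1: differentiate T in its first argument only, tau frozen.
slot_deriv = central(lambda h: T(bump(y, i, h), tau))
# Reading 2: differentiate T(y, t - |x - y|/c) as a whole with respect to y_i.
total_deriv = central(
    lambda h: T(bump(y, i, h), T_OBS - dist(x, bump(y, i, h)) / C)
)

residual_slot = dG_dx - (slot_deriv / r - dG_dy)
residual_total = dG_dx - (total_deriv / r - dG_dy)
print(residual_slot, residual_total)
```

With these sample values the first residual vanishes up to finite-difference error, while the second does not, which is consistent with reading the numerator as a derivative in the first argument only.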


There are 4 best solutions below

12
On BEST ANSWER

Let's use a different notation: for a function of two variables $f$, denote by $\partial_1f$ and $\partial_2f$ the first-order derivatives of $f$ with respect to the first and second variable respectively, namely: \begin{align*} \partial_1f(x,y)&=\lim_{h\to0}\frac{f(x+h,y)-f(x,y)}h,& \partial_2f(x,y)&=\lim_{h\to0}\frac{f(x,y+h)-f(x,y)}h. \end{align*} Now, from the chain rule, $$\frac{\mathrm{d}}{\mathrm{d}y}\Bigl(f\bigl(y,\phi(y,x)\bigr)\Bigr) =\partial_1f\bigl(y,\phi(y,x)\bigr)+\partial_1\phi(y,x)\partial_2f\bigl(y,\phi(y,x)\bigr),$$ where $$\frac{\mathrm{d}}{\mathrm{d}y}\Bigl(f\bigl(y,\phi(y,x)\bigr)\Bigr)=\lim_{h\to0}\frac{f\bigl(y+h,\phi(y+h,x)\bigr)-f\bigl(y,\phi(y,x)\bigr)}{h}.$$
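As a quick sanity check, the chain-rule identity for $\frac{\mathrm{d}}{\mathrm{d}y}f\bigl(y,\phi(y,x)\bigr)$ can be tested numerically. This is a minimal sketch, not a proof: the particular `f` and `phi` below are hypothetical examples, and `d1`, `d2` are central-difference approximations of the slot derivatives $\partial_1$, $\partial_2$.

```python
import math

def f(u, v):
    # Hypothetical test function of two slots (u, v).
    return u**2 * v + math.sin(v)

def phi(y, x):
    # Hypothetical inner function.
    return x * y + y**3

def d1(func, a, b, h=1e-6):
    # Central-difference approximation of the first-slot derivative.
    return (func(a + h, b) - func(a - h, b)) / (2 * h)

def d2(func, a, b, h=1e-6):
    # Central-difference approximation of the second-slot derivative.
    return (func(a, b + h) - func(a, b - h)) / (2 * h)

def total_dy(y, x, h=1e-6):
    # Derivative of the composite expression y -> f(y, phi(y, x)).
    composite = lambda s: f(s, phi(s, x))
    return (composite(y + h) - composite(y - h)) / (2 * h)

y, x = 0.7, 1.3
lhs = total_dy(y, x)
rhs = d1(f, y, phi(y, x)) + d1(phi, y, x) * d2(f, y, phi(y, x))
slot_one_only = d1(f, y, phi(y, x))

print(abs(lhs - rhs) < 1e-6)             # the two sides agree
print(abs(lhs - slot_one_only) > 1e-2)   # the slot derivative alone does not
```

The last line illustrates the point of the question: the derivative of the composite expression and the first-slot derivative of $f$ are different numbers, even though Leibniz notation writes both as $\partial f/\partial y$.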


In fact, I always try to be careful about what I write and what I really mean. First, I never say "the function $f(x)$", only "the function $f$" (unless $f$ is a function whose codomain is a set of functions). At best, $f(x)$ is an expression that depends on $x$.

Then I use symbols like $\partial_1$, $\partial_2$, etc. for functions, and things like $\dfrac{\mathrm{d}}{\mathrm{d}x}$ or $\dfrac{\partial}{\partial x}$ for expressions (though, in fact, it's slightly more complicated).

(Hence, I hate it when people say something like "[something] is a function of $x$". Heck, what does it mean to be a function of $x$? You're a function or you're not; you can't be a function of $x$. At best, you're an expression that depends on $x$.)


Then there's something I like to do: take a function $f$ of two variables and define the function $g$ by $$g(y,x)=f(x,y).$$

Then I like to ask this question: with your notation, what sense do you give to $$\frac{\partial g}{\partial x}?$$ or to any other variation on the theme: $$\frac{\partial g(x,y)}{\partial x},\ \frac{\partial g}{\partial x}(x,y),\ \ldots$$
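To make the ambiguity concrete, here is a small sketch with a hypothetical choice $f(x,y)=xy^2$ (the source does not fix any particular $f$). The slot derivatives of $g$ are well defined, but "$\partial g/\partial x$" could mean either of them, depending on whether $x$ is taken to name the first slot by habit or the second slot as in the definition of $g$.

```python
def f(x, y):
    # Hypothetical example function; any smooth f would do.
    return x * y**2

def g(y, x):
    # The swapped function from the text: g(y, x) = f(x, y).
    return f(x, y)

def d1(func, a, b, h=1e-6):
    # Central-difference approximation of the first-slot derivative.
    return (func(a + h, b) - func(a - h, b)) / (2 * h)

def d2(func, a, b, h=1e-6):
    # Central-difference approximation of the second-slot derivative.
    return (func(a, b + h) - func(a, b - h)) / (2 * h)

a, b = 2.0, 3.0
slot1_of_g = d1(g, a, b)   # same as d2(f, b, a) = 2*b*a, about 12
slot2_of_g = d2(g, a, b)   # same as d1(f, b, a) = a**2, about 4
print(slot1_of_g, slot2_of_g)
```

The two candidates for "$\partial g/\partial x$" at the point $(2,3)$ come out near $12$ and $4$, so the Leibniz symbol alone does not determine the answer, whereas $\partial_1 g$ and $\partial_2 g$ do.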

12
On

It is indeed a notational ambiguity. It is clearer to write it as $$\frac{\partial f(y,\phi(y,x))}{\partial y}(x,y)=\frac{\partial f(x,y)}{\partial x}(y,\phi(y,x))+\left[\frac{\partial f(x,y)}{\partial y}(y,\phi(y,x))\right]\left[\frac{\partial \phi(x,y)}{\partial x}(y,x)\right]$$ In the above, the function in the numerator is the function being differentiated, and the point following the fraction is the point at which the derivative is evaluated. In other words, in $$\frac{\partial [\text{stuff}]}{\partial x}(\text{point})$$ "stuff" is the expression which is being differentiated with respect to $x$, yielding a function. Into that function we plug the point "point" to get the answer. E.g., $$\frac{\partial (x+y^3)}{\partial y}(y,x)=3y^2+0x\Bigg|_{(y,x)}=3x^2$$ Note that $$\frac{\partial h(y,x)}{\partial y}(x,y)=\frac{\partial h(x,y)}{\partial x}(y,x)$$

0
On

It would seem you have encountered a typing error. The left f is merely the label and the right side reveals some structure, yet it carries the same label. So differentiating both sides eliminates the quantity sought. This makes no sense. What would make more sense is if the label on the left were given as g, then your computation provides an answer. The problem is that the given equation is self-referencing and thus it leads nowhere.

0
On

In my opinion, Leibniz notation for partial derivatives is terrible: I avoid using it whenever possible, except for a particular usage from differential geometry. (the ambiguity you cite in your question isn't the only problem with it!)

My favorite notation is a variation of $f'$ used for the derivative of a univariate function $f$: the functions $f_1$ and $f_2$ are the functions one would normally write as

$$ f_1(x,y) = \frac{\partial}{\partial x} f(x,y) $$ $$ f_2(x,y) = \frac{\partial}{\partial y} f(x,y) $$

so I would write

$$ \frac{\partial}{\partial y} f(y, \phi(y,x)) = f_1(y, \phi(y,x)) + f_2(y, \phi(y,x)) \phi_1(y,x) $$

Typically, I'm interested in both partials rather than just one partial, and I would use differentials instead of partial derivatives to organize the calculation of all of them at once

$$ \mathrm{d} f(y, \phi(y,x)) = f_1(y, \phi(y,x))\, \mathrm{d}y + f_2(y, \phi(y,x)) \,\mathrm{d}\phi(y,x) = \ldots $$

and when I'm really interested in one partial, I do the same thing, except work in the setting where I've set $\mathrm{d}x=0$. (assuming the partial I really mean to use is the one where $x$ is held constant)
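One possible way to carry this calculation through, as a sketch rather than the only route: assume $\phi$ is differentiable and write $\phi_1$, $\phi_2$ for its derivatives in the first and second argument, by analogy with $f_1$, $f_2$ above. Then

$$ \mathrm{d}\phi(y,x) = \phi_1(y,x)\,\mathrm{d}y + \phi_2(y,x)\,\mathrm{d}x $$

and substituting into the differential of $f(y,\phi(y,x))$ gives

$$ \mathrm{d} f(y, \phi(y,x)) = \bigl[ f_1(y, \phi(y,x)) + f_2(y, \phi(y,x))\, \phi_1(y,x) \bigr]\, \mathrm{d}y + f_2(y, \phi(y,x))\, \phi_2(y,x)\, \mathrm{d}x $$

Setting $\mathrm{d}x = 0$ leaves only the coefficient of $\mathrm{d}y$, which is the same expression as the partial with respect to $y$ written earlier in terms of $f_1$, $f_2$ and $\phi_1$; setting $\mathrm{d}y = 0$ instead gives the partial with respect to $x$, namely $f_2(y,\phi(y,x))\,\phi_2(y,x)$.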


In the differential geometry setting, in my opinion there is no ambiguity:

$$ \frac{\partial}{\partial x^i} f(x^i, g(y^i, x^i)) $$

has only one reasonable meaning: applying the tangent vector $\partial/\partial x^i$ to the scalar field $f(x^i, g(y^i, x^i))$ in the $i$-th coordinate direction. In my opinion, you wouldn't use that notation when you wanted the derivative of $f$ with respect to its first argument.

(although this example has two sets of independent variables: which makes me again dislike using Leibniz notation for it)