What justifies writing the chain rule as $\frac{d}{dx}=\frac{dy}{dx}\frac{d}{dy}$ when there is no function for it to operate on?


This previous question of mine has led me to ask the following question:

It was my understanding that the chain rule $$\dfrac{du}{dx}=\dfrac{dy}{dx}\dfrac{du}{dy}$$ only makes sense when there is some function $u$ for it to operate on.

So how can we possibly justify writing $$\dfrac{d}{dx}=\dfrac{dy}{dx}\dfrac{d}{dy}?$$

One of the answers to the previous question mentioned that if

$$\frac{du}{dx}=\frac{du}{dy}\sqrt{\frac{m\,\omega_0}{\hbar}}$$ then just looking at the operator $\frac{d}{dx}$ without explicitly addressing the function $u$ the operator is to apply: \begin{align*} \frac{d}{dx}&=\sqrt{\frac{m\,\omega_0}{\hbar}}\frac{d}{dy} \end{align*}

One of the comments on the other answer to the previous question states that

I find it easier to look at what $\dfrac{du}{dx}$ is in terms of $y$. This means working out the derivative of $u(y(x))$, and to do this you use the chain rule, so we have $$\frac{du}{dx} =\frac{dy}{dx}\frac{du}{dy}$$ So look at the operator part, i.e. "ignore" the $u$. This is how my physicist brain computes changes of variables.

But I am finding it hard to accept that we simply "ignore" the $u$. I realize that we cannot simply cancel out the $u$ since we are not guaranteed that $u\ne 0$.

Is there a more plausible explanation as to why we may write $$\dfrac{d}{dx}=\dfrac{dy}{dx}\dfrac{d}{dy}$$

in the absence of a function to operate upon?

Regards.


There are 2 best solutions below

4
On BEST ANSWER

Here are some aspects regarding the shift from \begin{align*} \dfrac{du}{dx}=\dfrac{dy}{dx}\cdot\dfrac{du}{dy}\qquad\longrightarrow\qquad \dfrac{d}{dx}= \dfrac{dy}{dx}\cdot\dfrac{d}{dy}\tag{1} \end{align*}

We do not forget anything, but instead we change our point of view to a more abstract one.

First step: Functions of points

At first we take a look at a simpler example, one step below the abstraction layer in (1). We consider real-valued functions $f,g:\mathbb{R}\rightarrow \mathbb{R}$ and scalar multiplication. As we know \begin{align*} c\left(f(x)+g(x)\right)=cf(x)+cg(x)\qquad \forall x\in \mathbb{R}\tag{2} \end{align*}

We can express this relationship more abstractly by writing \begin{align*} c(f+g)=cf+cg\tag{3} \end{align*}

We know functions transform real values to real values. But instead of applying functions to specific values as we did in (2), in (3) we begin to treat them as objects in their own right.

In (2) we consider scalar multiplication and addition of real values $c(f(x)+g(x))$. In (3) we consider scalar multiplication and addition of functions.
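
To make the step from (2) to (3) explicit, it may help to spell out the pointwise definitions that (3) relies on. These are the standard conventions, assumed here rather than stated in the answer: \begin{align*} (f+g)(x) := f(x)+g(x),\qquad (cf)(x) := c\,f(x)\qquad \forall x\in \mathbb{R} \end{align*}

With these definitions, (3) is an equality between functions, and two functions are equal exactly when they agree at every point, which is statement (2).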

Of course, we don't forget that functions are objects which are applied to reals. But we shift our view to a more abstract one. This has a tremendous benefit. From now on we can ask where functions live and how they interact, just as we formerly asked where reals live and how they interact.

We can consider the set of real-valued functions and study the relationship between elements of this set as we formerly did when we considered the set of reals and studied the relationship between them.

In fact this is the first step in a direction where functions become points in a function space and where we study the relationship of the points within such spaces. This is the main theme of functional analysis.

Second step: Functions of functions

We now consider real-valued differentiable functions $f,g$ in one variable. We look at the chain rule of differentiation \begin{align*} \frac{d}{dx}\left(f\left(g(x)\right)\right)=\frac{d}{dx} g(x)\cdot\frac{d}{dg}f(g)\tag{4} \end{align*} We can express this relationship more abstractly by writing \begin{align*} \frac{d}{dx}=\frac{dg}{dx}\cdot\frac{d}{dg}\tag{5} \end{align*}

Note that the step from (4) to (5) is analogous to the step from (2) to (3). In (4) we see that a differential operator transforms a function into a function. In (5) we take a more abstract point of view and consider multiplication of differential operators. Here we are moving towards operator calculus.
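
As a sketch of what the operator form makes possible, assume in addition that $g$ is twice differentiable and that the operators act on twice-differentiable functions (both assumptions are added for this illustration). Applying (5) twice and using the product rule gives \begin{align*} \frac{d^2}{dx^2}=\frac{d}{dx}\left(\frac{dg}{dx}\cdot\frac{d}{dg}\right)=\frac{d^2g}{dx^2}\cdot\frac{d}{dg}+\left(\frac{dg}{dx}\right)^2\cdot\frac{d^2}{dg^2} \end{align*}

For the substitution quoted in the question, where $\frac{dy}{dx}=\sqrt{\frac{m\,\omega_0}{\hbar}}$ is constant, the first term vanishes and we are left with $\frac{d^2}{dx^2}=\frac{m\,\omega_0}{\hbar}\frac{d^2}{dy^2}$, again without naming the function being differentiated.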

Notes:

  • Regarding some comments and a seeming abuse of notation when using $\frac{dy}{dx}$.

    Keep in mind that this notation is also extremely powerful. It suggests interesting relationships purely through the power of its symbols, which other notational conventions cannot do.

    It can be made mathematically rigorous, as shown in this answer.

  • See this paper for some information about how and when it's convenient to work with operators.

2
On

If two operators are equal, it just means that if they operate on the same function, they generate the same result. So $$\dfrac{d}{dx}=\dfrac{dy}{dx}\dfrac{d}{dy}$$ just means:

For all functions $u$ for which both sides are defined, $\dfrac{du}{dx}=\dfrac{dy}{dx}\dfrac{du}{dy}$.
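
As a quick check of this reading, take the hypothetical choices $y=x^2$ and $u(y)=\sin y$ (they are not from the question and were picked only for illustration): \begin{align*} \frac{du}{dx}&=\frac{d}{dx}\sin\left(x^2\right)=2x\cos\left(x^2\right)\\ \frac{dy}{dx}\frac{du}{dy}&=2x\cdot\cos y\,\Big|_{y=x^2}=2x\cos\left(x^2\right) \end{align*}

Both sides agree. The operator identity asserts only that this agreement holds for every differentiable $u$, not just this one.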