Chain rule confusion - where does the plus come from


Let $f: \mathbb R^n \to \mathbb R^m$, $g: \mathbb R^m \to \mathbb R^k$.

The chain rule states that if $h =g \circ f$ then $D_h(x) = D_g(f(x))D_f(x)$

My question is how can we apply this in the following scenario for example:

$h(t) = f(tx, ty)$, and I want to find $D_h(t)$.

According to several solved examples, the answer should be $D_h(t) = \frac{\partial f(tx, ty)}{\partial (tx)}\frac{\partial (tx)}{\partial t} + \frac{\partial f(tx,ty)}{\partial (ty)}\frac{\partial (ty)}{\partial t}$

But I don't really understand where the "$+$" comes from. It seems like the chain rule was applied twice, but I fail to see exactly how it was applied and how to explain the plus sign.


There are 3 best solutions below

1
On BEST ANSWER

Your function $h \colon \mathbb{R} \to \mathbb{R}$ is $h(t) = f(tx, ty) = (f \circ g)(t)$, where we set $g(t) = (tx, ty)$. The Chain Rule is $$Dh(t) = Df(g(t))\, Dg(t) = Df(tx, ty) \,Dg(t).$$

Notice that $g \colon \mathbb{R} \to \mathbb{R}^2$, so its derivative $Dg(t)$ is a $2 \times 1$ matrix: $$Dg(t) = \begin{pmatrix} \frac{\partial}{\partial t}(tx) \\ \frac{\partial}{\partial t}(ty) \end{pmatrix}.$$

I'll let $(u,v)$ denote the independent variables for $f$ if that's okay. Notice that $f \colon \mathbb{R}^2 \to \mathbb{R}$, so its derivative $Df(u,v)$ is a $1 \times 2$ matrix: $$Df(u,v) = \begin{pmatrix} \frac{\partial f}{\partial u}(u,v) & \frac{\partial f}{\partial v}(u,v) \end{pmatrix}.$$

Therefore, the Chain Rule gives: \begin{align*} Dh(t) = Df(tx, ty)\, Dg(t) & = \begin{pmatrix} \frac{\partial f}{\partial u}(tx,ty) & \frac{\partial f}{\partial v}(tx,ty) \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial t}(tx) \\ \frac{\partial}{\partial t}(ty) \end{pmatrix} \\ & = \frac{\partial f}{\partial u}(tx, ty) \frac{\partial (tx)}{\partial t} + \frac{\partial f}{\partial v}(tx, ty) \frac{\partial (ty)}{\partial t}. \end{align*}

The plus sign is simply the sum that appears when a $1 \times 2$ row is multiplied by a $2 \times 1$ column.
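As a quick sanity check, here is a worked instance with a specific $f$ chosen purely for illustration (it is not from the question): take $f(u,v) = u^2 v$. Computing $h$ directly gives $$h(t) = f(tx, ty) = (tx)^2 (ty) = t^3 x^2 y, \qquad h'(t) = 3t^2 x^2 y.$$ The two-term formula, with $\frac{\partial (tx)}{\partial t} = x$ and $\frac{\partial (ty)}{\partial t} = y$, gives the same result: $$\frac{\partial f}{\partial u}(tx,ty)\, x + \frac{\partial f}{\partial v}(tx,ty)\, y = 2(tx)(ty)\, x + (tx)^2\, y = 2t^2 x^2 y + t^2 x^2 y = 3t^2 x^2 y.$$ Neither term alone matches $h'(t)$; only their sum does.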

0
On

Imagine the graph of $z=f(x,y)$ as a surface and fix a point $(x_0, y_0).$ The derivative measures the change in $z$ as $x$ and $y$ change. Suppose $x$ changes by $\Delta x$ and $y$ by $\Delta y$. Add to your mental image the tangent plane to the graph at the point $(x_0, y_0).$ In the $xy$-plane there is a rectangle with one vertex $(x_0, y_0)$ and the opposite vertex $(x_0+\Delta x, y_0+\Delta y).$ Imagine that rectangle projected up onto both the surface and the tangent plane. On the tangent plane, its image is a parallelogram.

See that the change in $z$ when $x$ and $y$ are incremented is the difference $f(x_0+\Delta x, y_0+\Delta y) - f(x_0,y_0),$ which is represented by the change in height on the surface when you move from $(x_0, y_0)$ to $(x_0+\Delta x, y_0+\Delta y).$ Also see that the change in $z$ is approximated by the change in height on the tangent plane as you move from $(x_0, y_0)$ to $(x_0+\Delta x, y_0+\Delta y).$ This is the difference in heights of the two opposite corners of the parallelogram.

The change in $z$ is approximated by moving from one corner of the parallelogram to the opposite corner. You can do this in two steps. First walk along the edge in the $x$ direction, until you're above $(x_0+\Delta x, y_0),$ and then walk along the adjacent edge in the $y$ direction until you're above $(x_0+\Delta x, y_0+\Delta y).$ The change in height along the first edge is $f_x \Delta x$. The change in height along the second edge is $f_y \Delta y.$ So the total change is the sum of those two bits.

So you have

$$\Delta z \approx f_x(x_0,y_0) \Delta x+ f_y(x_0,y_0) \Delta y.$$

When you apply limits (or infinitesimals) you get

$$dz = f_x(x_0,y_0) \; dx+ f_y(x_0,y_0) \; dy.$$

If you divide this equation by $dt$ you get the chain rule.
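To spell out that last step for the $h(t) = f(tx, ty)$ in the question, here is a short sketch. To avoid a clash of names, write $u = tx$ and $v = ty$ for the two arguments of $f$ (these labels are my own), with $x$ and $y$ held fixed, so that $\frac{du}{dt} = x$ and $\frac{dv}{dt} = y$: $$dz = f_u \, du + f_v \, dv \quad \Longrightarrow \quad \frac{dz}{dt} = f_u \, \frac{du}{dt} + f_v \, \frac{dv}{dt} = f_u(tx, ty)\, x + f_v(tx, ty)\, y.$$ The plus sign here is the same one that joins the two edges of the walk along the parallelogram.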

0
On

The multivariable chain rule is a bit more complicated than the single-variable chain rule, and there are several different ways to write it.

One (simple?) way to think about the rule is as follows: suppose you compute the value of a function $z=f(x,y)$ at a specific point $(x_0,y_0)$. Now you increase the value of $x$ by a small increment $\Delta x$, and at the same time you increase the value of $y$ by a (different) small increment $\Delta y$. How does $z$ change in response?

If only the x-value changed, then there would be a change in the value of $z$ given by $$\Delta z \approx \frac{\partial z}{\partial x}\bigg|_{(x_0,y_0)} \Delta x$$ On the other hand if only the y-value changed, then there would be a change in the value of $z$ given by $$\Delta z \approx \frac{\partial z}{\partial y}\bigg|_{(x_0,y_0)} \Delta y$$ But if both $x$ and $y$ change, these changes in $z$ combine: $$\Delta z \approx \frac{\partial z}{\partial x}\bigg|_{(x_0,y_0)} \Delta x + \frac{\partial z}{\partial y}\bigg|_{(x_0,y_0)} \Delta y$$
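The combined approximation can be checked numerically. Below is a minimal sketch in Python; the function $f(x,y) = x^2 y + y^3$, the base point, and the increments are arbitrary choices made for illustration and do not come from the question.

```python
# Hypothetical example function; any smooth f(x, y) would do.
def f(x, y):
    return x**2 * y + y**3

# Its partial derivatives, computed by hand.
def f_x(x, y):
    return 2.0 * x * y

def f_y(x, y):
    return x**2 + 3.0 * y**2

def actual_change(x0, y0, dx, dy):
    # True change in z when both x and y are incremented together.
    return f(x0 + dx, y0 + dy) - f(x0, y0)

def linear_estimate(x0, y0, dx, dy):
    # Sum of the two one-variable contributions.
    return f_x(x0, y0) * dx + f_y(x0, y0) * dy

# Shrink the increments and compare the two quantities.
for scale in (1e-1, 1e-2, 1e-3):
    dx, dy = 1.0 * scale, 2.0 * scale
    exact = actual_change(1.0, 2.0, dx, dy)
    approx = linear_estimate(1.0, 2.0, dx, dy)
    print(scale, exact, approx, abs(exact - approx))
```

The gap between the two columns should shrink much faster than the increments themselves, which is what makes the sum of the two terms the right first-order approximation.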

Side note: If you write $\Delta z = z - z_0$, $\Delta x = x - x_0$, etc., then this equation is just the equation of the tangent plane, namely $$z-z_0 = \frac{\partial z}{\partial x}\bigg|_{(x_0,y_0)} (x-x_0) + \frac{\partial z}{\partial y}\bigg|_{(x_0,y_0)} (y-y_0)$$

Okay, now suppose that $x$ and $y$ both depend on some third variable, $t$, with $x=x_0$ and $y=y_0$ at some specific time $t_0$. Then if $t$ increases by a small amount $\Delta t$, we would have $$\Delta x \approx \frac{dx}{dt}\bigg|_{t_0} \Delta t$$ $$\Delta y \approx \frac{dy}{dt}\bigg|_{t_0} \Delta t$$

Now put this all together: When $t$ changes by a small increment $\Delta t$, we have $$\Delta z \approx \frac{\partial z}{\partial x}\bigg|_{(x_0,y_0)} \frac{dx}{dt}\bigg|_{t_0} \Delta t + \frac{\partial z}{\partial y}\bigg|_{(x_0,y_0)} \frac{dy}{dt}\bigg|_{t_0} \Delta t$$ or, dividing by $\Delta t$, $$\frac{\Delta z}{\Delta t} \approx \frac{\partial z}{\partial x}\bigg|_{(x_0,y_0)} \frac{dx}{dt}\bigg|_{t_0} + \frac{\partial z}{\partial y}\bigg|_{(x_0,y_0)} \frac{dy}{dt}\bigg|_{t_0}$$

Now in the usual fashion, we take the limit as $\Delta t \to 0$; the difference quotient on the left becomes an ordinary single-variable derivative, and the approximation becomes exact: $$\frac{dz}{dt}\bigg|_{t_0} =\frac{\partial z}{\partial x}\bigg|_{(x_0,y_0)} \frac{dx}{dt}\bigg|_{t_0} + \frac{\partial z}{\partial y}\bigg|_{(x_0,y_0)} \frac{dy}{dt}\bigg|_{t_0}$$
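For the specific case in the question, $x(t) = t\,x$ and $y(t) = t\,y$ with $x$ and $y$ fixed, the formula can be compared against a finite-difference derivative. This is only a sketch: the function $f(u,v) = \sin(u)\,v^2$ and the sample values are assumptions picked for illustration.

```python
import math

# Hypothetical example function f(u, v); not from the original question.
def f(u, v):
    return math.sin(u) * v**2

# Partial derivatives of f, computed by hand.
def f_u(u, v):
    return math.cos(u) * v**2

def f_v(u, v):
    return 2.0 * math.sin(u) * v

def h(t, x, y):
    # h(t) = f(tx, ty), with x and y held fixed.
    return f(t * x, t * y)

def h_prime_chain_rule(t, x, y):
    # Two-term chain rule: d(tx)/dt = x and d(ty)/dt = y.
    return f_u(t * x, t * y) * x + f_v(t * x, t * y) * y

def h_prime_numeric(t, x, y, eps=1e-6):
    # Central difference quotient, used as an independent check.
    return (h(t + eps, x, y) - h(t - eps, x, y)) / (2.0 * eps)

t, x, y = 0.7, 1.3, -0.4
print(h_prime_chain_rule(t, x, y), h_prime_numeric(t, x, y))
```

The two printed values should agree to many decimal places. Dropping either term in `h_prime_chain_rule` breaks the agreement, which is a concrete way to see that the plus sign is needed.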