How does this version of the multivariable chain rule work?


My calculus book gives these formulas and calls them versions of the multivariable chain rule. I do not see how they make sense: if I cancel out the partial $x$'s, I get $1 = 2$. Can someone explain this and tell me how these formulas work?

$\partial f/\partial v = \partial f/\partial x \cdot \partial x/\partial v + \partial f/\partial y \cdot \partial y/\partial v$

$\partial f/\partial u = \partial f/\partial x \cdot \partial x/\partial u + \partial f/\partial y \cdot \partial y/\partial u$

Note: These formulas are for partial derivatives of functions of the form $f(x(u,v),y(u,v))$. Also, please try to explain intuitively and not too rigorously.


There are 3 best solutions below

6
On BEST ANSWER

Forget about cancelling $dx$'s; this only "works" in the single-variable case. Think of it like this: if you have something like $g(u) = f(x(u))$, then $$\frac{dg}{du} = \frac{dx}{du}\frac{df}{dx},$$right? Think of $df/dx$ as a contribution to $dg/du$, with weight $dx/du$. In the multivariable case, each partial derivative of $f$ gives a contribution, each with its own weight. For example: if $g(u) = f(x(u),y(u),z(u),w(u))$, then $$\frac{dg}{du} = \frac{dx}{du}\frac{\partial f}{\partial x} + \frac{dy}{du}\frac{\partial f}{\partial y}+\frac{dz}{du}\frac{\partial f}{\partial z}+ \frac{dw}{du}\frac{\partial f}{\partial w}.$$

In situations like $g(u,v) = f(x(u,v),y(u,v),z(u,v))$ you use the same principle, but each weight is now the partial derivative with respect to the variable you are differentiating by. Meaning $$\frac{\partial g}{\partial u}=\frac{\partial x}{\partial u}\frac{\partial f}{\partial x} + \frac{\partial y}{\partial u}\frac{\partial f}{\partial y}+\frac{\partial z}{\partial u}\frac{\partial f}{\partial z}.$$Similarly for $\partial g/\partial v$, etc.
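If it helps to see the "weighted contributions" idea with actual numbers, here is a minimal sketch that compares the chain-rule sum against a finite-difference estimate of $\partial g/\partial u$. The functions `x`, `y`, `f` and the test point are made up purely for illustration; nothing here comes from the book.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def x(u, v):
    return u * v

def y(u, v):
    return u + math.sin(v)

def f(x_val, y_val):
    return x_val**2 * y_val

def g(u, v):
    # The composite g(u, v) = f(x(u, v), y(u, v)).
    return f(x(u, v), y(u, v))

def chain_rule_dg_du(u, v):
    """dg/du as the weighted sum f_x * x_u + f_y * y_u (partials done by hand)."""
    x_val, y_val = x(u, v), y(u, v)
    f_x = 2 * x_val * y_val   # df/dx evaluated at (x(u,v), y(u,v))
    f_y = x_val**2            # df/dy evaluated at (x(u,v), y(u,v))
    x_u = v                   # dx/du, the weight on f_x
    y_u = 1.0                 # dy/du, the weight on f_y
    return f_x * x_u + f_y * y_u

def central_diff_du(func, u, v, h=1e-6):
    """Numerical estimate of d(func)/du by a central difference."""
    return (func(u + h, v) - func(u - h, v)) / (2 * h)

u0, v0 = 1.3, 0.7
analytic = chain_rule_dg_du(u0, v0)
numeric = central_diff_du(g, u0, v0)
print(analytic, numeric)  # the two values should agree to several digits
```

Neither term alone matches the numerical derivative; only the sum does, which is why "cancelling" one term at a time gives nonsense.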

0
On

They don't cancel out; you must treat them as products of derivatives. For instance,

$$\frac{\partial f}{\partial v} = \frac{\partial f}{\partial x} \cdot \frac{\partial x}{\partial v} + \frac{\partial f}{\partial y} \cdot \frac{\partial y}{\partial v}$$

$$\frac{\partial f}{\partial u} = \frac{\partial f}{\partial x} \cdot \frac{\partial x}{\partial u} + \frac{\partial f}{\partial y} \cdot \frac{\partial y}{\partial u}$$

To better understand the concept, note that the chain rule is just the matrix product of gradients and/or Jacobians.

In the example given:

$$\nabla f(u,v)= \begin{bmatrix}f_u\\f_v\end{bmatrix}= \begin{bmatrix}x_u&y_u\\x_v&y_v\end{bmatrix}\cdot \begin{bmatrix}f_x\\f_y\end{bmatrix}$$
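As a quick sanity check of this matrix form, here is a small worked example with functions I made up for illustration: take $f(x,y) = xy$ with $x = u + v$ and $y = u - v$. Then

$$\begin{bmatrix}x_u&y_u\\x_v&y_v\end{bmatrix} = \begin{bmatrix}1&1\\1&-1\end{bmatrix}, \qquad \begin{bmatrix}f_x\\f_y\end{bmatrix} = \begin{bmatrix}y\\x\end{bmatrix} = \begin{bmatrix}u-v\\u+v\end{bmatrix},$$

so the product gives

$$\begin{bmatrix}f_u\\f_v\end{bmatrix} = \begin{bmatrix}1&1\\1&-1\end{bmatrix}\cdot\begin{bmatrix}u-v\\u+v\end{bmatrix} = \begin{bmatrix}2u\\-2v\end{bmatrix}.$$

Substituting directly gives $f = (u+v)(u-v) = u^2 - v^2$, whose partial derivatives are indeed $2u$ and $-2v$.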

3
On

If you have a multivariable function $\pmb f = (f_1,\dots,f_n) : \mathbf{R}^m \to \mathbf{R}^n$ then the derivative is the matrix

$$ D\pmb f = \left( \frac{\partial f_i}{\partial x_j} \right) =\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_m} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_m} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_m} \\ \end{pmatrix} $$

This matrix represents a linear transformation $\mathbf{R}^m \to \mathbf{R}^n$. If you have a composition of functions $\mathbf{R}^m \xrightarrow{\pmb f} \mathbf{R}^n \xrightarrow{\pmb g} \mathbf R^p$, then the chain rule says that the derivative of the composition $\pmb g \circ \pmb f$ is the composition of these linear functions, which is given by matrix multiplication:

$$ D(\pmb g \circ \pmb f) = D\pmb g \circ D \pmb f = \left( \frac{\partial g_i}{\partial u_j} \right) \left( \frac{\partial f_j}{\partial x_k} \right). $$

The formula comes from computing this matrix product. That is

$$ D(\pmb g \circ \pmb f) = \left( \frac{\partial (\pmb g \circ \pmb f)_i}{\partial x_k} \right) $$

where

$$ \frac{\partial (\pmb g \circ \pmb f)_i}{\partial x_k} = \sum_{j = 1}^n \frac{\partial g_i}{\partial u_j} \frac{\partial f_j}{\partial x_k}. $$

Since we are interpreting $f_j$ as the input $u_j$, it is common to abuse notation and write this as

$$ \frac{\partial (\pmb g \circ \pmb f)_i}{\partial x_k} = \sum_{j = 1}^n \frac{\partial g_i}{\partial u_j} \frac{\partial u_j}{\partial x_k}. $$
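To see the matrix identity $D(\pmb g \circ \pmb f) = D\pmb g \, D\pmb f$ numerically, here is a rough sketch in plain Python. The maps `f` and `g`, the finite-difference `jacobian` helper, and the test point are all hypothetical choices for illustration. Note that $D\pmb g$ has to be evaluated at $\pmb f(x)$, not at $x$.

```python
import math

def f(x):
    # Hypothetical f : R^2 -> R^3.
    x1, x2 = x
    return [x1 * x2, x1 + x2, math.sin(x1)]

def g(u):
    # Hypothetical g : R^3 -> R^2.
    u1, u2, u3 = u
    return [u1 * u2 + u3, u1 * u3**2]

def jacobian(func, point, h=1e-6):
    """Central-difference Jacobian: J[i][k] approximates d func_i / d x_k."""
    n_out = len(func(point))
    J = [[0.0] * len(point) for _ in range(n_out)]
    for k in range(len(point)):
        plus, minus = list(point), list(point)
        plus[k] += h
        minus[k] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(n_out):
            J[i][k] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def matmul(A, B):
    """Plain matrix product: (AB)[i][k] = sum_j A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

x0 = [0.5, -1.2]
lhs = jacobian(lambda x: g(f(x)), x0)               # D(g o f) at x0, a 2x2 matrix
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))   # Dg at f(x0) times Df at x0
max_err = max(abs(lhs[i][k] - rhs[i][k]) for i in range(2) for k in range(2))
print(max_err)  # should be tiny, limited only by finite-difference error
```

The sum over `j` inside `matmul` is exactly the sum over $j$ in the formula above: one term per intermediate variable $u_j$.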

For example, let $\pmb f : \mathbf{R} \to \mathbf{R}^2$ be given by $\pmb f(x) = (p(x), q(x))$ and let $\pmb g : \mathbf{R}^2 \to \mathbf{R}$ be given by $g(a,b) = ab$. Then $g(f(x)) = g(p(x),q(x)) = p(x)q(x)$. By our formula,

$$ \frac{\partial(g\circ f)}{\partial x} = \frac{\partial g}{\partial a} \frac{\partial p}{\partial x} + \frac{\partial g}{\partial b} \frac{\partial q}{\partial x} = bp'(x) + aq'(x) = q(x)p'(x) + p(x)q'(x), $$

because $b = q(x)$ and $a = p(x)$. This is the familiar product rule.