Can someone explain how he derived the chain rule?


I have been reading this:

https://thenumb.at/Autodiff/

And I am stuck at the Chain rule part.

The definitions:

enter image description here

I have highlighted in red the terms I don't understand below:

enter image description here

If you scroll up just a few lines it says this:

enter image description here

"h" and "f" are exactly the same functions.

They map 2 values to 2 output values. Then why is the derivative of "f" a 2 by 1 vector and the derivative of "h" a 2 by 2 matrix?

I assume that $x = (x_{1}, x_{2})$, so is it just a matter of doing the same for f but with $(x_{1}, x_{2})$ instead of $(x, y)$?


There is 1 best solution below


The website you took this from is just wrong. There isn't even a $g_1$ and a $g_2$, as $g$ maps to $\mathbb{R}$. The correct formula for $h = g \circ f$ is

$$\begin{aligned} J_h(x,y) &= J_g(f(x,y))\,J_f(x,y) \\ &= \begin{pmatrix} \frac{\partial g}{\partial x}(f(x,y)) & \frac{\partial g}{\partial y}(f(x,y)) \end{pmatrix}\begin{pmatrix} \frac{\partial f_1}{\partial x}(x,y) & \frac{\partial f_1}{\partial y}(x,y) \\ \frac{\partial f_2}{\partial x}(x,y) & \frac{\partial f_2}{\partial y}(x,y) \end{pmatrix} \\ &= \begin{pmatrix} \frac{\partial g}{\partial x}(f(x,y)) \frac{\partial f_1}{\partial x}(x,y) + \frac{\partial g}{\partial y}(f(x,y)) \frac{\partial f_2}{\partial x}(x,y) & \frac{\partial g}{\partial x}(f(x,y)) \frac{\partial f_1}{\partial y}(x,y) + \frac{\partial g}{\partial y}(f(x,y)) \frac{\partial f_2}{\partial y}(x,y)\end{pmatrix}, \end{aligned}$$

where $J$ denotes the Jacobian. Note the shapes: $J_g$ is $1 \times 2$ and $J_f$ is $2 \times 2$, so $J_h$ is $1 \times 2$. I'd advise staying away from this site, as they also seem to be confusing gradients with Jacobians, as well as introducing the bad habit of using $f(x)$ as the name of a function instead of the function value at $x$.
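If it helps to see the formula $J_h = J_g(f)\,J_f$ in action, here is a minimal sketch in plain Python. The functions `f` and `g` below are made-up examples chosen for illustration (they are not the ones from the article), with $f : \mathbb{R}^2 \to \mathbb{R}^2$ and $g : \mathbb{R}^2 \to \mathbb{R}$. The sketch builds the $1 \times 2$ Jacobian of $h = g \circ f$ from the matrix product and compares it against a central finite-difference estimate.

```python
# Hypothetical example functions (assumptions, not taken from the article):
#   f(x, y) = (x*y, x + y)   maps R^2 -> R^2
#   g(u, v) = u^2 * v        maps R^2 -> R
#   h = g o f                maps R^2 -> R


def f(x, y):
    return (x * y, x + y)


def g(u, v):
    return u * u * v


def h(x, y):
    return g(*f(x, y))


def jacobian_f(x, y):
    # 2x2 matrix: row i holds the partial derivatives of f_i w.r.t. x and y.
    return [[y, x], [1.0, 1.0]]


def jacobian_g(u, v):
    # 1x2 row vector: partial derivatives of g w.r.t. its two arguments.
    return [2.0 * u * v, u * u]


def chain_rule_jacobian(x, y):
    # J_h(x, y) = J_g(f(x, y)) @ J_f(x, y): (1x2) times (2x2) gives (1x2).
    u, v = f(x, y)
    jg = jacobian_g(u, v)
    jf = jacobian_f(x, y)
    return [jg[0] * jf[0][j] + jg[1] * jf[1][j] for j in range(2)]


def finite_difference_jacobian(x, y, eps=1e-6):
    # Central differences on h directly, as an independent check.
    dh_dx = (h(x + eps, y) - h(x - eps, y)) / (2.0 * eps)
    dh_dy = (h(x, y + eps) - h(x, y - eps)) / (2.0 * eps)
    return [dh_dx, dh_dy]


if __name__ == "__main__":
    point = (1.5, -0.5)
    print("chain rule:        ", chain_rule_jacobian(*point))
    print("finite differences:", finite_difference_jacobian(*point))
```

The two printed rows should agree up to finite-difference error. Note that $J_g$ is evaluated at $f(x,y)$, not at $(x,y)$, which is exactly the point where it is easy to go wrong.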