Multivariate Chain rule, why addition?


[Figure: a computation graph in which $x$ feeds into $f$ and $g$, which both feed into $h$ — original image unavailable.]

Let's consider the above graph. Now $\left[ \frac{\delta h}{\delta f},\,\frac{\delta h}{\delta g} \right]$ is the Jacobian. $\frac{\delta h}{\delta f}$ is the magnitude of the vector along $f(x)$ and $\frac{\delta h}{\delta g}$ is the magnitude of the vector along $g(x)$. So $$\sqrt{{{\left( \frac{\delta h}{\delta f} \right)}^{2}}+{{\left( \frac{\delta h}{\delta g} \right)}^{2}}}$$ gives the magnitude of the vector of steepest change of $h( f(x), g(x))$, which is the estimate of the change of $h$ w.r.t. $f$ and $g$.

Now if I nudge $x$, the Jacobian $\left[ \frac{\delta h}{\delta f},\,\frac{\delta h}{\delta g} \right]$ will change, i.e. $\frac{\delta h}{\delta f}\frac{\delta f}{\delta x}$ and $\frac{\delta h}{\delta g}\frac{\delta g}{\delta x}$ show how the contents of the Jacobian change w.r.t. $x$. Now I consider the diagram to be the same as this one: [second figure — original image unavailable]

So, $$\left[ \frac{\delta h}{\delta f}\frac{\delta f}{\delta x},\,\,\frac{\delta h}{\delta g}\frac{\delta g}{\delta x} \right]$$ can be considered as the new Jacobian, and the estimate of the change in $h$ w.r.t. $x$ would then be calculated just by taking the root of the sum of squares of its elements. But the formula says $$\frac{dh}{dx}=\frac{\delta h}{\delta f}\frac{\delta f}{\delta x}+\frac{\delta h}{\delta g}\frac{\delta g}{\delta x}$$

I am trying to understand the chain rule in terms of estimating the change in $h$ w.r.t. $x$. I am sure I have lots of misconceptions, but I would be glad if you pointed them out.

1 Answer
The problem occurs where you "nudge" the Jacobian. Your $\delta h/ \delta f$ is still a function of both $f$ and $g$ (example: if $f(x,y)=x^2y$, then $\delta f/\delta x = 2xy$ depends on both $x$ and $y$). So when you nudge the Jacobian, the way $\delta h/ \delta f$ changes is more complicated (in fact it involves the chain rule!).

Why addition?

Consider $f(x,y)$. The gradient $\nabla f = [ f_x \ \ f_y ]$ is the derivative of $f$ in the sense that it encodes the first order changes in $f$. In fact, the linearization of $f$ at $(a,b)$ is $$F(x,y) = f(a,b) + [f_x(a,b) \ \ f_y(a,b) ] \begin{bmatrix} x-a \\ y-b \end{bmatrix} = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b)$$
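To see the linearization in action, here is a small numerical sketch. The function $f(x,y)=x^2y$ and the base point $(a,b)=(1,2)$ are illustrative choices, not from the answer:

```python
# Hypothetical example: f(x, y) = x^2 * y, linearized at (a, b) = (1.0, 2.0).
def f(x, y):
    return x**2 * y

a, b = 1.0, 2.0
fx, fy = 2 * a * b, a**2  # f_x = 2xy and f_y = x^2, evaluated at (a, b)

def F(x, y):
    # Linearization: F(x, y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b)
    return f(a, b) + fx * (x - a) + fy * (y - b)

# Near (a, b) the linearization tracks f to first order:
print(f(1.01, 2.01), F(1.01, 2.01))  # the two values closely agree
```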

What happens if we substitute in new variables like $x=g(u,v)$ and $y=h(u,v)$?

Suppose that $a=g(c,d)$ and $b=h(c,d)$. Linearize $g$ and $h$ at $(c,d)$ (call these $G$ and $H$). Then we get $G(u,v) = g(c,d) + g_u(c,d)(u-c)+ g_v(c,d)(v-d)$ and $H(u,v) = h(c,d)+h_u(c,d)(u-c)+h_v(c,d)(v-d)$.

Now feed the linearizations $G$ and $H$ into the linearization $F$:

$F(G(u,v),H(u,v))$

$$= f(G(c,d),H(c,d)) + f_x(G(c,d),H(c,d))(G(u,v)-G(c,d)) + f_y(G(c,d),H(c,d))(H(u,v)-H(c,d))$$ $$ = f(a,b) + f_x(a,b)(g(c,d) + g_u(c,d)(u-c)+ g_v(c,d)(v-d)-g(c,d)) + f_y(a,b)(h(c,d)+h_u(c,d)(u-c)+h_v(c,d)(v-d)-h(c,d))$$ $$ = f(a,b) + (f_x(a,b)g_u(c,d)+f_y(a,b)h_u(c,d))(u-c) + (f_x(a,b)g_v(c,d)+f_y(a,b)h_v(c,d))(v-d)$$

In other words, the linearization of the composition $f(g(u,v),h(u,v))$ has partials: $f_u = f_xx_u+f_yy_u$ and $f_v = f_xx_v+f_yy_v$. The chain rule!!
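As a sanity check, the identity $f_u = f_x x_u + f_y y_u$ can be verified symbolically with sympy. The concrete choices $g=uv$, $h=u+v^2$, $f=x^2y$ below are made-up examples, not from the answer:

```python
import sympy as sp

u, v, x, y = sp.symbols('u v x y')

# Hypothetical concrete choices for the inner and outer functions:
g = u * v        # x = g(u, v)
h = u + v**2     # y = h(u, v)
f = x**2 * y     # outer function f(x, y)

# Differentiate the composition f(g(u,v), h(u,v)) directly:
comp = f.subs({x: g, y: h})
direct_u = sp.diff(comp, u)

# Chain rule: f_u = f_x * x_u + f_y * y_u, evaluated along the composition
chain_u = (sp.diff(f, x).subs({x: g, y: h}) * sp.diff(g, u)
           + sp.diff(f, y).subs({x: g, y: h}) * sp.diff(h, u))

print(sp.simplify(direct_u - chain_u))  # prints 0: both routes agree
```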

Really what's going on is that the derivative of a function (= Jacobian matrix) exists to help encode linearizations. Linearizing a composition of functions results in composing the linearizations of the functions being composed. Since composition of linear things is encoded by matrix multiplication, we get that the Jacobian matrix of $m \circ n$: $J_{m \circ n}$ is the product of the Jacobian matrices of $m$ and $n$: $J_m$ and $J_n$.

So the chain rule says: $J_{m \circ n} = J_m J_n$. In other words, in the linearized world, function composition is matrix multiplication. :)
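The matrix form $J_{m \circ n} = J_m J_n$ can be checked symbolically the same way; the maps $m$ and $n$ below are made-up examples for illustration:

```python
import sympy as sp

u, v, x, y = sp.symbols('u v x y')

# Hypothetical maps: n(u, v) = (u*v, u + v^2) and m(x, y) = (x^2*y, sin(x))
n = sp.Matrix([u * v, u + v**2])
m = sp.Matrix([x**2 * y, sp.sin(x)])

J_n = n.jacobian([u, v])
# J_m must be evaluated at the point n(u, v):
J_m = m.jacobian([x, y]).subs({x: n[0], y: n[1]})

# Jacobian of the composition m∘n, computed directly:
J_comp = m.subs({x: n[0], y: n[1]}).jacobian([u, v])

print(sp.simplify(J_comp - J_m * J_n))  # prints the zero matrix
```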