Composite function gradient


Suppose I have smooth maps $g:\mathbb{R}^m \rightarrow \mathbb{R}^n$ and $f:\mathbb{R}^n \rightarrow \mathbb{R}$. Then I think $$\nabla (f \circ g) = (\nabla g)^T \big((\nabla f) \circ g\big),$$ where $\nabla g$ is the $n \times m$ Jacobian matrix. This is the only plausible answer that makes the dimensions work out, and if I use row vectors instead of columns, it nicely resembles the univariate chain rule: $$\nabla (f \circ g) = ((\nabla f) \circ g) (\nabla g).$$ But can anyone prove it? And is there some way this can work with column vectors that doesn't involve transposing the Jacobian? Thanks!

Best answer

It is convenient to approach this kind of problem with a more general rule.

Use the chain rule for differentials, $d(f\circ g)(x, dx)=df(g(x),dg(x,dx))$: $$d(f\circ g)(x, dx)=\nabla (f\circ g)\cdot dx=df(g(x),dg(x,dx))=\big((\nabla f)\circ g\big)\cdot dg(x,dx)=$$ $$=\big((\nabla f)\circ g\big)\cdot(\nabla g\,dx)=\operatorname{tr}\big([(\nabla f)\circ g]^T\,\nabla g\,dx\big)=\big[(\nabla g)^T \big((\nabla f)\circ g\big)\big]\cdot dx.$$ Here $\cdot$ is the dot product, and I have used the trace property $\operatorname{tr}(A^TB)=A\cdot B$ together with $a^T J\,dx=(J^T a)^T dx$ for a column vector $a$ and a matrix $J$.
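As a sanity check on the matrix form (an added illustration, writing $y=g(x)$ and $g=(g_1,\dots,g_n)$, notation that is not used in the question), the $j$-th component of the last expression is $$\big[(\nabla g)^T\big((\nabla f)\circ g\big)\big]_j=\sum_{i=1}^{n}\frac{\partial g_i}{\partial x_j}(x)\,\frac{\partial f}{\partial y_i}(g(x)),\qquad j=1,\dots,m,$$ which is the familiar coordinate form of the chain rule for $\partial (f\circ g)/\partial x_j$.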

So the derivation above gives $$\nabla (f\circ g)=(\nabla g)^T \big((\nabla f)\circ g\big).$$
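If you want to convince yourself numerically, here is a minimal sketch in Python with NumPy. The particular maps `g` (from $\mathbb{R}^3$ to $\mathbb{R}^2$) and `f`, the point `x0`, and the helper names are all made up for illustration; they are not from the question. The sketch compares $(\nabla g)^T\big((\nabla f)\circ g\big)$, built from hand-written derivatives, against a central finite-difference gradient of $f\circ g$.

```python
import numpy as np

# Hypothetical example maps (assumptions for illustration only).
def g(x):
    # g: R^3 -> R^2
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

def jac_g(x):
    # n x m = 2 x 3 Jacobian of g, written out by hand
    return np.array([
        [x[1], x[0], 0.0],
        [2.0 * x[0], 0.0, np.cos(x[2])],
    ])

def f(y):
    # f: R^2 -> R
    return y[0] ** 2 + np.exp(y[1])

def grad_f(y):
    # gradient of f as a vector of length n = 2
    return np.array([2.0 * y[0], np.exp(y[1])])

def chain_rule_gradient(x):
    # (grad g)^T applied to (grad f) evaluated at g(x)
    return jac_g(x).T @ grad_f(g(x))

def finite_difference_gradient(func, x, eps=1e-6):
    # central differences, one coordinate at a time
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        grad[j] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return grad

x0 = np.array([0.7, -1.3, 0.4])
analytic = chain_rule_gradient(x0)
numeric = finite_difference_gradient(lambda x: f(g(x)), x0)

print("chain rule       :", analytic)
print("finite difference:", numeric)
print("agree:", np.allclose(analytic, numeric, atol=1e-6))
```

Note that the transpose is what turns the $2\times 3$ Jacobian into a map from $\mathbb{R}^2$ (where $\nabla f$ lives) to $\mathbb{R}^3$ (where $\nabla(f\circ g)$ lives), which is the dimension count from the question.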

Proof sketch for the chain rule above:

By the definition of the differential, $$(f\circ g)(x+h)=(f\circ g)(x)+d(f\circ g)(x, h)+o(\|h\|),$$ $$g(x+h)=g(x)+dg(x,h)+o(\|h\|).$$ Substituting the second expansion into the composition gives $$(f\circ g)(x+h)=f\big(g(x)+dg(x,h)+o(\|h\|)\big)=$$ $$=(f\circ g)(x)+df\big(g(x),dg(x,h)+o(\|h\|)\big)+o\big(\|dg(x,h)+o(\|h\|)\|\big)=$$ $$=(f\circ g)(x)+df(g(x),dg(x,h))+df(g(x),o(\|h\|))+o\big(\|dg(x,h)+o(\|h\|)\|\big)=$$ $$=(f\circ g)(x)+df(g(x),dg(x,h))+o(\|h\|),$$ because $df(g(x),o(\|h\|))+o\big(\|dg(x,h)+o(\|h\|)\|\big)$ is $o(\|h\|)$. I omit the tedious epsilon-delta manipulations needed to prove this fact, but it follows because $df(g(x),\cdot)$ and $dg(x,\cdot)$ are linear, hence continuous and bounded, in their second argument: $\|dg(x,h)\|\le C\|h\|$ for some constant $C$, so $\|dg(x,h)+o(\|h\|)\|=O(\|h\|)$.

So, comparing the first expansion with the last and using the uniqueness of the differential, we have $$d(f\circ g)(x, h)=df(g(x),dg(x,h)).$$
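The $o(\|h\|)$ claim can also be illustrated numerically, though of course this is not a proof. The sketch below (Python with NumPy, using made-up example maps `g` and `f`, a made-up point `x0` and direction, none of which come from the question) computes the ratio $|(f\circ g)(x+h)-(f\circ g)(x)-df(g(x),dg(x,h))|/\|h\|$ for shrinking $h$. If $df(g(x),dg(x,h))$ really is the differential of $f\circ g$, this ratio should tend to $0$.

```python
import numpy as np

# Hypothetical example maps (assumptions for illustration only).
def g(x):
    # g: R^3 -> R^2
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

def jac_g(x):
    # 2 x 3 Jacobian of g, so dg(x, h) = jac_g(x) @ h
    return np.array([
        [x[1], x[0], 0.0],
        [2.0 * x[0], 0.0, np.cos(x[2])],
    ])

def f(y):
    # f: R^2 -> R
    return y[0] ** 2 + np.exp(y[1])

def grad_f(y):
    # gradient of f, so df(y, k) = grad_f(y) @ k
    return np.array([2.0 * y[0], np.exp(y[1])])

def remainder_ratio(x, h):
    # |f(g(x+h)) - f(g(x)) - df(g(x), dg(x, h))| / ||h||
    dg = jac_g(x) @ h
    df = grad_f(g(x)) @ dg
    return abs(f(g(x + h)) - f(g(x)) - df) / np.linalg.norm(h)

x0 = np.array([0.7, -1.3, 0.4])
direction = np.array([1.0, -2.0, 0.5])
scales = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [remainder_ratio(x0, t * direction) for t in scales]

for t, r in zip(scales, ratios):
    print(f"t = {t:.0e}  remainder / ||h|| = {r:.3e}")
```

For smooth maps like these the remainder is in fact $O(\|h\|^2)$, so the printed ratio should shrink roughly in proportion to $t$.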