Find the gradient $\nabla(g\circ f)$ at all points $(x,y)\in\mathbb{R}^2$.


Let $f:\mathbb{R}^2\rightarrow\mathbb{R}^2, f(x,y)=(xy,x^2+y^2)$, $g:\mathbb{R}^2\rightarrow\mathbb{R},g(t,s)=\exp(2t+s)$. Find the gradient $\nabla(g\circ f)$ at all points $(x,y)\in\mathbb{R}^2$, once by taking the partial derivatives of $g\circ f$ directly and once by using the chain rule for gradients: $\nabla(g\circ f)(x_0)=\nabla g(f(x_0))\nabla f(x_0)$.


First, by taking the partial derivatives directly. Let $h:=g\circ f$. Then \begin{align*} \nabla h(x,y)&=\nabla \exp(2xy+x^2+y^2)=\left(\frac{\partial h}{\partial x},\frac{\partial h}{\partial y}\right)\\ &=\left(2(x+y)e^{x^2+2xy+y^2},\,2(x+y)e^{x^2+2xy+y^2}\right). \end{align*}


But the chain rule is confusing me. It should be $\nabla g(f(x,y)) \nabla f(x,y)=\nabla g(xy,x^2+y^2)\nabla f(x,y)$, right? But isn't $\nabla g(xy,x^2+y^2)$ the same as what I just computed above, i.e. $\nabla g(xy,x^2+y^2)=\nabla \exp(2xy+x^2+y^2)$?

  • Where is my mistake?
  • How would I take the gradient of, e.g., $f(x,y)$ in the first place? Do I not get a three-dimensional vector?

There are 2 best solutions below


No, calculating $\nabla g(xy,x^2+y^2)$ does not give what you calculated before. The chain rule becomes confusing when the formula is read the wrong way. In the chain rule, the nabla operator for $g$ acts on $g$'s own variables $t,s$: $\nabla g(xy,x^2+y^2)$ means that you first calculate $\nabla g(t,s)$ and then insert $xy$ for $t$ and $x^2+y^2$ for $s$. As for $f$: since $f$ is a vector-valued function, its derivative is a Jacobian matrix rather than a gradient vector. So in your case the chain rule says $$ \nabla (g \circ f)(x,y)^T=\nabla g(xy,x^2+y^2)^T Df(x,y)$$ which is nothing other than a vector-matrix multiplication (row vector on the left).

In general the chain rule says $$D(g \circ f)(x,y)=Dg(f(x,y)) Df(x,y)$$ where the factors can be Jacobian matrices of different sizes. The most important thing to understand is $Dg(f(x,y))$: it is the Jacobian matrix of $g$, evaluated at the point $f(x,y)$. Note that I state the chain rule with Jacobians here, which is the more general form. The Jacobian of a scalar function is its gradient written as a row vector.
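To make this concrete, here is how the computation could be carried out for the $f$ and $g$ from the question (a worked sketch, with the gradient of $g$ written as a row vector as above): \begin{align*} \nabla g(t,s)^T&=\left(2e^{2t+s},\,e^{2t+s}\right),\qquad Df(x,y)=\begin{pmatrix} y & x\\ 2x & 2y\end{pmatrix},\\ \nabla g(xy,x^2+y^2)^T&=\left(2e^{2xy+x^2+y^2},\,e^{2xy+x^2+y^2}\right),\\ \nabla (g\circ f)(x,y)^T&=\left(2e^{(x+y)^2},\,e^{(x+y)^2}\right)\begin{pmatrix} y & x\\ 2x & 2y\end{pmatrix}\\ &=\left(2(x+y)e^{(x+y)^2},\,2(x+y)e^{(x+y)^2}\right). \end{align*} This agrees with the gradient obtained in the question by differentiating $\exp(2xy+x^2+y^2)$ directly.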


There is a difference between $\nabla f(g) = (\nabla f)(g) = \nabla f \circ g $ and $\nabla(f(g)) = \nabla (f\circ g)$. This is more confusing when the slight abuse of notation $f=f(x)$ is used.

If $\nabla f(v)$ is interpreted as the matrix, with respect to the standard basis, of the derivative of $f$ at $v$, then for functions $f:\mathbb R^2 \to \mathbb R^2$ it is also known as the Jacobian matrix of $f$ at $v$.
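As a quick numerical sanity check, the following minimal Python sketch compares the chain-rule product (gradient of $g$ evaluated at $f(x,y)$, times the Jacobian of $f$) with the directly differentiated formula from the question. The function names `grad_g`, `jacobian_f`, `grad_chain` and `grad_direct` are made up for this illustration, and the partial derivatives are hard-coded by hand rather than computed symbolically.

```python
import math

def f(x, y):
    # f(x, y) = (xy, x^2 + y^2)
    return (x * y, x**2 + y**2)

def grad_g(t, s):
    # g(t, s) = exp(2t + s), so grad g = (2 e^{2t+s}, e^{2t+s})
    e = math.exp(2 * t + s)
    return (2 * e, e)

def jacobian_f(x, y):
    # Rows are the gradients of the two components of f
    return ((y, x), (2 * x, 2 * y))

def grad_chain(x, y):
    # Row vector grad g(f(x, y)) times the 2x2 Jacobian Df(x, y)
    gt, gs = grad_g(*f(x, y))
    J = jacobian_f(x, y)
    return (gt * J[0][0] + gs * J[1][0],
            gt * J[0][1] + gs * J[1][1])

def grad_direct(x, y):
    # Differentiating exp((x + y)^2) directly
    c = 2 * (x + y) * math.exp((x + y) ** 2)
    return (c, c)

if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.3, -0.5), (1.0, 0.25)]:
        print(point, grad_chain(*point), grad_direct(*point))
```

Note that `grad_g` is evaluated at the point `f(x, y)`, not differentiated with respect to `x` and `y`; that is exactly the distinction between $(\nabla g)\circ f$ and $\nabla(g\circ f)$.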