Show $\nabla f=A\nabla g$ by the chain rule


Let $A$ be an invertible $2\times 2$ matrix, let $f:\mathbb R^2\to\mathbb R$ be smooth, and define $g:\mathbb R^2\to \mathbb R$ by $$g(Ax)=f(x),$$ i.e., writing $y$ for the variable of $g$, $g(y)=f(A^{-1}y).$

Then show $\nabla f=A\nabla g$ using the chain rule.


Where is my mistake?

Let $D$ represent the derivative.

\begin{align} D_y g(y) &=D_y(f(A^{-1}y))\\ &=(D_x f)(A^{-1}y)\cdot D_y(A^{-1}y)\\ &=(D_x f)(x)\cdot A^{-1} , \end{align} where $x=A^{-1}y$. Since $D_y g(y)=\nabla g(y)$ and $(D_x f)(x)=\nabla f(x)$, I get $$\nabla g=\nabla f\cdot A^{-1}$$ and then $$\nabla f=(\nabla g)A.$$


There are 2 best solutions below

On BEST ANSWER

Note that the identity you were asked to prove is false in general; it holds for every $f$ only when $A$ is symmetric. For example, try $g(x,y) = x+y$ and $A = \begin{bmatrix}1 & 1\\0 & 1\end{bmatrix}$.

Then $f(x,y) = g(x + y, y) = x + 2y$ and $\nabla f = \left[1, 2\right]^T$. Since $\nabla g = \left[1,1\right]^T$, notice that $\nabla f = A^T \nabla g \neq A \nabla g$.
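The counterexample can also be checked numerically. The following is only a sketch: the helper `numerical_gradient` and the test point `x0` are illustrative choices of mine, not part of the question, and the gradients are approximated by central finite differences.

```python
import numpy as np

def numerical_gradient(func, point, step=1e-6):
    """Approximate the gradient of a scalar function by central differences."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        offset = np.zeros_like(point)
        offset[i] = step
        grad[i] = (func(point + offset) - func(point - offset)) / (2 * step)
    return grad

# The matrix and the function g from the counterexample.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

def g(y):
    return y[0] + y[1]

def f(x):
    # f(x) = g(Ax), as in the question.
    return g(A @ x)

x0 = np.array([0.3, -1.2])               # arbitrary test point (assumption)
grad_f = numerical_gradient(f, x0)        # gradient of f at x0
grad_g = numerical_gradient(g, A @ x0)    # gradient of g at y = A x0

print("grad f      :", grad_f)
print("A^T grad g  :", A.T @ grad_g)
print("A   grad g  :", A @ grad_g)
```

Since $f$ and $g$ are linear here, the test point does not matter; any other `x0` gives the same gradients.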


The following: $$D_yg(y) = \nabla g(y)$$ is not correct, though it's a very common confusion. The left-hand side is the covariant Jacobian/differential, which is a $1 \times 2$ row vector. The right-hand side is the contravariant gradient vector, which is a $2\times 1$ column vector.

They are related by the definition of the gradient: $\nabla g$ is the unique vector which satisfies $$\langle \nabla g, v\rangle = \left[Dg\right]v$$ for every vector $v$. For the standard Euclidean inner product it follows that $\nabla g = [Dg]^T$.

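The correct conclusion of your proof is thus $$ \begin{align*} D_y g &= \left[D_x f\right] A^{-1}\\ \left[D_y g\right] A &= D_x f\\ A^T \left[D_y g\right]^T &= \left[D_x f\right]^T\\ A^T \nabla g &= \nabla f, \end{align*} $$ where the third line is the transpose of the second, and the last line uses $\nabla g = [D_y g]^T$ and $\nabla f = [D_x f]^T$.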
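The relation $A^T \nabla g = \nabla f$ can be sanity-checked for a nonlinear $f$ as well. In the sketch below, the particular $f$, the non-symmetric matrix `A`, and the point `x0` are arbitrary choices made for illustration; $g$ is built as $g(y)=f(A^{-1}y)$ exactly as in the question, and both gradients are approximated by central differences.

```python
import numpy as np

def numerical_gradient(func, point, step=1e-6):
    """Approximate the gradient of a scalar function by central differences."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        offset = np.zeros_like(point)
        offset[i] = step
        grad[i] = (func(point + offset) - func(point - offset)) / (2 * step)
    return grad

# An invertible, non-symmetric matrix (illustrative choice).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def f(x):
    # An arbitrary smooth test function (assumption, not from the question).
    return np.sin(x[0]) * x[1] ** 2 + x[0] * x[1]

def g(y):
    # g(y) = f(A^{-1} y); solve() avoids forming the inverse explicitly.
    return f(np.linalg.solve(A, y))

x0 = np.array([0.7, -0.4])
y0 = A @ x0                               # the point where grad g is evaluated

grad_f = numerical_gradient(f, x0)
grad_g = numerical_gradient(g, y0)

print("A^T grad g matches grad f:", np.allclose(grad_f, A.T @ grad_g, atol=1e-6))
print("A   grad g matches grad f:", np.allclose(grad_f, A @ grad_g, atol=1e-6))
```

Note that $\nabla g$ is evaluated at $y = Ax$, not at $x$; mixing up the two evaluation points is another easy way to get a mismatch.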

On

$ \def\n{\nabla} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\qquad\implies\qquad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} $Define the vector $w=A\cdot x$ and note that $A$ does not need to be invertible; it might even be rectangular.

Given a scalar quantity that can be expressed as a function of either $\,x$ or $w$ $$\eqalign{ h &= f(x) = g(w) \\ }$$ first calculate its differential using the $g(w)$ function $$\eqalign{ dh &= \gradLR gw \cdot dw \;=\; \gradLR gw \cdot \LR{A \cdot dx} \;=\; \CLR{A^T \cdot \grad gw} \cdot dx \\ }$$ then using the $f(x)$ function $$\eqalign{ dh &= \c{\gradLR fx} \cdot dx \qiq \grad fx = A^T \cdot \grad gw }$$


In the above derivation, $(a\cdot b)$ denotes the dot product.
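To illustrate the remark that $A$ may be rectangular, here is a small numerical sketch with a $3\times 2$ matrix. The function `g`, its hand-computed gradient `grad_g_exact`, and the point `x0` are hypothetical choices for this example only; the gradient of $f$ is approximated by central differences and compared with $A^T \cdot \partial g/\partial w$.

```python
import numpy as np

def numerical_gradient(func, point, step=1e-6):
    """Approximate the gradient of a scalar function by central differences."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        offset = np.zeros_like(point)
        offset[i] = step
        grad[i] = (func(point + offset) - func(point - offset)) / (2 * step)
    return grad

# A rectangular 3x2 matrix, so w = A x lives in R^3 while x lives in R^2.
A = np.array([[1.0,  2.0],
              [0.0,  1.0],
              [3.0, -1.0]])

def g(w):
    # Arbitrary smooth function of w (assumption for illustration).
    return w[0] * w[1] + np.sin(w[2])

def grad_g_exact(w):
    # Gradient of g computed by hand.
    return np.array([w[1], w[0], np.cos(w[2])])

def f(x):
    # The same scalar quantity expressed as a function of x.
    return g(A @ x)

x0 = np.array([0.5, -0.25])
grad_f = numerical_gradient(f, x0)        # vector in R^2
predicted = A.T @ grad_g_exact(A @ x0)    # A^T (dg/dw), also in R^2

print("grad f    :", grad_f)
print("A^T grad g:", predicted)
```

The shapes already rule out $A\,\partial g/\partial w$ here: `A` is $3\times 2$ and the gradient of $g$ has three components, so only the transpose gives a well-defined product.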