Rewriting the Jacobian matrix with nabla notation.


I am trying to answer the following exercise using the notion of matrix and vector sizes:

[image: statement of the exercise]

I did a matrix size analysis: since $\nabla g(f)$ is $1 \times m$ in matrix notation and the Jacobian of $f$ is $m \times n$, the matrix product of the two is $1 \times n$, which is the same size as the gradient of $g \circ f$, since it is written in row-vector form here ($1 \times n$).

Now, the answer given to the exercise used the formula for the Jacobian of a composition. What I have trouble with is that I thought the Jacobian of $f$ was $f$ times nabla transpose ($f\nabla^{T} = J_{f}$), but here it is $J_{g\circ f}= \nabla (g\circ f)$ and $J_{g}= \nabla g(f)$.

So I am confused. Is the Jacobian of a function $J_{f}= f\nabla^{T}$ or $J_{f}= \nabla f$?



There is 1 best solution below


The Jacobian $J_{f}(a)$ is the derivative of a function $f:\mathbb{R}^n \supset U \to \mathbb{R}^m$. In other words, it is a linear approximation to $f$ at a point. So it takes vectors in the tangent space of the domain, $\mathbb{R}^n$, to the tangent space of the range, $\mathbb{R}^m$. It is therefore an $m \times n$ matrix.
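To make the $m \times n$ shape concrete, here is a minimal sketch that approximates a Jacobian by central differences. The map `f` below ($\mathbb{R}^2 \to \mathbb{R}^3$), the helper `jacobian`, and the step size `h` are illustrative assumptions, not part of the original exercise.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^3 (chosen only for illustration).
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def jacobian(func, a, h=1e-6):
    """Central-difference approximation of J_func(a), stored as a list of rows."""
    m = len(func(a))   # dimension of the codomain
    n = len(a)         # dimension of the domain
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(a)
        minus = list(a)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        for i in range(m):
            # Entry (i, j) approximates the partial derivative of f_i w.r.t. x_j.
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

a = [1.0, 2.0]
J = jacobian(f, a)
shape = (len(J), len(J[0]))
print(shape)  # (3, 2): m = 3 rows, n = 2 columns
```

Each row of `J` corresponds to one component of $f$ and each column to one input coordinate, which is why a map $\mathbb{R}^n \to \mathbb{R}^m$ gives $m$ rows and $n$ columns.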

If we have a function $g:\mathbb{R}^m \supset V \to \mathbb{R}$, its Jacobian is a linear functional $\mathbb{R}^m \to \mathbb{R}$, i.e. a $1 \times m$ matrix. The gradient is normally taken to be the vector dual to this linear functional: the unique vector $\nabla g(a)$ so that $J_g(a)v = (\nabla g(a)) \cdot v$ for every $v \in \mathbb{R}^m$. (It is not normal to take the gradient as a linear functional/row vector: the corresponding "row vector" is the map $v \mapsto (\nabla g(a)) \cdot v$. See e.g. here for a summary of the usual convention.)

If we have a composition of maps, the chain rule in coordinates says that $$ D(F \circ G)(a)_{ij} = \sum_k \frac{\partial F_i}{\partial y_k}(G(a)) \, \frac{\partial G_k}{\partial x_j}(a) = (DF(G(a)) DG(a))_{ij}, $$ where $y_k$ denotes the coordinates on the domain of $F$ and the product is composition of linear maps. In terms of the Jacobians, $J_{F \circ G}(a) = J_F(G(a))J_G(a)$, the product being matrix multiplication.
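As a sanity check of $J_{F \circ G}(a) = J_F(G(a))J_G(a)$, the following sketch multiplies hand-computed Jacobians and compares the result with the Jacobian of the composition worked out directly. The maps `G` ($\mathbb{R}^2 \to \mathbb{R}^3$) and `F` ($\mathbb{R}^3 \to \mathbb{R}^2$) and all helper names are made up for illustration.

```python
def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def G(x):
    # Hypothetical inner map G: R^2 -> R^3.
    x1, x2 = x
    return [x1 * x2, x1 + x2, x2 ** 2]

def J_G(x):
    # Jacobian of G, a 3 x 2 matrix.
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [0.0, 2 * x2]]

def J_F(y):
    # Jacobian of the hypothetical outer map F(y) = (y1*y3, y2^2), a 2 x 3 matrix.
    y1, y2, y3 = y
    return [[y3, 0.0, y1], [0.0, 2 * y2, 0.0]]

def J_FG_direct(x):
    # (F o G)(x) = (x1*x2^3, (x1+x2)^2), differentiated by hand: a 2 x 2 matrix.
    x1, x2 = x
    return [[x2 ** 3, 3 * x1 * x2 ** 2],
            [2 * (x1 + x2), 2 * (x1 + x2)]]

a = [1.0, 2.0]
product = matmul(J_F(G(a)), J_G(a))  # (2 x 3) times (3 x 2) gives 2 x 2
direct = J_FG_direct(a)
print(product)  # [[8.0, 12.0], [6.0, 6.0]]
print(direct)   # [[8.0, 12.0], [6.0, 6.0]]
```

Note that $J_F$ is evaluated at $G(a)$, not at $a$, and that the inner dimensions (here 3) must agree for the product to make sense.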


For this specific problem, $g \circ f : \mathbb{R}^n \to \mathbb{R}$, so $D(g \circ f)(a)$ is a linear functional and $$ D(g \circ f)(a)v = Dg(f(a)) Df(a)v. $$ Translating this into Jacobians, $ J_{g \circ f}(a)v = J_g(f(a)) J_f(a)v $, and then converting the linear functionals to dot products with vectors, $$ \nabla(g \circ f)(a) \cdot v = \nabla g(f(a)) \cdot J_f(a)v = \big( \nabla g(f(a)) \cdot J_f(a) \big) \cdot v. $$ In coordinates, this looks like $[\nabla(g \circ f)(a)]_i = \sum_k [ \nabla g(f(a)) ]_k [ J_f(a) ]_{ki}$; one could write this as $\big( (J_f(a))^T \nabla g(f(a)) \big)^T$, but this rather obscures its origins as a derivative.
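The coordinate formula $[\nabla(g \circ f)(a)]_i = \sum_k [\nabla g(f(a))]_k [J_f(a)]_{ki}$ can be checked numerically. The sketch below uses made-up maps `f` ($\mathbb{R}^2 \to \mathbb{R}^3$) and `g` ($\mathbb{R}^3 \to \mathbb{R}$) with hand-computed derivatives, and compares the chain-rule result with a central-difference gradient of $g \circ f$; all names and the step size `h` are assumptions for illustration.

```python
def f(x):
    # Hypothetical inner map f: R^2 -> R^3.
    x1, x2 = x
    return [x1 * x2, x1 + x2, x2 ** 2]

def J_f(x):
    # Jacobian of f, an m x n = 3 x 2 matrix.
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [0.0, 2 * x2]]

def g(y):
    # Hypothetical scalar function g: R^3 -> R.
    y1, y2, y3 = y
    return y1 * y3 + y2 ** 2

def grad_g(y):
    # Gradient of g, a vector with m = 3 entries.
    y1, y2, y3 = y
    return [y3, 2 * y2, y1]

def grad_composition(a):
    """Chain rule: component i is sum_k grad_g(f(a))[k] * J_f(a)[k][i]."""
    gg = grad_g(f(a))
    J = J_f(a)
    return [sum(gg[k] * J[k][i] for k in range(len(gg)))
            for i in range(len(a))]

def grad_numeric(a, h=1e-6):
    """Central-difference gradient of g o f, used only as an independent check."""
    out = []
    for i in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[i] += h
        minus[i] -= h
        out.append((g(f(plus)) - g(f(minus))) / (2 * h))
    return out

a = [1.0, 2.0]
chain = grad_composition(a)
numeric = grad_numeric(a)
print(chain)    # [14.0, 18.0]
print(numeric)  # approximately the same two numbers
```

The size bookkeeping matches the question: a vector with $m$ entries contracted against an $m \times n$ matrix leaves $n$ entries, one per input coordinate of $g \circ f$.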


You certainly shouldn't write $f\nabla^T$: it's not at all clear what this would mean. $\nabla f$ is a single object, the vector dual to $Df$.