How are both of these true: $J = \nabla f ^T $, and also $\nabla f = J^T f$?

From questions such as this one: Gradient and Jacobian row and column conventions, I understand that when $f$ maps from $\mathbb{R}^n$ into $\mathbb{R}$, i.e. $f: \mathbb{R}^n \rightarrow \mathbb{R}$, the transpose of the gradient is equal to the Jacobian: $J = \nabla f^T$.

However, I am still occasionally confused by this, because when looking for an expression for the gradient in the case $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, I see expressions such as $\nabla f = J^T f$. An example of this is in Nocedal and Wright, first edition, page 260:

[Image: Nocedal and Wright, 1st edition, page 260]

The question is: how can both of these be true, $J = \nabla f^T$ and also $\nabla f = J^T f$?

There are 3 best solutions below

2
On

If $A=B^T$, then $B=A^T$. It is simply a consequence of the fact that ${\left(A^T\right)}^T=A$.

2
On

They can both be true because the $f$'s and the corresponding Jacobians are different. Nonlinear least squares has its own convention for what the Jacobian is applied to: namely, the residual functions, which are squared, summed, and multiplied by 1/2.

I am looking at the 2nd edition of Nocedal and Wright, whereas you are apparently looking at the 1st edition. Perhaps there is a typo in that edition, using $f$ where there should have been an $r$ (see next paragraph).

In the Nocedal and Wright extract, which pertains to a nonlinear least squares problem, $f$ is half the sum of squared residuals, $f = \frac{1}{2}\sum_{i=1}^m r_i^2$, where the $r_i$ are the individual residual functions. The Jacobian $J$, in this nonlinear least squares context, is the matrix of partial derivatives of $r_i$ with respect to the variables $x_j$, i.e., the $i$th row of $J$ is the transpose of the gradient of $r_i$. By the chain rule, $\partial f / \partial x_j = \sum_{i=1}^m r_i \, \partial r_i / \partial x_j$, so $\nabla f = J^T r$, where $r$ is the column vector of the $r_i$'s. So this is true under these definitions and conventions, which differ from those used outside nonlinear least squares.
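
The identity gradient of f = $J^Tr$ can be checked numerically. The following is only a minimal sketch: the residual function, its hand-written Jacobian, and the test point are hypothetical choices made for illustration and are not taken from Nocedal and Wright.

```python
import numpy as np

def residuals(x):
    # Hypothetical residual vector r: R^2 -> R^3, chosen only for illustration.
    return np.array([x[0] ** 2 - x[1], np.sin(x[0]) + x[1], x[0] * x[1] - 1.0])

def jacobian(x):
    # Row i holds the transposed gradient of r_i, so J has shape (3, 2).
    return np.array([
        [2.0 * x[0], -1.0],
        [np.cos(x[0]), 1.0],
        [x[1], x[0]],
    ])

def objective(x):
    # f(x) = 1/2 * sum_i r_i(x)^2, a scalar function of x.
    r = residuals(x)
    return 0.5 * r @ r

def numerical_gradient(func, x, h=1e-6):
    # Central differences, one coordinate at a time.
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (func(x + step) - func(x - step)) / (2.0 * h)
    return grad

x0 = np.array([0.7, -1.3])                        # arbitrary test point
grad_formula = jacobian(x0).T @ residuals(x0)     # J^T r
grad_numeric = numerical_gradient(objective, x0)  # finite-difference gradient of f
print(np.allclose(grad_formula, grad_numeric, atol=1e-6))
```

Note that `jacobian` here is the Jacobian of the residual vector $r$, not of the scalar objective $f$; the Jacobian of $f$ itself would be the $1 \times 2$ row vector $(J^Tr)^T$.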

0
On

First, let's clarify the notation. To be precise, the Jacobian of $f$ is written $J_f$, and the gradient should be written $\nabla f^T$, that is, $\nabla$ applied to the row vector $f^T$.

Then, regarding $f=(f_1,\dots,f_m)=f(x)=(f_1(x), \dots, f_m(x))$ as an $m$-dimensional column vector, $$ J_f = \begin{bmatrix} \nabla f_1^T \\ \vdots \\ \nabla f_m^T \end{bmatrix} $$ where each $\nabla f_i$ is a column vector that consists of the partial derivatives of $f_i$ in the conventional way.

On the other hand, to consistently interpret $\nabla f^T$, first regard $f^T$ as a row vector $f^T = [f_1, \dots, f_m]$. Then, $$ \nabla f^T = [\nabla f_1, \dots, \nabla f_m]. $$ What we really have is $J_f^T = \nabla f^T$ and $J_f = (\nabla f^T)^T$.
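
As a small worked example (the function is chosen only for illustration and does not come from the question), take $n = m = 2$ and $f(x) = (x_1 x_2,\; x_1 + x_2^2)$. Then $\nabla f_1 = [x_2, x_1]^T$ and $\nabla f_2 = [1, 2x_2]^T$, so $$ J_f = \begin{bmatrix} x_2 & x_1 \\ 1 & 2x_2 \end{bmatrix}, \qquad \nabla f^T = [\nabla f_1, \nabla f_2] = \begin{bmatrix} x_2 & 1 \\ x_1 & 2x_2 \end{bmatrix} = J_f^T. $$ The rows of $J_f$ are the transposed gradients of the components, while the columns of $\nabla f^T$ are the gradients themselves.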

$\nabla f$ cannot be defined in a consistent way once we regard $f$ as a column vector and $\nabla$ as an operation applied to a scalar function to produce a column vector.