How to prove the identity $\nabla f(x) = \nabla F(x)*F(x)$ for the squared 2-norm when the chain rule seems to say otherwise?

I know the chain rule says that if $f(x)=h(g(x))$, then the gradient is $\nabla f(x) = \nabla h(g(x))*g'(x)$.

I am trying to prove the identity:

$\nabla f(x) = \nabla F(x)*F(x)$ when $f(x)=\frac{1}{2}||F(x)||^2$ and $F:\mathbb{E_1}\rightarrow\mathbb{E_2}$, where $\mathbb{E_1}$ and $\mathbb{E_2}$ are Euclidean spaces and $F$ is a $C^1$ smooth mapping.

I am struggling with this because, using the chain rule as stated above, we can write $f(x)=h(g(x))$ with $h(y)=\frac{1}{2}||y||^2$ and $g(x)=F(x)$. Then $\nabla h(y)=y$, so $\nabla h$ evaluated at $g(x)$ is $F(x)$. But then how do I get $\nabla F$ for the next part?

Question: unless we can assume that $\nabla F=g'(x)$, and hence that $g'(x)=F'(x)$, how can I get this identity to work out?

And in general, when can I assume that the derivative equals the gradient? I was told not to assume that in general. Thanks.

There is 1 best solution below

When in doubt, use coordinates.

If $F\colon\mathbb R^n \to \mathbb R^m$, we can write $F(x) = (F_1(x),\ldots, F_m(x))$, where $F_i\colon \mathbb R^n\to\mathbb R$. Also, $\|F(x)\|^2 = \sum_i F_i(x)^2$. Then we have

$$\frac{\partial f}{\partial x_j}(x) = \frac\partial{\partial x_j}(\frac 12 \sum_i F_i(x)^2) = \sum_i F_i(x)\frac{\partial F_i}{\partial x_j}(x) = [\,F_1(x)\,\ F_2(x)\ \ldots \ F_m(x)\,] \begin{bmatrix} \frac{\partial F_1}{\partial x_j}(x)\\ \frac{\partial F_2}{\partial x_j}(x)\\ \vdots\\ \frac{\partial F_m}{\partial x_j}(x) \end{bmatrix} $$

therefore,

$$\nabla f(x) = [\,F_1(x)\,\ F_2(x)\ \ldots \ F_m(x)\,] \begin{bmatrix} \frac{\partial F_1}{\partial x_1}(x) & \frac{\partial F_1}{\partial x_2}(x) &\ldots & \frac{\partial F_1}{\partial x_n}(x)\\ \frac{\partial F_2}{\partial x_1}(x) & \frac{\partial F_2}{\partial x_2}(x) &\ldots & \frac{\partial F_2}{\partial x_n}(x)\\ \vdots & \vdots & & \vdots\\ \frac{\partial F_m}{\partial x_1}(x) & \frac{\partial F_m}{\partial x_2}(x) &\ldots & \frac{\partial F_m}{\partial x_n}(x) \end{bmatrix}.$$

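Here we identify a vector in $\mathbb R^m$ with a $1\times m$ matrix. Read as column vectors, this says $\nabla f(x) = J_F(x)^T F(x)$, where $J_F(x)$ is the $m\times n$ Jacobian matrix above, so the identity in the question holds if $\nabla F(x)$ is understood as the transposed Jacobian.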
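If you want to convince yourself numerically, here is a minimal sketch that compares the row-vector-times-Jacobian formula with a finite-difference gradient. The map `F`, the point `x0`, and the step size `h` are hypothetical choices made only for illustration; they are not from the question.

```python
import math

def F(x):
    """Hypothetical example map F: R^3 -> R^2 (chosen only for illustration)."""
    x1, x2, x3 = x
    return [x1 * x2, math.sin(x3) + x1 ** 2]

def jacobian_F(x):
    """Analytic m x n Jacobian of F: entry [i][j] is dF_i/dx_j."""
    x1, x2, x3 = x
    return [
        [x2, x1, 0.0],
        [2.0 * x1, 0.0, math.cos(x3)],
    ]

def f(x):
    """f(x) = 1/2 * ||F(x)||^2."""
    return 0.5 * sum(Fi ** 2 for Fi in F(x))

def grad_by_formula(x):
    """Row vector F(x) times the Jacobian: component j is sum_i F_i * dF_i/dx_j."""
    Fx, J = F(x), jacobian_F(x)
    return [sum(Fx[i] * J[i][j] for i in range(len(Fx))) for j in range(len(x))]

def grad_by_finite_differences(x, h=1e-6):
    """Central-difference approximation of each partial derivative of f."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

x0 = [0.7, -1.3, 0.4]  # arbitrary test point
g_formula = grad_by_formula(x0)
g_numeric = grad_by_finite_differences(x0)
max_gap = max(abs(a - b) for a, b in zip(g_formula, g_numeric))

print("formula:", g_formula)
print("numeric:", g_numeric)
print("largest componentwise gap:", max_gap)
```

The two gradients should agree up to finite-difference error. The check only confirms the formula at one point for one map, so it is a sanity test, not a proof.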

What might be confusing is that we often identify vectors with matrices (either $n\times 1$ or $1\times n$), so the RHS of the chain rule might be a matrix multiplication, an inner product, or the action of a linear operator on a vector, depending on the particular identifications. I know it confused me a lot. But!

When in doubt, use coordinates.
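
For comparison, here is a coordinate-free sketch of the same computation, under the assumption (not stated in the question) that $\nabla F(x)$ denotes the adjoint $DF(x)^*$ of the derivative $DF(x)\colon \mathbb{E_1}\to\mathbb{E_2}$. For any direction $v\in\mathbb{E_1}$,

$$Df(x)[v] = \langle F(x),\, DF(x)[v]\rangle = \langle DF(x)^*F(x),\, v\rangle.$$

Since the gradient is defined by $Df(x)[v] = \langle \nabla f(x), v\rangle$ for all $v$, this gives $\nabla f(x) = DF(x)^*F(x)$. In coordinates the adjoint is the transposed Jacobian, which matches the matrix product above up to the row/column identification.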