Hessian of the norm of a non-linear map

Suppose $F: \mathbb{R}^n \rightarrow \mathbb{R}^m$ and define the scalar-valued map $\Phi(x;y) = \frac{1}{2}\|y - F(x)\|_2^2$. I am interested in the Hessian of this map written in terms of the first and second derivatives of $F$. Namely, write $DF: \mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ for the Jacobian of $F$ and $D^2F: \mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m))$ for the second derivative. The gradient of $\Phi$, which I will denote $D \Phi: \mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathbb{R}) \simeq \mathbb{R}^n$, I can compute as $$D \Phi(x) = DF^T (x)[F(x) - y],$$ but how do I compute the Hessian $D^2 \Phi: \mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathcal{L}(\mathbb{R}^n, \mathbb{R})) \simeq \mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$ in terms of $DF$ and $D^2F$? I'm assuming it should be something like $$D^2\Phi(x) = DF^T (x) DF(x) + D^2F(x)[\cdot,\cdot],$$ but I'm not sure what goes into $[\cdot,\cdot]$. Thanks.

There is 1 best solution below

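It's much easier to do this sort of thing using the Taylor expansion idea, i.e. that $$ \Phi(a+h) = \Phi(a) + D\Phi(a)(h) + \tfrac{1}{2} D^2\Phi(a)(h,h) + o(\lVert h \rVert^2): $$ if you can create an expansion of this form, you can read off $D^2\Phi$. In this case, assume $F$ is twice differentiable, write $r = y - F(x)$, and use $F(x+h) = F(x) + DF(x)(h) + \tfrac{1}{2} D^2F(x)(h,h) + o(\lVert h \rVert^2)$. Then $$ \Phi(x+h;y+k) = \tfrac{1}{2} \lVert y+k-F(x+h) \rVert_2^2 = \tfrac{1}{2} \lVert r + k - DF(x)(h) - \tfrac{1}{2} D^2F(x)(h,h) + o(\lVert h \rVert^2) \rVert_2^2 \\ = \tfrac{1}{2} \lVert r \rVert_2^2 + r^T \left( k - DF(x)(h) - \tfrac{1}{2} D^2F(x)(h,h) \right) + \tfrac{1}{2} \lVert k - DF(x)(h) \rVert_2^2 + o( \lVert (h;k) \rVert^2 ) \\ = \Phi(x;y) + r^T (k - DF(x)(h)) + \tfrac{1}{2} \left[ \lVert k - DF(x)(h) \rVert_2^2 - r^T D^2F(x)(h,h) \right] + o( \lVert (h;k) \rVert^2 ), $$ where the cross terms between $k - DF(x)(h)$ and $D^2F(x)(h,h)$ are of third order and have been absorbed into the remainder. From this we read off $$ D\Phi(x;y)(h;k) = (y-F(x))^T (k - DF(x)(h)), \\ D^2\Phi(x;y)((h;k),(h;k)) = \lVert k - DF(x)(h) \rVert_2^2 - (y-F(x))^T D^2F(x)(h,h). $$ Holding $y$ fixed (set $k = 0$) recovers your gradient, $D\Phi(x)(h) = (F(x)-y)^T DF(x)(h)$, and gives $$ D^2\Phi(x)(h,h) = \lVert DF(x)(h) \rVert_2^2 + (F(x)-y)^T D^2F(x)(h,h), $$ so the $[\cdot,\cdot]$ slot in the question is filled by contracting the output of $D^2F(x)$ against the residual $F(x)-y$: in matrix form, $D^2\Phi(x) = DF^T(x) DF(x) + \sum_{i=1}^m (F_i(x) - y_i) D^2F_i(x)$. While it is possible to simplify this further, there are limited benefits to doing so, since these expressions retain the linear maps in forms that keep their arguments in the right places, rather than trying to twist everything to behave like a matrix of some kind.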
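As a sanity check on the formula $D^2\Phi(x) = DF^T(x) DF(x) + \sum_i (F_i(x) - y_i) D^2F_i(x)$ with $y$ held fixed, here is a minimal pure-Python sketch that compares it against a central finite-difference Hessian of $\Phi$. The test map `F`, the evaluation point, and all function names are my own illustrative choices, not anything from the question.

```python
import math

def F(x):
    # Hypothetical test map F: R^2 -> R^2, chosen only for illustration.
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2 ** 3]

def jacobian(x):
    # Hand-computed Jacobian DF(x); row i holds the partials of F_i.
    x1, x2 = x
    return [[2 * x1 * x2, x1 * x1],
            [math.cos(x1), 3 * x2 * x2]]

def second_derivatives(x):
    # Hand-computed Hessian matrix of each component F_i.
    x1, x2 = x
    return [[[2 * x2, 2 * x1], [2 * x1, 0.0]],
            [[-math.sin(x1), 0.0], [0.0, 6 * x2]]]

def phi(x, y):
    # Phi(x; y) = 1/2 * ||y - F(x)||^2
    return 0.5 * sum((yi - fi) ** 2 for yi, fi in zip(y, F(x)))

def hessian_formula(x, y):
    # DF^T DF + sum_i (F_i - y_i) * D^2 F_i, with y held fixed.
    J = jacobian(x)
    H = second_derivatives(x)
    r = [fi - yi for fi, yi in zip(F(x), y)]  # residual F(x) - y
    n, m = len(x), len(r)
    return [[sum(J[i][a] * J[i][b] for i in range(m))
             + sum(r[i] * H[i][a][b] for i in range(m))
             for b in range(n)] for a in range(n)]

def hessian_fd(x, y, eps=1e-4):
    # Central finite-difference Hessian of x -> Phi(x; y).
    n = len(x)

    def shifted(da, a, db, b):
        z = list(x)
        z[a] += da
        z[b] += db
        return phi(z, y)

    return [[(shifted(eps, a, eps, b) - shifted(eps, a, -eps, b)
              - shifted(-eps, a, eps, b) + shifted(-eps, a, -eps, b))
             / (4 * eps * eps)
             for b in range(n)] for a in range(n)]

x, y = [0.7, -0.3], [1.0, 2.0]
A = hessian_formula(x, y)
B = hessian_fd(x, y)
# Largest entrywise gap between the formula and the finite-difference estimate.
print(max(abs(A[a][b] - B[a][b]) for a in range(2) for b in range(2)))
```

The printed gap should be small relative to the entries of the Hessian if the formula is right. Choosing `y = F(x)` makes the residual vanish, in which case only the $DF^T DF$ term survives.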