Why can we approximate $H(l)$ by $2J(f)^TJ(f)$ for $l=\lVert f(w)\rVert^2=f(w)^Tf(w)$ in the context of finding ${\rm argmin}_w\,\lVert f(w)\rVert^2$?

It's easy to show that $\frac{dl}{d\textbf{w}} = 2J(f)^{T}f$. However, I can't follow the model answer for showing that $H(l) = 2J(f)^{T}J(f)$.

It states: "some routine calculations give $H(l) = 2J(f)^{T}J(f)+2\sum_{i}f_{i}H(f_{i})$, where $f_i$ are the $m$ components of $f$", and then uses the fact that the $f_i$ should all be close to zero near the minimum.

I don't see how the rules of vector/matrix calculus lead to this equation, nor where and how the $H(f_{i})$ terms come into play.
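For reference, one way the "routine calculations" can go is a componentwise expansion of $l=\sum_{i}f_{i}^{2}$ (a sketch, writing $\nabla$ for the gradient with respect to $w$, so that $\nabla f_i^{T}$ is row $i$ of $J(f)$):

$$\begin{aligned} \nabla l &= 2\sum_{i} f_{i}\,\nabla f_{i} \;=\; 2J(f)^{T}f \\ H(l) &= 2\sum_{i}\left(\nabla f_{i}\,\nabla f_{i}^{T} + f_{i}\,H(f_{i})\right) \;=\; 2J(f)^{T}J(f) + 2\sum_{i} f_{i}\,H(f_{i}) \end{aligned}$$

Each application of the product rule to $f_{i}\,\nabla f_{i}$ yields an outer product of gradients (which sum to $J(f)^{T}J(f)$) plus $f_{i}\,H(f_{i})$; near the minimum $f_{i}\approx 0$, so the second sum is negligible.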


$ \def\E{{\cal E}}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3}} $Assume that the Jacobian is known and use this to expand the differential of $f$ $$\eqalign{ \grad{f}{w} &= J \quad&\implies\quad df = J\cdot dw \\ }$$ Now calculate the differential and gradient of $\ell$
(using explicit dot products for clarity) $$\eqalign{ \ell &= f\cdot f \\ d\ell &= 2f\cdot df = 2f\cdot\LR{J\cdot dw} = \LR{2J^T\cdot f}\cdot dw \\ \grad{\ell}{w} &= 2J^T\cdot f \quad\doteq\quad g \\ }$$ and thence the approximate differential and gradient of $g,\,$ i.e. the Hessian
(under the assumption that $f\approx 0$) $$\eqalign{ dg &= \LR{2J^T\cdot df + 2\,dJ^T\cdot f} \;\approx\; \LR{2J^T\cdot J\cdot dw} + 0 \\ \grad{g}{w} &\approx 2J^T\cdot J \\ }$$ Note that row $i$ of $J$ is the gradient $\grad{f_i}{w}$, so row $i$ of $dJ$ is $dw^T\, H(f_i)$, and the dropped term equals $2\sum_i f_i\,H(f_i)\cdot dw$. This is exactly where the $H(f_i)$ in the quoted formula come from, and it vanishes near the minimum because each $f_i\approx 0$ there.
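To see the approximation numerically, here is a small sketch (the residual function `f`, its Jacobian, and the evaluation point are all made-up examples chosen so that $f(w)\approx 0$): it compares a finite-difference Hessian of $\ell=\lVert f(w)\rVert^2$ with the Gauss-Newton surrogate $2J^TJ$.

```python
import numpy as np

# Hypothetical residual f: R^2 -> R^2 with a root at w = (0, 0),
# so near that point the Gauss-Newton approximation should be accurate.
def f(w):
    return np.array([np.sin(w[0]), w[0] * w[1]])

def jac(w):
    # Analytic Jacobian of f (row i = gradient of component f_i).
    return np.array([[np.cos(w[0]), 0.0],
                     [w[1],         w[0]]])

def l(w):
    fw = f(w)
    return fw @ fw  # l = ||f(w)||^2

def hess_fd(func, w, h=1e-5):
    # Central finite-difference Hessian of a scalar function.
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (func(w + ei + ej) - func(w + ei - ej)
                       - func(w - ei + ej) + func(w - ei - ej)) / (4 * h * h)
    return H

w = np.array([1e-3, 2e-3])       # close to the minimiser, so f(w) is tiny
H_exact = hess_fd(l, w)          # full Hessian of l (includes the f_i H(f_i) terms)
H_gn = 2 * jac(w).T @ jac(w)     # Gauss-Newton approximation 2 J^T J
print(np.max(np.abs(H_exact - H_gn)))  # small: the dropped term is O(|f|)
```

The discrepancy is on the order of $|f|$, matching the fact that the neglected term is $2\sum_i f_i H(f_i)$.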