Nonlinear Optimization: Explain how to differentiate a norm of a nonlinear function using matrix algebra


I am learning about nonlinear optimization. I have a data vector $\mathbf{d}$ and an unknown model vector $\mathbf{m}$, and I can compute predicted data from a model using a nonlinear function $\mathbf{F}(\mathbf{m})$.

The standard method for solving this nonlinear problem is to minimize a cost function

$$U = ||\mathbf{d}-\mathbf{F}(\mathbf{m})||_2^2 +\mathrm{Some \;Regularization}$$

where $||\cdot||_2 $ is the L2 norm.
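As a sanity check on my understanding, the gradient of the misfit term should be $-2\mathbf{J}^{\mathsf T}(\mathbf{d}-\mathbf{F}(\mathbf{m}))$. The following sketch (with a made-up exponential forward model and data, purely for illustration) verifies that expression against finite differences:

```python
import numpy as np

# Hypothetical forward model F(m): F_i(m) = m_2 * exp(m_1 * t_i),
# chosen only so there is something concrete and nonlinear to test.
t = np.array([0.0, 0.5, 1.0, 1.5])

def F(m):
    return m[1] * np.exp(m[0] * t)

def jacobian(m):
    # J[i, j] = dF_i / dm_j
    J = np.empty((t.size, 2))
    J[:, 0] = m[1] * t * np.exp(m[0] * t)  # dF_i / dm_1
    J[:, 1] = np.exp(m[0] * t)             # dF_i / dm_2
    return J

def cost(m, d):
    r = d - F(m)
    return r @ r                           # ||d - F(m)||_2^2

def grad(m, d):
    # Claimed analytic gradient: dU/dm = -2 J^T (d - F(m))
    return -2.0 * jacobian(m).T @ (d - F(m))

# Compare against central finite differences at an arbitrary point.
m = np.array([0.3, 1.2])
d = np.array([1.0, 1.1, 1.3, 1.6])
g = grad(m, d)
eps = 1e-6
g_fd = np.array([
    (cost(m + eps * e, d) - cost(m - eps * e, d)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(g, g_fd, atol=1e-5))  # True
```

The finite-difference check is model-agnostic, so the same comparison works for any differentiable $\mathbf{F}$ you substitute in.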

This is done by differentiating $U$ with respect to $\mathbf{m}$ and setting the derivative to zero:

$$\frac{\partial U}{\partial \mathbf{m}} = 0$$

The solution is then often stated as:

$$0 = \mathbf{J^Td}-\mathbf{J^T Jm} + \mathrm{Some \; Regularization\; Stuff}$$

where $\mathbf{J}$ is the Jacobian of $\mathbf{F}$ with respect to $\mathbf{m}$, i.e. $J_{ij} = \partial F_i / \partial m_j$.
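For context, in the purely linear case $\mathbf{F}(\mathbf{m}) = \mathbf{Jm}$ (which I believe is the linearization being assumed) I can recover the stated form by expanding the square, so my question is really about how this extends to a nonlinear $\mathbf{F}$:

```latex
% Expanding the squared norm under the linearization F(m) = Jm:
\begin{align*}
U &= (\mathbf{d}-\mathbf{Jm})^{\mathsf T}(\mathbf{d}-\mathbf{Jm})
   = \mathbf{d}^{\mathsf T}\mathbf{d}
   - 2\,\mathbf{m}^{\mathsf T}\mathbf{J}^{\mathsf T}\mathbf{d}
   + \mathbf{m}^{\mathsf T}\mathbf{J}^{\mathsf T}\mathbf{J}\,\mathbf{m},\\
\frac{\partial U}{\partial \mathbf{m}}
  &= -2\,\mathbf{J}^{\mathsf T}\mathbf{d}
   + 2\,\mathbf{J}^{\mathsf T}\mathbf{J}\,\mathbf{m} = \mathbf{0}
  \quad\Longrightarrow\quad
  \mathbf{J}^{\mathsf T}\mathbf{d} - \mathbf{J}^{\mathsf T}\mathbf{J}\,\mathbf{m} = \mathbf{0}.
\end{align*}
```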

Can someone explain, step by step, where the Jacobian and its transpose come from?