I am estimating a model by minimizing the following objective function:
$ M(\theta) = (Z'G(\theta))'\,W\,(Z'G(\theta)) \equiv G(\theta)'ZWZ'G(\theta) $
Here $Z$ is an $N \times L$ matrix of data and $W$ is an $L \times L$ weight matrix, neither of which depends on $\theta$. $G(\theta)$ is a function that maps the $K \times 1$ vector of parameters I am estimating into an $N \times 1$ vector of residuals.
I am trying to calculate the gradient and the Hessian of $M$ to feed into a MATLAB solver ($G(\theta)$ is highly nonlinear). I believe I have calculated the gradient correctly, as follows. Let $J(\theta) = dG/d\theta$ be the $N \times K$ Jacobian matrix, where $J_{i,k}$ is the derivative of element $i$ of $G$ with respect to parameter $k$. Then the gradient vector of $M$ is
$ \nabla M = 2\,(Z'J(\theta))'\,W\,(Z'G(\theta)) = 2\,J(\theta)'ZWZ'G(\theta), $
a $K \times 1$ vector (the final factor $Z'G(\theta)$ must not be transposed, or the dimensions do not conform; I am assuming $W$ is symmetric).
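(For what it's worth, the formula can be checked against finite differences. The sketch below uses NumPy with a toy $G$ — the particular $G$, dimensions, and parameter values are all illustrative, not from my actual model — but the matrix expressions carry over to MATLAB unchanged.)

```python
import numpy as np

# Check grad M = 2 (Z'J)' W (Z'G) against central finite differences,
# using an illustrative G(theta) = exp(X theta) - 1 (a toy choice).
rng = np.random.default_rng(0)
N, L, K = 50, 4, 3
Z = rng.standard_normal((N, L))
A = rng.standard_normal((L, L))
W = A @ A.T + L * np.eye(L)            # symmetric weight matrix
X = rng.standard_normal((N, K))

def G(theta):                          # N x 1 vector of residuals
    return np.exp(X @ theta) - 1.0

def J(theta):                          # N x K Jacobian, J[i, k] = dG_i / dtheta_k
    return X * np.exp(X @ theta)[:, None]

def M(theta):
    m = Z.T @ G(theta)                 # L x 1 moment vector Z'G
    return m @ W @ m

def grad_M(theta):                     # 2 (Z'J)' W (Z'G), a K x 1 vector
    return 2.0 * (Z.T @ J(theta)).T @ W @ (Z.T @ G(theta))

theta = np.array([0.2, -0.1, 0.05])
eps = 1e-6
fd = np.array([(M(theta + eps * e) - M(theta - eps * e)) / (2 * eps)
               for e in np.eye(K)])
print(np.allclose(fd, grad_M(theta), rtol=1e-5, atol=1e-6))  # True
```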
However, I cannot figure out how to differentiate this gradient to get the Hessian. I can calculate the vectors of derivatives of each element of $J(\theta)$, but I'm not sure what the order of that derivative array should be. I see that I essentially need to split the gradient into two factors and apply the product rule, but I cannot figure out what the derivative of each factor should look like.
Any help would be greatly appreciated. Thank you.
I was having similar issues with an equation whose second derivatives could not be evaluated directly.
Numerical Optimization by J Nocedal & S Wright
And The Levenberg-Marquardt Algorithm by Ananth Ranganathan
suggest that the Hessian and gradient can be approximated rather than evaluated directly. The Hessian can be approximated by $J'J$, the transpose of the Jacobian multiplied by the Jacobian itself, and the gradient by $J'r$, the transpose of the Jacobian multiplied by the vector of residuals $r$.
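For the weighted objective in your question, the same idea gives $\nabla^2 M \approx 2\,(Z'J)'W(Z'J)$: the product rule produces this term plus a term involving the second derivatives of $G$, and the Gauss-Newton-style approximation simply drops the latter. A NumPy sketch with a toy, illustrative $G$ (the expressions translate directly to MATLAB):

```python
import numpy as np

# Gauss-Newton-style approximation for M(theta) = (Z'G)' W (Z'G):
#   gradient  g = 2 (Z'J)' W (Z'G)        (exact)
#   Hessian   H ~= 2 (Z'J)' W (Z'J)       (second derivatives of G dropped)
# G below is an illustrative toy, not from the question.
rng = np.random.default_rng(1)
N, L, K = 60, 5, 3
Z = rng.standard_normal((N, L))
A = rng.standard_normal((L, L))
W = A @ A.T + L * np.eye(L)             # symmetric positive-definite weights
X = rng.standard_normal((N, K))

def G(theta):                           # N x 1 residuals
    return np.exp(X @ theta) - 1.0

def J(theta):                           # N x K Jacobian
    return X * np.exp(X @ theta)[:, None]

def grad_and_gn_hessian(theta):
    ZJ = Z.T @ J(theta)                 # L x K
    g = 2.0 * ZJ.T @ W @ (Z.T @ G(theta))   # exact gradient
    H = 2.0 * ZJ.T @ W @ ZJ             # K x K Gauss-Newton Hessian
    return g, H

g, H = grad_and_gn_hessian(np.array([0.1, 0.2, -0.3]))
# The approximate Hessian is symmetric positive definite by construction,
# which many solvers exploit.
print(np.allclose(H, H.T), np.linalg.eigvalsh(H).min() > 0)  # True True
```

A practical consequence: because this approximation is always positive (semi)definite, a Newton-type step with it never points uphill, which is exactly why Levenberg-Marquardt uses it.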
I hope this is of some use to you or points you in a more helpful direction.