How to calculate the hessian of a matrix function $G(\theta)^T Z W Z^T G(\theta)$


I am estimating a model minimizing the following objective function,

$ M(\theta) = (Z'G(\theta))'W(Z'G(\theta)) \equiv G(\theta)^T Z W Z^T G(\theta)$

$Z$ is an $N \times L$ matrix of data, and $W$ is an $L\times L$ weight matrix, neither of which depends on $\theta$. $G(\theta)$ is a function which takes the $K \times 1$ vector of parameters I am estimating into an $N \times 1$ vector of residuals.

I am trying to calculate the gradient and the Hessian of $M$ to feed into a Matlab solver ($G(\theta)$ is highly nonlinear). I believe I have the gradient correctly calculated as follows. Let $J = dG/d\theta$ be the $N \times K$ Jacobian matrix, where $J_{i,k}$ is the derivative of element $i$ of $G$ with respect to parameter $k$. Then the gradient vector of $M$ is,

$ \nabla M = 2\, (Z'J(\theta))'\,W\, (Z'G(\theta)) $
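If helpful, the gradient formula $\nabla M = 2(Z'J)'W(Z'G)$ can be sanity-checked against finite differences. The sketch below is a minimal NumPy example; the residual function `G`, its Jacobian, and all dimensions are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, K = 6, 3, 2
Z = rng.standard_normal((N, L))   # N x L data matrix
W = np.eye(L)                     # L x L weight matrix

def G(theta):
    # toy nonlinear residual function (an assumption, for illustration)
    t0, t1 = theta
    i = np.arange(N)
    return np.exp(t0 * i / N) + t1 * i**2 / N - 1.0

def J(theta):
    # analytic N x K Jacobian of G
    t0, t1 = theta
    i = np.arange(N)
    return np.column_stack([(i / N) * np.exp(t0 * i / N), i**2 / N])

def M(theta):
    v = Z.T @ G(theta)
    return v @ W @ v

def grad_M(theta):
    # the candidate formula: 2 (Z'J)' W (Z'G)
    return 2 * (Z.T @ J(theta)).T @ W @ (Z.T @ G(theta))

theta = np.array([0.3, -0.7])
eps = 1e-6
fd = np.array([(M(theta + eps * e) - M(theta - eps * e)) / (2 * eps)
               for e in np.eye(K)])
print(np.allclose(grad_M(theta), fd, atol=1e-5))  # True
```

Agreement with the central finite difference confirms both the formula and its dimensions: $(Z'J)'$ is $K \times L$, so the product is the expected $K \times 1$ vector.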

However, I cannot figure out how to differentiate this gradient to get the Hessian. I can calculate the vectors of derivatives of each element of $J(\theta)$, but I'm not sure what the order of that derivative array should be. I see that I basically need to split the function into two parts and use the product rule, but I cannot figure out what the derivative of each part should look like.

Any help would be greatly appreciated. Thank you.


There are 2 answers below.


I was having similar issues with an objective whose second derivatives could not be evaluated directly.

Numerical Optimization by J. Nocedal & S. Wright, and The Levenberg-Marquardt Algorithm by Ananth Ranganathan, suggest that for least-squares-type problems the Hessian and gradient can be approximated rather than evaluated exactly: the Hessian by the transpose of the Jacobian multiplied by the Jacobian itself (the Gauss-Newton approximation), and the gradient by the transpose of the Jacobian multiplied by the residuals.
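As a sketch of this idea adapted to the objective in the question (an adaptation, since the references treat plain least squares), the approximation keeps $H \approx 2\,J'ZWZ'J$ and drops the term involving second derivatives of $G$; it is exact when $G$ is linear, which the toy example below uses so the claim can be verified against a finite-difference Hessian:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, K = 5, 3, 2
Z = rng.standard_normal((N, L))
W = np.eye(L)
A = Z @ W @ Z.T                   # the N x N matrix ZWZ'

B = rng.standard_normal((N, K))   # linear residual: G(theta) = B @ theta
Jac = B                           # Jacobian is constant in the linear case

def M(theta):
    g = B @ theta
    return g @ A @ g

H_gn = 2 * Jac.T @ A @ Jac        # Gauss-Newton-style approximation

# finite-difference Hessian for comparison
theta, eps = np.array([0.4, 1.2]), 1e-4
H_fd = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        ei, ej = np.eye(K)[i] * eps, np.eye(K)[j] * eps
        H_fd[i, j] = (M(theta + ei + ej) - M(theta + ei - ej)
                      - M(theta - ei + ej) + M(theta - ei - ej)) / (4 * eps**2)
print(np.allclose(H_gn, H_fd, atol=1e-3))  # True for linear G
```

For nonlinear $G$ the approximation remains useful near a solution where the residuals are small, which is the regime the cited references target.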

I hope this is some use to you or points you in a more helpful direction.


$\def\p#1#2{\frac{\partial #1}{\partial #2}}$For ease of typing, use $x$ instead of $\theta$ for the independent variable. Let's also name variables using calligraphic letters for tensors, uppercase letters for matrices, lowercase letters for vectors, and Greek letters for scalars. To that end define
$$\eqalign{ g &= g(x) \\ J &= \p{g}{x} \\ A &= ZWZ^T = A^T \\ \mu &= g^TZWZ^Tg \qquad&\big({\rm objective\,function}\big) \\ &= A:gg^T \qquad&\big({\rm reformulated}\big) \\ }$$
Calculate the differential and the gradient of the objective function.
$$\eqalign{ d\mu &= A:\big(dg\,g^T + g\,dg^T\big) \\ &= \big(A+A^T\big):dg\,g^T \\ &= 2Ag:J\,dx \\ &= 2J^TAg:dx \\ \p{\mu}{x} &= 2J^TAg \;\doteq\; p \qquad\big({\rm gradient}\big) \\ }$$
The Hessian is just the gradient of the gradient, so
$$\eqalign{ dp &= 2J^TA\,dg + 2\,dJ^TAg \\ &= 2J^TAJ\,dx + 2\big(g^TA^T{\cal K}\,dx\big) \\ &= 2\left(J^TAJ + g^TA{\cal K}\right)dx \\ \p{p}{x} &= 2\left(J^TAJ + g^TA{\cal K}\right) \;\doteq\; H \qquad\big({\rm Hessian}\big) \\ }$$
where ${\cal K}$ is the gradient of $J$, a third-order tensor with components
$$\eqalign{ {\cal K}_{ijk} &= \frac{\partial J_{ij}}{\partial x_k} \\ }$$
I can't say anything more about ${\cal K}$, since you didn't tell us anything about the function $g(x)$.
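Putting the pieces together: the contraction $g^TA{\cal K}$ is $\sum_i (Ag)_i$ times the $K \times K$ Hessian of $g_i$, so $H = 2\left(J^TAJ + g^TA{\cal K}\right)$ can be assembled and verified against a finite-difference Hessian. The sketch below uses NumPy with a made-up residual function (an assumption, since the question does not specify $g$):

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, K = 6, 3, 2
Z = rng.standard_normal((N, L))
W = np.eye(L)
A = Z @ W @ Z.T
idx = np.arange(N)

def g(x):
    # toy residual function (an assumption, for illustration)
    return np.exp(x[0] * idx / N) + x[1] * idx**2 / N - 1.0

def Jac(x):
    return np.column_stack([(idx / N) * np.exp(x[0] * idx / N),
                            idx**2 / N])

def Ktensor(x):
    # K[i, j, k] = d^2 g_i / dx_j dx_k; only the (0, 0) entry is nonzero here
    T = np.zeros((N, K, K))
    T[:, 0, 0] = (idx / N)**2 * np.exp(x[0] * idx / N)
    return T

def hessian(x):
    # H = 2 (J'AJ + sum_i (Ag)_i * Hess(g_i))
    Jx, Ag = Jac(x), A @ g(x)
    return 2 * (Jx.T @ A @ Jx + np.einsum('i,ijk->jk', Ag, Ktensor(x)))

def M(x):
    v = g(x)
    return v @ A @ v

# finite-difference check of the analytic Hessian
x0, eps = np.array([0.3, -0.7]), 1e-4
H_fd = np.zeros((K, K))
for a in range(K):
    for b in range(K):
        ea, eb = np.eye(K)[a] * eps, np.eye(K)[b] * eps
        H_fd[a, b] = (M(x0 + ea + eb) - M(x0 + ea - eb)
                      - M(x0 - ea + eb) + M(x0 - ea - eb)) / (4 * eps**2)
print(np.allclose(hessian(x0), H_fd, atol=1e-4))  # True
```

The `einsum` call performs the contraction over the first index of ${\cal K}$, matching the component formula ${\cal K}_{ijk} = \partial J_{ij}/\partial x_k$.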


In some of the steps above, a colon is used as a convenient product notation for the trace, i.e. $$\eqalign{ A:B &= {\rm Tr}(A^TB) = B:A \\ }$$
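A quick numerical check of this identity, using random matrices in NumPy; the colon product equals both the trace expression and the sum of elementwise products:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

lhs = np.trace(A.T @ B)                       # A : B
print(np.allclose(lhs, np.sum(A * B)),        # Frobenius inner product
      np.allclose(lhs, np.trace(B.T @ A)))    # symmetry: A:B = B:A
```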