Hessian matrix computation for multi-layer neural networks (from Duda's book)


I am reading Duda's book, Section 6.9.1, about second-order methods for multi-layer neural networks.

Considering the error criterion $J(\omega) = \frac{1}{2}\sum_{m=1}^{n}(t_{m}-z_{m})^{2}$, the elements of the Hessian matrix are written: $\frac{\partial^{2}J(\omega)}{\partial\omega_{ji}\partial\omega_{lk}}= \frac{1}{n}(\sum_{m=1}^{n}\frac{\partial J}{\partial\omega_{ji}}\frac{\partial J}{\partial\omega_{lk}}+\sum_{m=1}^{n}(z_{m}-t_{m})\frac{\partial^{2}J}{\partial\omega_{ji}\partial\omega_{lk}})$

However, I fail to understand where the $\frac{1}{n}$ factor and the $\frac{\partial J}{\partial\omega}$ terms come from: when I calculate the Hessian myself I instead obtain $\frac{\partial z_{m}}{\partial\omega}$ terms, $\sum_{m=1}^{n}\frac{\partial z_{m}}{\partial\omega_{ji}}\frac{\partial z_{m}}{\partial\omega_{lk}}+\sum_{m=1}^{n}(z_{m}-t_{m})\frac{\partial^{2}z_{m}}{\partial\omega_{ji}\partial\omega_{lk}}$, which agrees with Bishop's derivation.
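One way to check which expression is the true Hessian is to compare it against finite differences on a toy model. The sketch below uses $z_m(\omega)=\tanh(a_m\cdot\omega)$ with arbitrary fixed vectors $a_m$ (my own choice for illustration, not a network from Duda or Bishop), for which $\frac{\partial z_m}{\partial\omega_i}$ and $\frac{\partial^2 z_m}{\partial\omega_i\partial\omega_j}$ are known in closed form:

```python
import numpy as np

# Toy model (an assumption for this check, not from the book):
#   z_m(w) = tanh(a_m . w),  J(w) = 0.5 * sum_m (t_m - z_m)^2
rng = np.random.default_rng(0)
n, d = 4, 3                      # n outputs, d weights
A = rng.normal(size=(n, d))      # fixed vectors a_m
t = rng.normal(size=n)           # targets, independent of w
w = rng.normal(size=d)

def z(w):
    return np.tanh(A @ w)

def J(w):
    return 0.5 * np.sum((t - z(w)) ** 2)

# Closed-form derivatives of z_m = tanh(u_m), u_m = a_m . w:
#   dz_m/dw_i      = (1 - z_m^2) a_{mi}
#   d2z_m/dw_i dw_j = -2 z_m (1 - z_m^2) a_{mi} a_{mj}
zv = z(w)
dz = (1 - zv**2)[:, None] * A
d2z = (-2 * zv * (1 - zv**2))[:, None, None] * A[:, :, None] * A[:, None, :]

# The expression from the question:
#   H_ij = sum_m dz_m/dw_i dz_m/dw_j + sum_m (z_m - t_m) d2z_m/dw_i dw_j
H_formula = dz.T @ dz + np.einsum('m,mij->ij', zv - t, d2z)

# Finite-difference Hessian of J for comparison
eps = 1e-5
H_fd = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        ei = np.eye(d)[i] * eps
        ej = np.eye(d)[j] * eps
        H_fd[i, j] = (J(w + ei + ej) - J(w + ei - ej)
                      - J(w - ei + ej) + J(w - ei - ej)) / (4 * eps**2)

print(np.max(np.abs(H_formula - H_fd)))  # near zero (finite-difference accuracy)
```

The two matrices agree to finite-difference accuracy, which supports the $\partial z_m$ form of the Hessian (with both terms positive).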


1 Answer

I have to make a few assumptions, since in my opinion some information is missing. As observed by Julien, $t_m$ is a target, so it is independent of $\omega$. I assume that $z_m=z_m(\omega)$. Differentiating the error function $J(\omega)$ twice, I get
\begin{equation}
\sum_{m=1}^{n}\frac{\partial z_{m}}{\partial\omega_{ji}}\frac{\partial z_{m}}{\partial\omega_{lk}}+\sum_{m=1}^{n}(z_{m}-t_{m})\frac{\partial^{2}z_{m}}{\partial\omega_{ji}\partial\omega_{lk}},
\end{equation}
which matches your expression. Now, treating $J$ as a function of the outputs,
\begin{equation}
\frac{\partial J}{\partial z_{k}}=-\sum_{m=1}^{n}(t_{m}-z_{m})\frac{\partial z_m}{\partial z_k}=-\sum_{m=1}^{n}(t_{m}-z_{m})\delta_{km}=z_{k}-t_{k}.
\end{equation}
The second derivative with respect to the outputs is then
\begin{equation}
\frac{\partial^2 J}{\partial z_{k}\partial z_{m}}=\frac{\partial}{\partial z_{k}}(z_{m}-t_{m})=\delta_{km},
\end{equation}
so
\begin{equation}
\frac{\partial^2 J}{\partial z_{m}^2}=1.
\end{equation}
By multiplying the second sum by $\frac{\partial^2 J}{\partial z_{m}^2}=1$, I can write
\begin{equation}
\sum_{m=1}^{n}\frac{\partial z_{m}}{\partial\omega_{ji}}\frac{\partial z_{m}}{\partial\omega_{lk}}+\sum_{m=1}^{n}(z_{m}-t_{m})\frac{\partial^{2}z_{m}}{\partial\omega_{ji}\partial\omega_{lk}}\frac{\partial^2 J}{\partial z_{m}^2}=\sum_{m=1}^{n}\frac{\partial z_{m}}{\partial\omega_{ji}}\frac{\partial z_{m}}{\partial\omega_{lk}}+\sum_{m=1}^{n}(z_{m}-t_{m})\frac{\partial^{2}J}{\partial\omega_{ji}\partial\omega_{lk}},
\end{equation}
where I used
\begin{equation}
\frac{\partial^{2}z_{m}}{\partial\omega_{ji}\partial\omega_{lk}}\frac{\partial^2 J}{\partial z_{m}^2}=\frac{\partial^{2}z_{m}}{\partial\omega_{ji}\partial\omega_{lk}}\cdot 1=\frac{\partial^{2}J}{\partial\omega_{ji}\partial\omega_{lk}}.
\end{equation}
Integrating with respect to $\omega$, if the integration constant is zero, we can assume that the first derivatives of $J$ and of $z_m$ with respect to $\omega$ are proportional, with proportionality factor $1$. So
\begin{equation}
\sum_{m=1}^{n}\frac{\partial z_{m}}{\partial\omega_{ji}}\frac{\partial z_{m}}{\partial\omega_{lk}}+\sum_{m=1}^{n}(z_{m}-t_{m})\frac{\partial^{2}J}{\partial\omega_{ji}\partial\omega_{lk}}=\sum_{m=1}^{n}\left(\frac{\partial J}{\partial\omega_{ji}}\frac{\partial J}{\partial\omega_{lk}}+(z_{m}-t_{m})\frac{\partial^{2}J}{\partial\omega_{ji}\partial\omega_{lk}}\right).
\end{equation}
I suspect the second term can be neglected.
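The claim that the second term can be neglected is easy to probe numerically: it is weighted by the residuals $z_m-t_m$, so it shrinks as the fit improves while the outer-product term does not. The sketch below reuses the toy model $z_m(\omega)=\tanh(a_m\cdot\omega)$ (my own illustrative choice, not from the book) and scales the residuals toward zero:

```python
import numpy as np

# Toy model z_m = tanh(a_m . w); an assumption for this check, not from Duda.
rng = np.random.default_rng(1)
n, d = 4, 3
A = rng.normal(size=(n, d))
w = rng.normal(size=d)
zv = np.tanh(A @ w)
dz = (1 - zv**2)[:, None] * A
d2z = (-2 * zv * (1 - zv**2))[:, None, None] * A[:, :, None] * A[:, None, :]

outer = dz.T @ dz  # first Hessian term: residual-independent, always PSD

ratios = []
for scale in [1.0, 1e-2, 1e-4]:            # residuals z - t shrinking to 0
    r = scale * rng.normal(size=n)
    curvature = np.einsum('m,mij->ij', r, d2z)  # second Hessian term
    ratios.append(np.linalg.norm(curvature) / np.linalg.norm(outer))
    print(scale, ratios[-1])
```

The ratio of the two terms falls with the residual scale, which is the usual justification for dropping the second term near a good fit.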