Deriving diagonal approximation of Hessian in a neural network


Consider the equations for the diagonal approximation of the Hessian matrix of a neural network in "Pattern Recognition and Machine Learning" by Christopher Bishop (pg. 250, eq. 5.80):

  1. $\frac{\partial^2 E_{n}}{\partial w_{ji}^2} = \frac{\partial^2 E_{n}}{\partial a_{j}^2} z_{i}^2$

  2. $\frac{\partial^2 E_{n}}{\partial a_{j}^2} = ({h}'(a_{j}))^2 \sum_{k} \sum_{k'} w_{k'j}w_{kj}\frac{\partial^2 E_{n}}{\partial a_{k} \partial a_{k'}} + h''(a_{j})\sum_{k}w_{kj}\frac{\partial E_{n}}{\partial a_{k}}$

$a_{j}$ - the input (pre-activation) to unit j

$h$ - the activation function

$z_i = h(a_i)$, i.e. the output of unit i

$w_{kj}$ - the weight on the edge from unit j to unit k

I am interested in how eq. 2 follows from:

$\frac{\partial E_{n}}{\partial a_{j}} = h'(a_{j})\sum_{k} w_{kj}\frac{\partial E_{n}}{\partial a_{k}}$

where k runs over all the units to which unit j directly sends a connection (eq. 5.56, pg. 244).
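As a sanity check on the two relations above, here is a minimal Python sketch that compares eq. 5.56 and eq. 2 against finite differences on a toy network. Everything specific in it is an assumption made for illustration and is not taken from Bishop: a single hidden unit j with a tanh activation, three output units whose pre-activations are $a_k$, a softmax cross-entropy error, and the names `w`, `b`, `target`, `outputs` and `error`.

```python
import math

# Hypothetical toy setup (illustrative values only): one hidden unit j with a
# tanh activation feeds three output units k through the weights w_kj.
w = [0.7, -1.2, 0.4]    # w_kj
b = [0.1, 0.3, -0.2]    # contribution of all other units to a_k, held fixed
target = 1              # index of the true class
a_j = 0.35              # pre-activation of unit j

def h(a):
    return math.tanh(a)

def dh(a):              # h'(a) for tanh
    return 1.0 - math.tanh(a) ** 2

def d2h(a):             # h''(a) for tanh
    return -2.0 * math.tanh(a) * (1.0 - math.tanh(a) ** 2)

def outputs(aj):
    # a_k = w_kj * h(a_j) + (terms that do not depend on a_j)
    return [wk * h(aj) + bk for wk, bk in zip(w, b)]

def softmax(a):
    m = max(a)
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]

def error(aj):
    # E_n = log-sum-exp(a) - a_target (softmax cross-entropy), as a function of a_j
    a = outputs(aj)
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a)) - a[target]

K = len(w)
p = softmax(outputs(a_j))
# dE/da_k and d^2E/(da_k da_k') for softmax cross-entropy
grad = [p[k] - (1.0 if k == target else 0.0) for k in range(K)]
hess = [[(p[k] if k == kp else 0.0) - p[k] * p[kp] for kp in range(K)]
        for k in range(K)]

# Eq. 5.56: dE/da_j = h'(a_j) * sum_k w_kj dE/da_k
first_analytic = dh(a_j) * sum(w[k] * grad[k] for k in range(K))

# Eq. 2: full double sum over k and k', plus the h'' term
double_sum = sum(w[k] * w[kp] * hess[k][kp] for k in range(K) for kp in range(K))
second_analytic = dh(a_j) ** 2 * double_sum + d2h(a_j) * sum(
    w[k] * grad[k] for k in range(K))

# Central finite differences in a_j
eps = 1e-4
first_numeric = (error(a_j + eps) - error(a_j - eps)) / (2 * eps)
second_numeric = (error(a_j + eps) - 2 * error(a_j) + error(a_j - eps)) / eps ** 2

print(first_analytic, first_numeric)
print(second_analytic, second_numeric)
```

If each printed pair agrees to several decimal places, eq. 2 holds for this toy case with the full double sum, including the off-diagonal terms with $k \neq k'$.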

In particular, I am getting a different result for the first term on the RHS of eq. 2, i.e.

  3. $({h}'(a_{j}))^2 \sum_{k} \sum_{k'} w_{k'j}w_{kj}\frac{\partial^2 E_{n}}{\partial a_{k} \partial a_{k'}}$

So, eq. 3 should follow from:

  4. $h'(a_{j})\frac{\partial }{\partial a_{j}}\sum_{k} w_{kj}\frac{\partial E_{n}}{\partial a_{k}}$

Here is how I tried to derive eq. 3 from eq. 4 using the chain rule:

$h'(a_{j})\frac{\partial }{\partial a_{j}}\sum_{k} w_{kj}\frac{\partial E_{n}}{\partial a_{k}}$

$= h'(a_{j})\sum_{k}\frac{\partial a_{k}}{\partial a_{j}} \frac{\partial }{\partial a_{k}}w_{kj}\frac{\partial E_{n}}{\partial a_{k}}$

$=h'(a_{j})\sum_{k}h'(a_{j})w_{kj}\frac{\partial }{\partial a_{k}}w_{kj}\frac{\partial E_{n}}{\partial a_{k}}$

Here I used:

$(a_k = \sum_jh(a_j)w_{kj}) \wedge (\frac{\partial w_{kj}}{\partial a_j} = 0 )\rightarrow \frac{\partial a_{k}}{\partial a_j} = h'(a_j)w_{kj}$

So what remains to be shown is:

  5. $\sum_{k}w_{kj}\frac{\partial }{\partial a_{k}}w_{kj}\frac{\partial E_{n}}{\partial a_{k}} = \sum_{k} \sum_{k'} w_{k'j}w_{kj}\frac{\partial^2 E_{n}}{\partial a_{k} \partial a_{k'}}$

which should complete the proof.


QUESTION 1

How can eq. 5 be established?

One may want to use:

$\frac{\partial a_k} {\partial w_{kj}} = z_j$


QUESTION 2

If

$y = f(x_1, x_2,...)$

then is it always true that

$\frac{\partial y} {\partial x_k} = 1 / \frac{\partial x_k} {\partial y}$ ?