Optimal Brain Surgeon via Lagrange multipliers


The Optimal Brain Surgeon is a method for pruning a trained neural network. The goal is to set one of the weights (call it $w_q$) to zero while minimizing the resulting increase in error.
Formulating this, we get the following optimization problem:

$$\min _q \min _{\delta w} \left( \frac{1}{2} \delta w^{T}H\delta w \quad\text{s.t.}\quad e_q^{T}\delta w+w_q=0 \right)$$

where $H$ is the Hessian matrix (containing all second order derivatives) and $e_q$ is the unit vector in weight space corresponding to (scalar) weight $w_q$.

We have the solution from the paper [0]:

$$\delta w = -\frac{w_q}{[H^{-1}]_{qq}}H^{-1}e_q$$
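As a sanity check (my own sketch, not from the paper), the solution can be verified numerically on a toy positive-definite Hessian: the update should drive weight $w_q$ exactly to zero, satisfying the constraint $e_q^T\delta w + w_q = 0$.

```python
import numpy as np

# Toy check: the OBS update delta_w = -(w_q / [H^{-1}]_{qq}) H^{-1} e_q
# satisfies the pruning constraint e_q^T delta_w + w_q = 0.
rng = np.random.default_rng(0)
n, q = 5, 2                       # toy weight dimension and pruned index (assumed)
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)       # random positive-definite Hessian stand-in
w = rng.standard_normal(n)        # toy weight vector
H_inv = np.linalg.inv(H)
e_q = np.zeros(n); e_q[q] = 1.0

delta_w = -(w[q] / H_inv[q, q]) * H_inv @ e_q

# The constraint drives weight q exactly to zero.
print(np.isclose(e_q @ delta_w + w[q], 0.0))   # True
print(np.isclose((w + delta_w)[q], 0.0))       # True
```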

I tried to solve it using Lagrange multipliers, but I could not derive the denominator $[H^{-1}]_{qq}$. I started with

$$L = \frac{1}{2} \delta w^{T}H\delta w + \lambda (e_q^{T}\delta w+w_q)$$

and taking the derivatives with respect to $\delta w$ and $\lambda$ we have

$$H\delta w+\lambda e_q=0$$

$$e_q^{T}\delta w+w_q=0$$

Is this right? I also think we need to take the derivative with respect to $q$, but I do not know how to do this.


[0] Babak Hassibi, David G. Stork, "Second order derivatives for network pruning: Optimal Brain Surgeon", Advances in Neural Information Processing Systems (NIPS), 1992.

BEST ANSWER

First of all, a massive improvement from your first draft!

Second, I believe the notation $[H^{-1}]_{qq}$ is what confuses you:

$$[H^{-1}]_{qq} = e_{q}^{T}\, H^{-1}\, e_{q}$$

Now the solution is straightforward:

  1. $H\delta w+\lambda e_q=0 \rightarrow \delta w = -\lambda\, H^{-1} e_q$

  2. $e_q^{T}\delta w+w_q=0 \rightarrow \lambda=\frac{w_q}{[H^{-1}]_{qq}}$

Substituting (2) into (1) gives the result.
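The two steps can also be checked numerically (a sketch under assumed toy data, not from the answer): substituting the $\lambda$ from step (2) into step (1) reproduces the paper's $\delta w$, the stationarity condition $H\delta w+\lambda e_q=0$ holds, and the resulting error increase equals the known OBS saliency $w_q^2 / (2[H^{-1}]_{qq})$.

```python
import numpy as np

# Verify the two steps of the answer on a random positive-definite H.
rng = np.random.default_rng(1)
n, q = 4, 1                       # toy dimension and pruned index (assumed)
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)       # positive-definite Hessian stand-in
w = rng.standard_normal(n)
H_inv = np.linalg.inv(H)
e_q = np.zeros(n); e_q[q] = 1.0

lam = w[q] / H_inv[q, q]          # step (2)
delta_w = -lam * H_inv @ e_q      # step (1) with lambda substituted

# Matches the paper's closed-form solution.
paper_delta_w = -(w[q] / H_inv[q, q]) * H_inv @ e_q
print(np.allclose(delta_w, paper_delta_w))          # True

# Stationarity of the Lagrangian: H delta_w + lambda e_q = 0.
print(np.allclose(H @ delta_w + lam * e_q, 0.0))    # True

# Error increase equals the OBS saliency w_q^2 / (2 [H^{-1}]_{qq}).
print(np.isclose(0.5 * delta_w @ H @ delta_w,
                 w[q]**2 / (2 * H_inv[q, q])))      # True
```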