Why do we need to find the inverse of a Hessian in second order optimization?


From what I understand, the inverse of a matrix is used when solving a system of linear equations.

When we perform second order optimization, we take the inverse of a Hessian in the weight update formula.

My question is: why do we need to take the inverse of the Hessian? Can't we just multiply the gradient by the Hessian itself (not its inverse)?

Write out the second-order Taylor expansion around the starting point $x_0$: $F(x_0+h)\approx F(x_0)+\nabla F^{\top}h+\tfrac{1}{2}h^{\top}\operatorname{Hess}(F)\,h$, where the derivatives are evaluated at $x_0$. We want to choose the step $h$ so that this Taylor expansion is minimized (or maximized). Differentiating with respect to $h$ and setting the result to zero, we see that $h$ should satisfy $\nabla F+\operatorname{Hess}(F)\,h=0$, i.e. $h=-\operatorname{Hess}(F)^{-1}\nabla F$. Multiplying the gradient by the Hessian itself would not solve this stationarity equation; it is precisely the inverse that isolates $h$.
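As a minimal sketch of this step in code (the quadratic $F(x)=\tfrac12 x^{\top}Ax-b^{\top}x$ and the function name `newton_step` are illustrative choices, not from the question): in practice one solves the linear system $\operatorname{Hess}(F)\,h=-\nabla F$ directly rather than forming the inverse explicitly, which is cheaper and numerically more stable.

```python
import numpy as np

def newton_step(grad, hess):
    # Solve Hess(F) h = -grad F instead of computing
    # np.linalg.inv(hess) @ grad explicitly.
    return np.linalg.solve(hess, -grad)

# Illustrative quadratic F(x) = 0.5 x^T A x - b^T x:
# gradient is A x - b, Hessian is the constant matrix A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])

x0 = np.zeros(2)
grad = A @ x0 - b           # gradient evaluated at x0
h = newton_step(grad, A)
x1 = x0 + h

# For a quadratic, a single Newton step lands on the exact minimizer,
# where the gradient A x1 - b vanishes.
print(np.allclose(A @ x1, b))
```

Note that multiplying the gradient by $A$ instead of $A^{-1}$ would give a step with the wrong scale and, in general, the wrong direction, so it would not satisfy $\nabla F+\operatorname{Hess}(F)\,h=0$.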