How do you derive $\Delta x = -\mathbf{H}^{-1}\mathbf{g}$ in quasi-Newton methods?


The equation:

$$\Delta x = -\mathbf{H}^{-1}\mathbf{g}$$

is the key part of quasi-Newton methods, where $\Delta x$ is the coordinate shift required to reach a minimum, $\mathbf{H}$ is the Hessian of some function $f(\mathbf{\vec{x}})$, and $\mathbf{g}$ is its gradient.

Given some initial point $\mathbf{\vec{x}}_0$, along with $\mathbf{H}$ and $\mathbf{g}$ evaluated there, the equation gives $\Delta x$, the coordinate change to $\mathbf{\vec{x}}$ required to put $f(\mathbf{\vec{x}})$ at a local stationary point. If the starting point is close to a minimum, the step moves toward that minimum.
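As a concrete check (my own sketch, not taken from any of the references below), here is a minimal NumPy example on a hypothetical quadratic $f(\mathbf{\vec{x}}) = \tfrac{1}{2}\mathbf{\vec{x}}^T A \mathbf{\vec{x}} + \mathbf{b}^T\mathbf{\vec{x}}$, where the step $-\mathbf{H}^{-1}\mathbf{g}$ reaches the exact minimizer in a single move; the matrices `A` and `b` are arbitrary choices for illustration:

```python
import numpy as np

# Hypothetical quadratic test function f(x) = 0.5 x^T A x + b^T x,
# whose gradient is A x + b and whose Hessian is the constant matrix A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite -> unique minimum
b = np.array([-1.0, 2.0])

x0 = np.array([5.0, -4.0])   # arbitrary starting point
g = A @ x0 + b               # gradient at x0
H = A                        # Hessian at x0

# Newton step: solve H @ dx = -g rather than forming H^{-1} explicitly
dx = np.linalg.solve(H, -g)
x1 = x0 + dx

# For a quadratic, the gradient A @ x1 + b vanishes exactly at x1
```

In practice one solves the linear system $\mathbf{H}\,\Delta x = -\mathbf{g}$ rather than inverting $\mathbf{H}$, which is cheaper and numerically more stable.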

However, in the four references I have read on this matter [1-4], I have found no good explanation of why $\Delta x = -\mathbf{H}^{-1}\mathbf{g}$. Even in the paper which establishes this equation [5], I find the derivation quite hard to follow.

How can one derive this equation? Are there any resources which go over its derivation in more detail?

References:

[1] Press, William H.; Teukolsky, Saul A.; Vetterling, William T.; Flannery, Brian P., Numerical recipes. The art of scientific computing., Cambridge: Cambridge University Press (ISBN 978-0-521-88068-8/hbk; 978-0-511-33239-5/ebook). xxi, 1235 p. (2007). ZBL1132.65001.

[2] Gale, Julian D.; Rohl, Andrew L., The general utility lattice program (GULP), Mol. Simul. 29, No. 5, 291-341 (2003). ZBL1047.81583.

[3] Fletcher, R.; Powell, M. J. D., A rapidly convergent descent method for minimization, Comput. J. 6, 163-168 (1963). ZBL0132.11603.

[4] Shanno, D. F., Conditioning of quasi-Newton methods for function minimization, Math. Comput. 24, 647-656 (1970). ZBL0225.65073.

[5] Davidon, William C., Variable metric method for minimization, SIAM J. Optim. 1, No. 1, 1-17 (1991). ZBL0752.90062.


There are 2 answers below.

Best answer

Working off of Ian's answer, the equation can be derived as follows. You have some function $f(\mathbf{\vec{x}})$ whose gradient is $\nabla f(\mathbf{\vec{x}})$. Stationary points occur where $\nabla f(\mathbf{\vec{x}}) = 0$, so we approximate the gradient with a first-order Taylor series:

$$\begin{split} \nabla f(\mathbf{\vec{x}} + \Delta \vec{x}) & \approx \nabla f(\mathbf{\vec{x}}) + \nabla (\nabla f(\mathbf{\vec{x}}))\,((\mathbf{\vec{x}} + \Delta \vec{x}) - \mathbf{\vec{x}}) \\[0.5em] & = \nabla f(\mathbf{\vec{x}}) + \nabla (\nabla f(\mathbf{\vec{x}}))\,\Delta \vec{x}\end{split}$$

Here the derivatives are evaluated at $\mathbf{\vec{x}}$, while the gradient is being approximated at the shifted point $\mathbf{\vec{x}} + \Delta \vec{x}$.

Since we would like the gradient to be zero after the step $\Delta \vec{x}$, we set the left-hand side to zero and rearrange:

$$\begin{split} 0 & = \nabla f(\mathbf{\vec{x}}) + \nabla (\nabla f(\mathbf{\vec{x}}))\,\Delta \vec{x} \\[0.5em] -\nabla f(\mathbf{\vec{x}}) & = \nabla (\nabla f(\mathbf{\vec{x}}))\,\Delta \vec{x} \\[0.5em] -\left[\nabla (\nabla f(\mathbf{\vec{x}}))\right]^{-1}\nabla f(\mathbf{\vec{x}}) & = \Delta \vec{x}\end{split}$$

(In the last line we multiply both sides on the left by the inverse of the matrix $\nabla (\nabla f(\mathbf{\vec{x}}))$; one cannot literally divide by a matrix.)

Because $f(\mathbf{\vec{x}})$ is a function of many variables, the gradient of its gradient is a matrix: the Hessian, written $\mathbf{H}$. With the usual shorthand:

$$\begin{split} \left[\nabla (\nabla f(\mathbf{\vec{x}}))\right]^{-1} & = \mathbf{H}^{-1} \\[0.5em] \nabla f(\mathbf{\vec{x}}) & = \mathbf{g} \end{split}$$

Where $\mathbf{g}$ is the gradient. Putting it all together we get:

$$ \Delta \vec{x} = -\mathbf{H}^{-1} \mathbf{g}$$
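Putting the derivation into code (a sketch of my own, using a made-up smooth convex test function $f(x, y) = e^x + x^2 + (y-1)^2$ rather than anything from the references), the formula can be applied repeatedly as the iteration $\mathbf{\vec{x}} \leftarrow \mathbf{\vec{x}} - \mathbf{H}^{-1}\mathbf{g}$:

```python
import numpy as np

# Analytic gradient of f(x, y) = exp(x) + x^2 + (y - 1)^2
def grad(v):
    x, y = v
    return np.array([np.exp(x) + 2 * x, 2 * (y - 1)])

# Analytic Hessian of the same function (diagonal here)
def hess(v):
    x, y = v
    return np.array([[np.exp(x) + 2, 0.0],
                     [0.0,           2.0]])

x = np.array([2.0, -3.0])            # arbitrary starting guess
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:    # gradient ~ 0 -> stationary point
        break
    x = x + np.linalg.solve(hess(x), -g)   # Delta x = -H^{-1} g

# The minimiser has y = 1 and x solving exp(x) + 2x = 0 (about -0.3517)
```

Near the minimum the iteration converges quadratically, which is why the pure Newton step needs only a handful of iterations here.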

Another answer

You are trying to solve $\nabla f(x)=0$. You Taylor-expand the gradient near the current point $x$ and require the gradient at the new point $x + \Delta x$ to vanish: $0=\nabla f(x) + H(x)\,\Delta x$, by Taylor's theorem. Rearranging gives $\Delta x = -H(x)^{-1}\,\nabla f(x)$.

What's more confusing is why the Newton search direction is useful with a step size that isn't $1$.
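To illustrate that last remark (my own sketch, not part of the answer): on the hypothetical function $f(x) = \sqrt{1+x^2}$, the pure Newton step $-g/h = -x(1+x^2)$ overshoots badly from $x_0 = 2$ (it jumps to $-8$), yet the same search direction with a backtracking step size converges:

```python
import numpy as np

# f(x) = sqrt(1 + x^2): convex, minimum at x = 0, but the full Newton
# step -g/h = -x(1 + x^2) diverges from starting points with |x| > 1.
def f(x):
    return np.sqrt(1.0 + x * x)

def g(x):                            # first derivative
    return x / np.sqrt(1.0 + x * x)

def h(x):                            # second derivative
    return (1.0 + x * x) ** -1.5

x = 2.0
for _ in range(50):
    d = -g(x) / h(x)                 # Newton direction (1-D "H^{-1} g")
    t = 1.0
    # Backtracking line search: halve t until an Armijo-style
    # sufficient-decrease condition holds
    while f(x + t * d) > f(x) + 1e-4 * t * g(x) * d:
        t *= 0.5
    x += t * d
```

The first step shrinks to $t = 0.25$, after which the iterates enter the region where the full step $t = 1$ is accepted and convergence becomes very fast.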