The equation
$$\Delta \vec{x} = -\mathbf{H}^{-1}\mathbf{g}$$
is the key part of quasi-Newton methods, where $\Delta \vec{x}$ is the coordinate shift required to reach a minimum, $\mathbf{H}$ is the Hessian of some function $f(\mathbf{\vec{x}})$, and $\mathbf{g}$ is its gradient.
Given some initial point $\mathbf{\vec{x}}_0$, along with $\mathbf{H}$ and $\mathbf{g}$ evaluated there, the equation gives $\Delta \vec{x}$, the change to $\mathbf{\vec{x}}$ required to put $f(\mathbf{\vec{x}})$ at a local extremum. Started close to a minimum, it will find that minimum.
However, in the four references I have read on this matter [1-4], I have found no good explanation of why $\Delta \vec{x} = -\mathbf{H}^{-1}\mathbf{g}$. Even the paper that establishes this equation [5] gives a derivation I find quite hard to follow.
How can one derive this equation? Are there any resources which go over its derivation in more detail?
References:
[1] Press, William H.; Teukolsky, Saul A.; Vetterling, William T.; Flannery, Brian P., Numerical recipes. The art of scientific computing., Cambridge: Cambridge University Press (ISBN 978-0-521-88068-8/hbk; 978-0-511-33239-5/ebook). xxi, 1235 p. (2007). ZBL1132.65001.
[2] Gale, Julian D.; Rohl, Andrew L., The general utility lattice program (GULP), Mol. Simul. 29, No. 5, 291-341 (2003). ZBL1047.81583.
[3] Fletcher, R.; Powell, M. J. D., A rapidly convergent descent method for minimization, Comput. J. 6, 163-168 (1963). ZBL0132.11603.
[4] Shanno, D. F., Conditioning of quasi-Newton methods for function minimization, Math. Comput. 24, 647-656 (1970). ZBL0225.65073.
[5] Davidon, William C., Variable metric method for minimization, SIAM J. Optim. 1, No. 1, 1-17 (1991). ZBL0752.90062.
Working off of Ian's answer, the equation appears to be derived as follows. You have some function $f(\mathbf{\vec{x}})$ whose gradient is $\nabla f(\mathbf{\vec{x}})$. Extreme points are found where $\nabla f(\mathbf{\vec{x}}) = 0$, so we approximate the gradient with a first-order Taylor series about $\mathbf{\vec{x}}$:
$$\nabla f(\mathbf{\vec{x}} + \Delta \vec{x}) \approx \nabla f(\mathbf{\vec{x}}) + \nabla (\nabla f(\mathbf{\vec{x}}))\,\Delta \vec{x}$$
Here the expansion is taken about $\mathbf{\vec{x}}$, and the new input point is $\mathbf{\vec{x}} + \Delta \vec{x}$.
As we would like the gradient to vanish after the step $\Delta \vec{x}$, we set the LHS to zero and rearrange:
$$\begin{split} 0 & = \nabla f(\mathbf{\vec{x}}) + \nabla (\nabla f(\mathbf{\vec{x}}))\,\Delta \vec{x} \\[0.5em] -\nabla f(\mathbf{\vec{x}}) & = \nabla (\nabla f(\mathbf{\vec{x}}))\,\Delta \vec{x} \\[0.5em] -\left[\nabla (\nabla f(\mathbf{\vec{x}}))\right]^{-1}\nabla f(\mathbf{\vec{x}}) & = \Delta \vec{x}\end{split}$$
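As a quick one-dimensional sanity check (my own example, not from the references): take $f(x) = (x-3)^2$, so $f'(x) = 2(x-3)$ and $f''(x) = 2$. From any starting point $x_0$,

$$\Delta x = -\frac{f'(x_0)}{f''(x_0)} = -(x_0 - 3),$$

and $x_0 + \Delta x = 3$, the exact minimum. A single step suffices here because $f$ is quadratic, so the first-order expansion of the gradient is exact.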
And because $f(\mathbf{\vec{x}})$ is a function of many variables, the gradient of its gradient is a matrix, the Hessian $\mathbf{H}$; we cannot divide by a matrix, so we multiply by its inverse instead:
$$\begin{split} \left[\nabla (\nabla f(\mathbf{\vec{x}}))\right]^{-1} & = \mathbf{H}^{-1} \\[0.5em] \nabla f(\mathbf{\vec{x}}) & = \mathbf{g} \end{split}$$
Where $\mathbf{g}$ is the gradient. Putting it all together we get:
$$ \Delta \vec{x} = -\mathbf{H}^{-1} \mathbf{g}$$
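To see the step in action numerically, here is a minimal Python sketch (the matrix $A$, vector $b$, and starting point are made-up illustrative values) applying $\Delta \vec{x} = -\mathbf{H}^{-1}\mathbf{g}$ to the quadratic $f(\vec{x}) = \tfrac{1}{2}\vec{x}^{\mathsf T} A \vec{x} - b^{\mathsf T}\vec{x}$, whose Hessian is the constant matrix $A$:

```python
import numpy as np

# f(x) = 0.5 x^T A x - b^T x, with constant Hessian H = A
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    # gradient g = A x - b
    return A @ x - b

x = np.array([5.0, -3.0])            # arbitrary starting point
dx = -np.linalg.solve(A, grad(x))    # Delta x = -H^{-1} g (solve, don't invert)
x_new = x + dx

# For a quadratic, one Newton step lands exactly on the point
# where the gradient vanishes.
print(np.allclose(grad(x_new), 0.0))  # prints True
```

Note that in practice one solves the linear system $\mathbf{H}\,\Delta\vec{x} = -\mathbf{g}$ rather than forming $\mathbf{H}^{-1}$ explicitly, which is cheaper and numerically better behaved.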