In the paper VIME: Variational Information Maximizing Exploration, the authors suggest taking a single second-order step to efficiently optimize a variant of the evidence lower bound.
They say:

> To optimize Eq. (12) efficiently, we only take a single second-order step. This way, the gradient is rescaled according to the curvature of the KL divergence at the origin. As such, we compute $D_{KL} [q(\theta; \phi + \lambda\Delta\phi)\Vert q(\theta; \phi)]$, with the update step $\Delta\phi$ defined as $$ \Delta\phi=H^{-1}(l)\nabla_\phi l(q(\theta;\phi),s_t) \tag{13} $$ in which $H(l)$ is the Hessian of $l(q(\theta;\phi),s_t)$.
I'm quite confused about how they obtain Eq. $(13)$: why would they multiply the gradient by the inverse of the Hessian?

I'm not familiar with the form of the cost function they are optimizing, but your question might be resolved by building some intuition about second-order methods. Multiplying the gradient by the inverse of the Hessian is precisely the Newton step, which rescales the gradient according to the local curvature of the objective. See the wiki entry on Newton's method in optimization for the fundamental example of how second derivatives (i.e., the Hessian) let us optimize better by using curvature information about the function.
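For concreteness, here is the standard reasoning behind such a step (this is plain Newton's method, not anything VIME-specific; I'm assuming their $l$ is the bound from Eq. (12), and the sign depends on whether you minimize or maximize). Replace $l$ by its second-order Taylor expansion around the current $\phi$, then solve for the $\Delta\phi$ that makes the gradient of that quadratic model vanish:

$$
l(\phi + \Delta\phi) \approx l(\phi) + \nabla_\phi l^\top \Delta\phi + \tfrac{1}{2}\,\Delta\phi^\top H(l)\,\Delta\phi
\;\Longrightarrow\;
\nabla_\phi l + H(l)\,\Delta\phi = 0
\;\Longrightarrow\;
\Delta\phi = -H^{-1}(l)\,\nabla_\phi l.
$$

Up to the sign convention, that is exactly Eq. $(13)$: the gradient is rescaled by the inverse curvature, which is what the authors mean by "rescaled according to the curvature." Below is a minimal numeric sketch of a single Newton step, using a toy concave quadratic in place of their objective (the names here are illustrative, not VIME's actual quantities):

```python
import numpy as np

# Toy concave objective l(phi) = b @ phi - 0.5 * phi @ A @ phi (maximization),
# standing in for the (hypothetical) bound l(q(theta; phi), s_t).
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])          # positive definite, so l is concave
b = np.array([1.0, -2.0])

grad = lambda phi: b - A @ phi      # gradient of l at phi
hess = lambda phi: -A               # Hessian of l (constant, negative definite)

phi = np.zeros(2)
# Single Newton step: solve H(l) @ delta = -grad rather than inverting H explicitly.
delta_phi = np.linalg.solve(hess(phi), -grad(phi))
lam = 1.0                           # step size lambda, as in the paper's KL term
phi_new = phi + lam * delta_phi     # for a quadratic, one full step hits the maximizer

print(phi_new, np.linalg.solve(A, b))  # both print the argmax of l
```

Note how the step direction comes from the curvature, not just the gradient: in flat directions (small curvature) the step is long, and in sharply curved directions it is short, which is precisely why a single second-order step can be much more effective than a plain gradient step.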