Newton's Method in Deep Learning (Goodfellow et al.)


In their book Deep Learning, Goodfellow et al. cover Newton's method:

Newton's method is an optimization scheme based on using a second-order Taylor series expansion to approximate $J(\theta)$ near some point $\theta_0$, ignoring derivatives of higher order: $$ J(\theta) \approx J(\theta_0) + (\theta - \theta_0)^{T} \nabla_{\theta}J(\theta_0) + \frac{1}{2}(\theta - \theta_0)^{T} H (\theta - \theta_0) $$ If we then solve for the critical point of this function, we obtain the Newton parameter update rule: $$\theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0)$$ Note that $H$ is the Hessian matrix of $J$ with respect to $\theta$.
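(For reference, the update rule follows by differentiating the quadratic approximation with respect to $\theta$ and setting the gradient to zero: $$\nabla_{\theta}J(\theta_0) + H(\theta - \theta_0) = 0 \quad \Longrightarrow \quad \theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0).$$)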

I have two questions,

  1. If applied iteratively, would the update rule essentially be unchanged if modified to the following? $$\theta_{k+1} = \theta_{k} - H^{-1}\nabla_{\theta}J(\theta_k)$$

  2. When going over the training algorithm associated with Newton's method, I noticed that they seem to ignore $\theta_{0}$ even though they include it as a required parameter to the algorithm. [Image: the training algorithm associated with Newton's method, from Deep Learning.] I am wondering whether this was intentional or accidental, and, if accidental, at which point in the algorithm that parameter would be used.

Best answer:
  1. $\theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0)$ is the minimizer of the quadratic approximation to $J(\theta)$ (take the derivative with respect to $\theta$ and set it equal to zero). When used as an algorithm, the update rule is indeed the iterative version you give, with the Hessian re-evaluated at the current iterate $\theta_k$ at each step; see the sketch after this list.
  2. Yes, they should say "initialize: $\theta \leftarrow \theta_0$" or something to that effect.
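To make point 1 concrete, here is a minimal NumPy sketch of the iterative update (my own illustration, not the book's pseudocode; the names `newton`, `grad`, and `hess` are hypothetical). It applies $\theta_{k+1} = \theta_k - H(\theta_k)^{-1}\nabla_{\theta}J(\theta_k)$ to a toy quadratic objective, on which Newton's method reaches the minimizer in a single step.

```python
import numpy as np

def newton(grad, hess, theta0, num_iters=10):
    """Iterate theta_{k+1} = theta_k - H(theta_k)^{-1} grad J(theta_k)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_iters):
        # Solve H(theta_k) @ step = grad(theta_k) rather than forming H^{-1} explicitly.
        step = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - step
    return theta

# Toy objective J(theta) = 0.5 * theta^T A theta - b^T theta, with A symmetric
# positive definite; its unique minimizer solves A theta = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda theta: A @ theta - b  # gradient of J
hess = lambda theta: A              # Hessian of J (constant for a quadratic)

print(newton(grad, hess, np.zeros(2)))  # matches np.linalg.solve(A, b)
```

On a genuine quadratic, the very first step lands on $\theta^*$ exactly, recovering the book's one-shot formula; for a non-quadratic $J$, the same loop is run to approximate convergence, recomputing the gradient and Hessian at each $\theta_k$.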