Newton's Method in Deep Learning (Goodfellow et al.)


In their book Deep Learning, Goodfellow et al. cover Newton's method:

Newton's method is an optimization scheme based on using a second-order Taylor series expansion to approximate $J(\theta)$ near some point $\theta_0$, ignoring derivatives of higher order: $$ J(\theta) \approx J(\theta_0) + (\theta - \theta_0)^{T} \nabla_{\theta}J(\theta_0) + \frac{1}{2}(\theta - \theta_0)^{T} H (\theta - \theta_0) $$ If we then solve for the critical point of this function, we obtain the Newton parameter update rule: $$\theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0)$$ Note that $H$ is the Hessian matrix of $J$ with respect to $\theta$.
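(For reference, the update rule follows by differentiating the quadratic approximation with respect to $\theta$ and setting the gradient to zero: $$\nabla_{\theta}J(\theta_0) + H(\theta - \theta_0) = 0 \quad \Longrightarrow \quad \theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0).$$)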

I have two questions,

  1. If applied iteratively, would the update rule essentially be unchanged if modified to the following? $$\theta_{k+1} = \theta_{k} - H^{-1}\nabla_{\theta}J(\theta_k)$$

  2. When going over the training algorithm associated with Newton's method, I noticed that they seem to ignore $\theta_{0}$ even though they include it as a required parameter to the algorithm. [Image: the training algorithm associated with Newton's method, from Deep Learning.] I am wondering whether this was intentional or accidental, and, if accidental, at which point in the algorithm that parameter would be used.

Best answer:
  1. $\theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0)$ is the minimizer of the quadratic approximation to $J(\theta)$ (take the derivative with respect to $\theta$ and set it equal to zero). When used as an algorithm, the update rule is indeed the iterative version you give, with the Hessian re-evaluated at the current iterate $\theta_k$ at each step; see the sketch after this list.
  2. Yes, they should say "initialize: $\theta \leftarrow \theta_0$" or something to that effect.
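To make point 1 concrete, here is a minimal NumPy sketch of the iterative update (my own illustration, not the book's pseudocode; the names `newton`, `grad`, and `hess` are hypothetical). It applies $\theta_{k+1} = \theta_k - H(\theta_k)^{-1}\nabla_{\theta}J(\theta_k)$ to a toy quadratic objective, on which Newton's method reaches the minimizer in a single step.

```python
import numpy as np

def newton(grad, hess, theta0, num_iters=10):
    """Iterate theta_{k+1} = theta_k - H(theta_k)^{-1} grad J(theta_k)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_iters):
        # Solve H(theta_k) @ step = grad(theta_k) rather than forming H^{-1} explicitly.
        step = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - step
    return theta

# Toy objective J(theta) = 0.5 * theta^T A theta - b^T theta, with A symmetric
# positive definite; its unique minimizer solves A theta = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda theta: A @ theta - b  # gradient of J
hess = lambda theta: A              # Hessian of J (constant for a quadratic)

print(newton(grad, hess, np.zeros(2)))  # matches np.linalg.solve(A, b)
```

On a genuine quadratic, the very first step lands on $\theta^*$ exactly, recovering the book's one-shot formula; for a non-quadratic $J$, the same loop is run to approximate convergence, recomputing the gradient and Hessian at each $\theta_k$.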