In Goodfellow et al.'s book *Deep Learning*, they cover Newton's method.
Newton's method is an optimization scheme based on using a second-order Taylor series expansion to approximate $J(\theta)$ near some point $\theta_0$, ignoring derivatives of higher order: $$ J(\theta) \approx J(\theta_0) + (\theta - \theta_0)^{T} \nabla_{\theta}J(\theta_0) + \frac{1}{2}(\theta - \theta_0)^{T} H(\theta - \theta_0) $$ If we then solve for the critical point of this function, we obtain the Newton parameter update rule: $$\theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0)$$ Note that $H$ is the Hessian matrix of $J$ with respect to $\theta$, evaluated at $\theta_0$.
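(For reference, my own filling-in of the step between the two displayed equations: taking the gradient of the quadratic approximation with respect to $\theta$ and setting it to zero gives $$\nabla_{\theta}J(\theta) \approx \nabla_{\theta}J(\theta_0) + H(\theta - \theta_0) = 0 \quad\Longrightarrow\quad \theta^* = \theta_0 - H^{-1}\nabla_{\theta}J(\theta_0),$$ which is the stated update rule, assuming $H$ is invertible.)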
I have two questions:
If applied iteratively, would the update rule essentially be unchanged if modified to $$\theta_{k+1} = \theta_{k} - H^{-1}\nabla_{\theta}J(\theta_k),$$ with $H$ now evaluated at $\theta_k$?
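For concreteness, here is a minimal sketch of what I have in mind for the iterated update, on a toy quadratic objective and with the Hessian re-evaluated at each $\theta_k$ (my reading of the rule; the names `grad` and `hess` are just illustrative helpers):

```python
import numpy as np

# Toy objective: J(theta) = 0.5 * theta^T A theta - b^T theta,
# whose exact minimizer is A^{-1} b. Newton's method reaches it
# in a single step on a quadratic, so iterating is trivially stable here.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric positive definite
b = np.array([1.0, -1.0])

def grad(theta):
    return A @ theta - b            # gradient of J at theta

def hess(theta):
    return A                        # Hessian of J (constant for a quadratic)

theta = np.zeros(2)                 # theta_0
for _ in range(5):
    # theta_{k+1} = theta_k - H(theta_k)^{-1} grad J(theta_k),
    # computed via a linear solve rather than an explicit inverse
    theta = theta - np.linalg.solve(hess(theta), grad(theta))

print(np.allclose(theta, np.linalg.solve(A, b)))  # True
```

(Solving $H x = \nabla_{\theta}J$ with `np.linalg.solve` instead of forming $H^{-1}$ explicitly is the usual numerical practice, but the math is the same as the update rule above.)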
When going over the training algorithm associated with Newton's method, I noticed that they seem to ignore $\theta_{0}$ even though it is listed as a required parameter of the algorithm.
Was this intentional or accidental, and if accidental, at which step of the algorithm would that parameter be used?