Why does the central-difference Hessian blow up when the step size is small?


I'm trying to use finite differences (central differences) to approximate the gradient and the Hessian and compare them with the analytic ones. However, when I set the step size very small, the Hessian blows up and the gradient becomes less precise. Why isn't a smaller step size always better? And what would be a reasonable step size for such an approximation?

# log_likelihood

mu = 54.2
x = c(37.7,52.5,49,51,49.9,49.8,50.4,49.8,49.1,49.2)
sample_size = length(x)

log_likelihood = function (mu) {  # Cauchy(mu, 1) log-likelihood
  log_likelihood_value = 0
  for (i in 1:sample_size) {
    log_likelihood_value = log_likelihood_value + log(pi*(1+(x[i]-mu)^2))
  }
  return(-log_likelihood_value)
}
cat('log_likelihood=',log_likelihood(mu))



# Newton-Raphson

gradient = function (mu) {  # analytic d/dmu of the log-likelihood
  gradient_value = 0
  for (k in 1:sample_size) {
    gradient_value = gradient_value + 2*(x[k]-mu)/(1+(x[k]-mu)^2)
  }
  return(gradient_value)
}

hessian = function (mu) {  # analytic d^2/dmu^2 of the log-likelihood
  hessian_value = 0
  for (k in 1:sample_size) {
    hessian_value = hessian_value + (2*(1+(x[k]-mu)^2)-4*(x[k]-mu)^2)/(1+(x[k]-mu)^2)^2
  }
  return(-hessian_value)
}

mu = median(x)
N = 5
eps = 1e-13  # step size for the finite differences
for (j in 1:N) {
  cat('mu=',mu,'\n')
  cat('grad_approx=',(log_likelihood(mu+eps)-log_likelihood(mu-eps))/(2*eps),'\n')
  cat('grad=',gradient(mu),'\n')
  cat('hessian_approx=',(log_likelihood(mu+eps)-2*log_likelihood(mu)+log_likelihood(mu-eps))/eps^2,'\n')
  cat('hessian=',hessian(mu),'\n')
  mu = mu - gradient(mu)/hessian(mu)
}

and I obtain the result:

mu= 49.8 
grad_approx= -0.2309264 
grad= -0.2463615 
hessian_approx= 1.065814e+12 
hessian= -7.707962 
mu= 49.76804 
grad_approx= -0.01776357 
grad= 0.001049172 
hessian_approx= 355271367880 
hessian= -7.770681 
mu= 49.76817 
grad_approx= -0.01776357 
grad= 1.540493e-08 
hessian_approx= 355271367880 
hessian= -7.770453 
mu= 49.76817 
grad_approx= 0.03552714 
grad= 1.24345e-14 
hessian_approx= 710542735760 
hessian= -7.770453 
mu= 49.76817 
grad_approx= 0.03552714 
grad= 1.24345e-14 
hessian_approx= 710542735760 
hessian= -7.770453 
1 Answer


Evaluating a function in floating point usually incurs a relative error that is a small random multiple of the machine constant (often denoted $\mu$, but since $\mu$ is already taken here, let's use $\delta$). The error of the central second difference with step size $\varepsilon$ is then bounded by an expression of the form $$ \frac{4|f(x)|\delta}{\varepsilon^2}+\frac{|f^{(4)}(x)|\varepsilon^2}{12}, $$ where the first term is the rounding error of the divided difference and the second is the truncation error of the formula. As you can see, the bound is large not only for large $\varepsilon$ but also for very small $\varepsilon$. It is minimal around the point where both terms are equal; assuming the function value and its fourth derivative are of comparable magnitude, that happens around $\varepsilon=\sqrt[4]{\delta}$, which for 64-bit double floats ($\delta=2\cdot 10^{-16}$) gives $\varepsilon\approx 10^{-4}$.
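This trade-off is easy to check empirically. Here is a small R sketch (the test function $e^x$ and the grid of step sizes are illustrative choices of mine, not from the derivation above):

```r
# Error of the central second difference for f(x) = exp(x) at x0 = 1,
# whose exact second derivative is exp(1).
f  <- function(x) exp(x)
x0 <- 1
exact <- exp(x0)

second_diff <- function(eps) (f(x0 + eps) - 2*f(x0) + f(x0 - eps)) / eps^2

steps  <- 10^(-(1:8))                     # eps = 1e-1, 1e-2, ..., 1e-8
errors <- abs(sapply(steps, second_diff) - exact)
print(cbind(eps = steps, error = errors))
# The error shrinks with eps down to roughly eps ~ 1e-3 .. 1e-4
# (about delta^(1/4)), then grows again as rounding noise dominates.
```

Running this shows the characteristic V-shape of the error as a function of the step size: truncation error dominates on the left of the optimum, rounding error on the right.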

It might seem counter-intuitive, but once floating-point errors are taken into account, the higher the error order of the formula, the larger the $\varepsilon$ at which its error bound is optimal.
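Applying the $\varepsilon\approx 10^{-4}$ rule of thumb to the question's own setting (the Cauchy log-likelihood and data from the question; the helper names below are mine) shows the contrast with the original choice $\varepsilon=10^{-13}$:

```r
# Central second difference of the question's log-likelihood at the
# sample median, for a sensible and for a far-too-small step size.
x <- c(37.7, 52.5, 49, 51, 49.9, 49.8, 50.4, 49.8, 49.1, 49.2)
log_likelihood <- function(mu) -sum(log(pi * (1 + (x - mu)^2)))
mu0 <- median(x)

hess_fd <- function(eps)
  (log_likelihood(mu0 + eps) - 2*log_likelihood(mu0) +
   log_likelihood(mu0 - eps)) / eps^2

cat('eps = 1e-4 :', hess_fd(1e-4), '\n')   # close to the analytic -7.707962
cat('eps = 1e-13:', hess_fd(1e-13), '\n')  # rounding noise dominates
```

With $\varepsilon=10^{-4}$ the divided difference reproduces the analytic Hessian to several digits; with $\varepsilon=10^{-13}$ the numerator is pure cancellation noise, which after division by $\varepsilon^2$ produces the huge values seen in the question's output.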