Newton's method - optimal step size


I am wondering how to prove the following statement:

We have: $\min f(x) = \frac{1}{2}x^TDx - c^Tx$

with $f: \mathbb{R}^n \to \mathbb{R}$ and $D$ a symmetric positive definite matrix of size $n \times n$.

Suppose $d_k = -\nabla f(x_k)$ is a descent direction in $x_k$.

Show that the optimal solution to the problem $\min\limits_{σ_k > 0} f(x_k + σ_kd_k)$ is:
$σ_k = -\frac{∇f(x_k)^Td_k}{d_k^TDd_k}$.

Thanks for your help!
Louis

Best answer

As this is a smooth convex problem ($D$ is symmetric positive definite), to minimize $\sigma_k\mapsto f(x_k+\sigma_kd_k)$ you can search for $\sigma_k$ such that $$ \frac{d}{d\sigma_k}f(x_k+\sigma_kd_k)=0. $$

Hint: $$ \frac{d}{d\sigma_k}f(x_k+\sigma_kd_k)=\langle d_k,\nabla f(x_k+\sigma_kd_k)\rangle, $$ where $\langle\cdot,\cdot\rangle$ is the usual scalar product and $\nabla f(x_k+\sigma_kd_k)=D(x_k+\sigma_kd_k)-c=\nabla f(x_k)+\sigma_kDd_k$.
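Setting this derivative to zero and substituting the expression for the gradient gives the claimed step size directly:

$$ 0 = \langle d_k,\nabla f(x_k)+\sigma_kDd_k\rangle = \nabla f(x_k)^Td_k + \sigma_k\,d_k^TDd_k \quad\Longrightarrow\quad \sigma_k = -\frac{\nabla f(x_k)^Td_k}{d_k^TDd_k}. $$

Since $D$ is positive definite, $d_k^TDd_k>0$, and with $d_k=-\nabla f(x_k)$ we get $\sigma_k=\frac{\|\nabla f(x_k)\|^2}{d_k^TDd_k}>0$, as required.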

Note:

This method is the gradient descent method with the Cauchy (exact line-search) step, not Newton's method. Newton's method includes the second-order correction $$ x_{k+1}=x_k-H_f^{-1}(x_k)\nabla f(x_k) $$ where $H_f$ is the Hessian. With your $f$, $H_f(x_k)=D$ (constant) and $\nabla f(x_k)=Dx_k-c$, thus: $$ x_{k+1}=x_k-H_f^{-1}(x_k)\nabla f(x_k)=x_k-D^{-1}(Dx_k-c)=D^{-1}c $$ and the method converges in one iteration to the solution $x^*=D^{-1}c$.
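A quick numerical check of the one-step convergence (the particular $D$, $c$, and starting point below are arbitrary illustrative choices, not from the question):

```python
import numpy as np

# Arbitrary symmetric positive definite D and right-hand side c.
D = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([1.0, 2.0])

x0 = np.array([5.0, -3.0])          # any starting point works
grad = D @ x0 - c                   # gradient of f at x0: Dx - c
x1 = x0 - np.linalg.solve(D, grad)  # one Newton step (Hessian is D)

x_star = np.linalg.solve(D, c)      # exact minimizer D^{-1} c
print(np.allclose(x1, x_star))      # True: converged in one step
```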

In practice, if we do not want to solve $Dx=c$ directly, we still use first-order methods, but not steepest descent: its convergence is very slow when $D$ is ill-conditioned. The conjugate gradient method is generally a better choice.
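A small sketch of that comparison (the ill-conditioned diagonal $D$ and the iteration budget are illustrative assumptions): steepest descent with the Cauchy step stalls, while conjugate gradient reaches the minimizer in at most $n$ iterations in exact arithmetic.

```python
import numpy as np

# Illustrative ill-conditioned problem: condition number kappa = 1e4.
D = np.diag([1.0, 1e4])
c = np.array([1.0, 1.0])
x_star = np.linalg.solve(D, c)       # exact minimizer D^{-1} c

def steepest_descent(x, iters):
    """Gradient descent with the Cauchy (exact line-search) step."""
    for _ in range(iters):
        g = D @ x - c
        d = -g
        sigma = -(g @ d) / (d @ D @ d)   # the step size derived above
        x = x + sigma * d
    return x

def conjugate_gradient(x):
    """Standard CG for Dx = c; exact answer in at most n iterations."""
    r = c - D @ x
    d = r.copy()
    for _ in range(len(c)):
        alpha = (r @ r) / (d @ D @ d)
        x = x + alpha * d
        r_new = r - alpha * (D @ d)
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return x

x_sd = steepest_descent(np.zeros(2), 100)
x_cg = conjugate_gradient(np.zeros(2))
print(np.linalg.norm(x_sd - x_star))  # still far from the solution
print(np.linalg.norm(x_cg - x_star))  # essentially zero after n = 2 steps
```

This start is close to the worst case for steepest descent: the error contracts by only about $(\kappa-1)/(\kappa+1)\approx 0.9998$ per iteration, so even 100 iterations barely move it, while CG solves the problem in two.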