Wolfe Conditions and capturing the minimizer


Hi guys, I am learning about line search algorithms. In general the idea is that we want to solve the minimization problem

$$\min_\alpha f(x_k+ \alpha p_k)$$

and then use $\alpha$ as our step size. In practice this is too expensive to solve exactly, so we settle for a close-enough $\alpha$, not necessarily the best one. One of the most commonly used sets of conditions to ensure we actually decrease the function are the Wolfe conditions

$$f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\leq f(\mathbf{x}_k)+c_1\alpha_k\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$$ and $$-\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \leq -c_2\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$$

My question is: are we guaranteed that among the alphas that satisfy the Wolfe conditions we have the actual solution to the minimization problem? To me the answer is not obvious.

I know this was posted more than a year ago, but in case someone else has a similar question:

In general, we want to solve the minimization problem (for $x$, not $\alpha$!) $$\min_\mathbf{x} f(\mathbf{x})$$

and since we are going to iteratively wander around the space searching for the minimum, it would be nice to, at each step $k$, choose a reasonable search direction $\mathbf{p_k}$ and step size $\alpha$. Assuming we have a reasonable $\mathbf{p_k}$, we now need to select $\alpha$ by (approximately) solving $$\min_\alpha f(\mathbf{x_k}+ \alpha \mathbf{p_k})$$

My question is are we guaranteed that among the alphas that [meet] the Wolfe conditions we have the actual solution to the minimization problem?

No. Selecting an $\alpha$ that meets the Wolfe conditions only ensures (assuming the angle $\theta_k$ between $\mathbf{p}_k$ and $-\nabla f(\mathbf{x_k})$ stays bounded away from $90^\circ$, so the search directions never become orthogonal to the gradient) that the gradient converges to zero over the iterations, i.e.

$$\nabla f(\mathbf{x_k}) \to 0$$

This is because the first condition, i.e.

$$f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\leq f(\mathbf{x}_k)+c_1\alpha_k\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$$

makes sure that at the new point $\mathbf{x}_k+\alpha_k\mathbf{p}_k$ you decreased $f$ "sufficiently" and the second condition, i.e.

$$-\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \leq -c_2\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$$

or, equivalently (multiplying both sides by $-1$ and flipping the inequality):

$$\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \geq c_2\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$$

makes sure that the slope at the new point is larger, i.e. "less negative" (note that the search direction $\mathbf{p_k}$ points against the gradient, i.e. $\mathbf{p_k}^{\mathrm T}\nabla f(\mathbf{x_k}) < 0$), and that's exactly what we want to achieve: a local minimum of $f$ will be found where the gradient is zero (unless it's a saddle point).
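To see this behaviour numerically, here is a sketch of steepest descent on a small convex quadratic, with a simple bisection-style line search that enforces both conditions (the helper `wolfe_step`, the matrix `A`, and all the constants are illustrative choices, not a canonical implementation):

```python
import numpy as np

A = np.diag([3.0, 1.0])                 # convex quadratic f(v) = 0.5 v^T A v
f = lambda v: 0.5 * float(v @ A @ v)
grad = lambda v: A @ v

def wolfe_step(f, grad, x, p, c1=1e-4, c2=0.9, alpha=1.0, max_iter=50):
    """Bisection-style search for a step length satisfying both Wolfe conditions."""
    lo, hi = 0.0, np.inf
    slope = p @ grad(x)
    for _ in range(max_iter):
        if f(x + alpha * p) > f(x) + c1 * alpha * slope:
            hi = alpha                              # sufficient decrease fails: step too long
        elif p @ grad(x + alpha * p) < c2 * slope:
            lo = alpha                              # curvature fails: step too short
        else:
            return alpha                            # both conditions hold
        alpha = 2.0 * lo if hi == np.inf else 0.5 * (lo + hi)
    return alpha

x = np.array([2.0, -1.5])
for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    x = x + wolfe_step(f, grad, x, -g) * (-g)      # steepest descent: p_k = -grad f(x_k)

print(np.linalg.norm(grad(x)))          # gradient norm driven (numerically) to zero
```

Every accepted step satisfies both conditions, and the gradient norm shrinks geometrically toward zero, exactly as the argument above predicts.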

TL;DR: an $\alpha$ that meets the Wolfe conditions will not necessarily give you the actual solution to the one-dimensional minimization problem, but it will make sure (together with the assumption that the angle $\theta_k$ between $\mathbf{p}_k$ and $-\nabla f(\mathbf{x_k})$ stays bounded away from $90^\circ$) that with each step you get closer to a point where the gradient is zero.
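A tiny one-dimensional check of this claim (everything below, including the `c1`, `c2` values, is purely illustrative): for $f(x) = x^2$ starting at $x_k = 1$ with $p_k = -1$, the exact minimizer of $\varphi(\alpha) = (1-\alpha)^2$ is $\alpha = 1$, yet the suboptimal step $\alpha = 0.5$ also passes both conditions:

```python
c1, c2 = 1e-4, 0.9           # typical textbook constants, 0 < c1 < c2 < 1
x, p = 1.0, -1.0             # current iterate and descent direction
f = lambda t: t * t
g = lambda t: 2.0 * t        # f'(t)
slope = p * g(x)             # p_k * f'(x_k) = -2 < 0

def wolfe(alpha):
    decrease = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    curvature = p * g(x + alpha * p) >= c2 * slope
    return decrease and curvature

# alpha = 1 is the exact minimizer; alpha = 0.5 is Wolfe-acceptable but suboptimal
print(wolfe(1.0), wolfe(0.5))  # True True
```

So the Wolfe-acceptable set typically contains a whole interval of step lengths, of which the exact minimizer is just one point.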