some total question about line search methods

81 Views Asked by At

can somebody tell me about these questions:

1-why we use line search? what kind of advantages have rather than others?

2-can introduce a good source about usage of line search in big data, please?

3-why in line search methods we use direct line? why we can't use curves instead of direct line?(I know the line search methods are based on Taylor theorem, but i want to know about this idea)

thanks for answer!

1

There are 1 best solutions below

1
On BEST ANSWER

Solving some subproblem (a local approximation of your original objective $f$) at iteration $k$ gives you a direction $d^{(k)}$. You'd like to move along this direction to update the current iterate $x^{(k)}$.

In the ideal case, you'd like to move along the entire vector: $x^{(k+1)} := x^{(k)} + d^{(k)}$ (taking large productive steps converges faster). However, this is a poor choice if $x^{(k+1)}$ does not improve upon $x^{(k)}$ (for example, it doesn't improve the objective value). In this case, you want to restrict the length of $d$ and move to $x^{(k+1)} := x^{(k)} + \alpha d^{(k)}$ instead, where $\alpha \in (0, 1]$.

The question is now: what value of $\alpha$ should I choose? Well it depends on your problem. If the problem is simple enough, you can compute the best $\alpha$ analytically as the global minimum of $\phi(\alpha) = f(x^{(k)} + \alpha d^{(k)})$. If this is too costly, you can use a backtracking line-search: as long as you're not happy, you cut $\alpha$ in half (following the sequence 1, 0.5, 0.25, 0.125, and so on).

Now, what does "make progress" really mean? Is it when we have strict decrease of $f$, that is $f(x^{(k+1)}) < f(x^{(k)})$? Well no, it's not enough to prove convergence towards a local minimum. We want sufficient decrease, that is at least a certain amount at each iteration. Look up the Armijo and Wolfe conditions.

Line-search methods are great because generating new steps is cheap: you compute an expensive direction once, then you search along this vector. There are other types of strategies: trust-region methods (where you restrict the length $\alpha$ beforehand and recompute the direction every time $\alpha$ changes) and arc-search methods (I'm not familiar with these ones, I just know they exist).