I am a beginner studying non-convex optimization, where the main methods are GD and SGD. What confuses me is that in many papers [1, 2, 3] the authors assume that the objective function is L-smooth.
I know that one problem with GD in non-convex optimization is that saddle points may slow down convergence. So I am wondering: is there any relation between L-smoothness and saddle points? If not, why do we always assume L-smoothness?
Any reference books/papers and any thoughts are appreciated. Thank you!
Reason 1 (philosophical). Suppose we have the following optimisation problem
$$ \min_{\mathbf{x} \in Q} f({\mathbf{x}}) \tag{1} $$
Our goal is to solve the problem up to accuracy $\epsilon$, measured either by the function residual $|f({\mathbf{x}}^*) - f({\mathbf{x}}_n)| \leq \epsilon$ or by the argument residual $\|{\mathbf{x}}^*-{\mathbf{x}}_n\| < \epsilon$ (which one doesn't matter for now). In general, we are interested in the number $N(\epsilon)$ of iterations (or, to be more precise, of evaluations of $f, \nabla f, \nabla^2 f, \dots$) needed to guarantee this accuracy.
Stated this way, the problem is ill-posed, because you would need to investigate the behaviour of each method on every possible input $f(\mathbf{x})$. Moreover, for a specific task you can derive an approach that is as good as you like, but it may not be applicable to other problems at all. So you need to establish some kind of trade-off between generality and decent theoretical results.
To resolve this issue, classes of functions are introduced. To illustrate the idea, let me give an example from convex optimisation. Find the solution of (1) assuming that $f(\mathbf{x})$ has the following properties:
For each class it is possible to obtain lower and upper bounds on $N(\epsilon)$. This is a very important result, since it tells us how far we can go in improving optimisation techniques.
In the non-convex case it still makes sense to consider different classes of functions, e.g. $L$-smooth functions.
Reason 2 (practical). $L$-smoothness is used to prove that GD converges to a stationary point, i.e. a point $\mathbf{x}^*$ with $\nabla f(\mathbf{x}^*)=0$.
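To make this concrete, here is a sketch of the standard argument. It is written under assumptions that are not stated above: the problem is unconstrained ($Q = \mathbb{R}^n$), $f$ is bounded below by some $f^* > -\infty$, and GD uses the fixed step size $1/L$. Recall that $f$ is $L$-smooth if its gradient is $L$-Lipschitz:

$$ \|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \leq L \|\mathbf{x} - \mathbf{y}\| \quad \text{for all } \mathbf{x}, \mathbf{y}. $$

This implies the quadratic upper bound

$$ f(\mathbf{y}) \leq f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{L}{2} \|\mathbf{y} - \mathbf{x}\|^2 . $$

Substituting the GD step $\mathbf{x}_{k+1} = \mathbf{x}_k - \frac{1}{L} \nabla f(\mathbf{x}_k)$ gives a guaranteed decrease at every iteration:

$$ f(\mathbf{x}_{k+1}) \leq f(\mathbf{x}_k) - \frac{1}{2L} \|\nabla f(\mathbf{x}_k)\|^2 . $$

Summing over $k = 0, \dots, N-1$ and using $f(\mathbf{x}_N) \geq f^*$ yields

$$ \min_{0 \leq k < N} \|\nabla f(\mathbf{x}_k)\|^2 \leq \frac{2L \left( f(\mathbf{x}_0) - f^* \right)}{N} . $$

So the gradient norm is driven to zero without any convexity assumption. Note that the bound says nothing about whether the limiting stationary point is a minimum or a saddle point.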
Direct answer to your question.
Saddle points are a consequence of non-convexity, while $L$-smoothness is a smoothness property. So there is no direct relationship between them (except perhaps in degenerate cases).
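A small numerical illustration may help. The function $f(x, y) = x^2 - y^2$ is my own toy example, not one taken from the papers in the question: its Hessian is $\mathrm{diag}(2, -2)$, so it is $L$-smooth with $L = 2$, and yet it has a saddle point at the origin. The sketch below runs plain GD with the (assumed) step size $1/(2L)$ from two starting points.

```python
import numpy as np

L = 2.0  # Lipschitz constant of the gradient of f(x, y) = x**2 - y**2


def grad(p):
    """Gradient of the toy function f(x, y) = x**2 - y**2."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])


def gradient_descent(p0, step, n_iters):
    """Plain GD with a fixed step size; returns the final iterate."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iters):
        p = p - step * grad(p)
    return p


step = 1.0 / (2.0 * L)  # illustrative choice, any step below 1/L behaves similarly

# Start exactly on the x-axis: the iterates are attracted to the saddle at (0, 0).
on_axis = gradient_descent([1.0, 0.0], step, 50)

# Start with a tiny offset in y: the iterates move away from the saddle.
perturbed = gradient_descent([1.0, 1e-6], step, 50)

print("start on the x-axis:", on_axis, "gradient norm:", np.linalg.norm(grad(on_axis)))
print("start slightly off: ", perturbed, "gradient norm:", np.linalg.norm(grad(perturbed)))
```

In the first run the gradient norm goes to zero even though the limit is a saddle point. In the second run the iterates leave the neighbourhood of the saddle (and, since this particular $f$ is unbounded below, keep going). In both cases the function is equally smooth, which is the sense in which smoothness and saddle points are separate issues.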
$L$-smoothness is not always assumed: there is at least one other important class of functions, namely $M$-Lipschitz continuous functions.
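For reference, the usual definition (stated here for the Euclidean norm, which is my assumption) is that $f$ is $M$-Lipschitz continuous if

$$ |f(\mathbf{x}) - f(\mathbf{y})| \leq M \|\mathbf{x} - \mathbf{y}\| \quad \text{for all } \mathbf{x}, \mathbf{y} \in Q . $$

Neither class contains the other. For example, on $\mathbb{R}$ the function $f(x) = |x|$ is $1$-Lipschitz but not $L$-smooth for any $L$ (it is not differentiable at $0$), while $f(x) = x^2/2$ has a $1$-Lipschitz gradient but is not $M$-Lipschitz for any finite $M$.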