The term "first order method" is often used to categorize a numerical optimization method for say constrained minimization, and similarly for a "second order method".
I wonder what is the exact definition of a first (or second) order method.
Does this term related to the convergence rate of the method or to the fact that one utilize explicitly first (or second) derivatives of the function to be minimized?
More particularly, Newton is considered second order method. It uses the second order derivatives (Hessian) and has quadratic convergence rate. But what if for example the energy is not convex? In this case the Hessian has to be modified to be (symmetric) positive definite. In such a case, the second order derivatives are used but it's not clear what exactly the convergence rate is. It can be linear depending on the actual modification of the Hessian. So is that a second order or first order method?
As far as I know, there's not an exact, precise term. Personally, I refer to first-order methods as those methods that use first derivatives and second-order methods as those that use second-derivative information. I also tend to consider second-order methods those methods that use second-order derivative information, but not the entire second-order derivative. This means that something like a Gauss-Newton approximation to the Hessian from a least squares problem would be considered a second-order method. For example, if we have the problem $$ J(x) = \frac{1}{2}\|g(x)\|^2 $$ where the exact Hessian is: $$ \nabla^2 J(x)\partial x= (g^{\prime\prime}(x)\partial x)^*g(x) + g^\prime(x)^*g^\prime(x)\partial x $$ I'd still consider a Newton method based on the Hessian approximation $$ H = g^\prime(x)^*g^\prime(x)\partial x $$ to be a second-order method. Technically, we only see first-derivatives here, but to obtain this expression, we needed information about how the second-derivative of $J$ is constructed and not just $g$. Also, under the right assumptions, we can get superlinear convergence with this approximation.
All that said, it is not correct to say that we need a Hessian that is positive definite everywhere to obtain quadratic convergence. If we satisfy the second-order sufficient conditions, we know that the Hessian will be positive definite near the optimal solution, so we can obtain quadratic convergence locally. Really, Newton's method is only guaranteed to converge quadratically within a certain radius, unless we have something nice like a self-concordant function, so convergence rate analysis doesn't generally apply when we're far from the solution. A well-designed globalization strategy such as a trust-region or line-search method should guarantee convergence to a stationary point from an arbitrary starting point, but they generally don't say anything about the convergence rate and they don't require the Hessian to be positive definite everywhere.
Really, the game is to rely on globalization to get us close to the optimal solution and then let Newton's method take over and not interfere with these iterates. Since the second-order optimality conditions tell us that near the solution the Hessian will be positive definite, we can then rely on our convergence rate analysis under this assumption.