I'm a bit confused about the difference between gradient descent and convex optimization using Lagrange multipliers. I know that we use Lagrange multipliers when an optimization problem has one or more constraints.
From the answer to this question, it seems that we can also use gradient descent for constrained optimization.
So what is the difference between these two approaches? Mathematically I know how both approaches work, but I don't understand when and why one is preferred over the other. For example, to optimize the SVM (Support Vector Machine) problem, we use Lagrange multipliers instead of gradient descent.
I've found a similar question here, but the answer there is not very clear either. Any intuitive explanation/example would help. Thanks.
I know this is a bit late, but maybe it helps someone.
The method of Lagrange Multipliers gives you the analytic theory. With this method you can solve problems by hand (if they are simple enough) and reach an exact symbolic solution. It establishes a set of optimality conditions that can be applied to any problem systematically.
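As a concrete toy example (my own, not from the question): minimize $f(x,y) = x^2 + y^2$ subject to $x + y = 1$. The Lagrangian is

$$\mathcal{L}(x, y, \lambda) = x^2 + y^2 - \lambda(x + y - 1),$$

and setting its partial derivatives to zero gives the conditions $2x = \lambda$, $2y = \lambda$, $x + y = 1$, so $x = y = \tfrac{1}{2}$ with $\lambda = 1$: an exact, closed-form answer.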
The gradient descent method is a numerical approach. You plug in an initial guess and iterate until the iterate is close enough to a local minimum of a penalty function. There are many such penalty functions, and different penalty functions treat constraints differently, but they all "mimic" the Lagrangian function in some way.
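To make this concrete, here is a minimal sketch of the numerical route on a toy problem: minimize $x^2 + y^2$ subject to $x + y = 1$, whose exact Lagrange-multiplier solution is $x = y = 1/2$. The quadratic penalty below is one of many possible choices, and the values of `mu` and `lr` are just ones I picked for illustration:

```python
# Minimize f(x, y) = x^2 + y^2 subject to x + y = 1
# by running plain gradient descent on a quadratic-penalty surrogate:
#   P(x, y) = x^2 + y^2 + mu * (x + y - 1)^2
# (the exact constrained minimizer is x = y = 1/2).

mu = 100.0       # penalty weight: larger -> closer to satisfying the constraint
lr = 0.002       # step size, chosen small enough for this mu to converge
x, y = 0.0, 0.0  # initial guess

for _ in range(20000):
    r = x + y - 1.0              # constraint violation
    gx = 2.0 * x + 2.0 * mu * r  # dP/dx
    gy = 2.0 * y + 2.0 * mu * r  # dP/dy
    x -= lr * gx
    y -= lr * gy

print(x, y)  # both approach mu / (2*mu + 1) ~ 0.4975, close to the exact 0.5
```

Note that for any finite `mu` the penalty minimizer is only *near* the true constrained optimum (here $\mu/(2\mu+1)$ instead of $1/2$); that gap is exactly the kind of approximation the analytic Lagrange conditions let you check.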
Note: A good numerical solver will give you the Lagrange multipliers as well as the primal variables, such that you can check for yourself that the solution satisfies the theoretical optimality conditions (to some tolerance).