Why do we linearize optimization problems?


I am currently doing research on the calibration of robot geometry, which is a standard and well-studied topic. It can be formulated as a nonlinear, non-convex optimization problem:

Suppose the robot position is denoted by $X(\theta)$, where $\theta$ is the geometry parameter vector, which is erroneous. $X(\theta)$ is a nonlinear function. The objective function in this case is:

$$ \min_\theta \sum_{k = 1}^N \| X_k-X_k(\theta) \|_2 = \min_\theta \sum_{k = 1}^N \| \Delta r_k \|_2, $$ with $N$ being the number of observations and $X_k$ the ground truth coming from accurate sensors. However, in the literature, almost everyone linearizes this problem via a Taylor series without justification. The linearized version is given by

$$ \min_{\Delta \theta} \sum_{k = 1}^N \frac{1}{2} \| \frac{\partial X_k(\theta)}{\partial \theta} \Delta \theta - \Delta r_k\|_2, $$ with $\theta_{true} = \theta_{nominal} + \Delta \theta$.
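For concreteness, the linearized subproblem above is just a linear least-squares fit for $\Delta\theta$. A minimal Python sketch, using a hypothetical toy forward model (a 2-parameter planar arm) in place of the real $X_k(\theta)$ — note this toy model happens to be linear in $\theta$, so a single step is exact, whereas a real calibration model would require re-linearizing:

```python
import numpy as np

def forward(theta, q):
    # Hypothetical stand-in for X_k(theta): a toy planar 2-link arm.
    # theta = link lengths (the erroneous geometry parameters),
    # q = joint angles of one observation.
    l1, l2 = theta
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(theta, q):
    # dX/dtheta for the toy model above, derived by hand.
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    return np.array([[c1, c12],
                     [s1, s12]])

def linearized_step(theta_nominal, qs, X_measured):
    # Stack the Jacobians and the residuals Delta r_k over all N
    # observations, then solve the linear least-squares problem
    # min || J * dtheta - r || for Delta theta.
    J = np.vstack([jacobian(theta_nominal, q) for q in qs])
    r = np.concatenate([X_k - forward(theta_nominal, q)
                        for q, X_k in zip(qs, X_measured)])
    dtheta, *_ = np.linalg.lstsq(J, r, rcond=None)
    return dtheta
```

With noiseless measurements from a "true" geometry, `theta_nominal + linearized_step(...)` recovers the true parameters in this toy case.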

So, my questions are somewhat on the philosophical side.

  1. Why do we linearize nonlinear optimization problems in the first place? Is it to make the problem convex?
  2. Don't we sacrifice accuracy by linearizing the problem via the Taylor series?
  3. With the current advanced methods and powerful processors/solvers, do we still need to linearize these problems?

Thank you in advance!

P.S. This problem is typically solved offline, to calibrate the robots in industry every one or two years. Also, the number of optimization parameters rarely exceeds 28 (4 DH parameters × 7 degrees of freedom). So, time and computational power are of no concern for this particular problem.


There are 2 best solutions below


We can't solve high-degree equations exactly. And even if we could, it would be very slow. First-order (linear) approximation methods iterate the linearized equations until the precision is good enough. This can be done much more quickly than working with higher-degree approximations, which have their own problems: they are slow, not always numerically stable, and more difficult to implement. The first order is always the best place to start understanding something, whether in numerical computation (where it is quicker, easier, and more efficient) or in basic modelling of problems (where it is often necessary to get any results at all).
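The "iterate the linear approximation until the precision is good enough" idea is exactly Newton's method. A minimal sketch for a scalar equation:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # At each iterate, replace f by its first-order Taylor approximation
    # and solve the resulting *linear* equation for the next point.
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Solve the "high-degree" equation x**3 - 2 = 0 purely via linear steps.
root = newton(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, x0=1.0)
```

Each iteration only ever solves a linear equation, yet the sequence converges to the exact root of the cubic.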


I will try to cover your three points by talking a little bit about optimization in general.


Large non-convex optimization problems have to be solved iteratively. When the functions are differentiable, the most common strategy is to use information from the gradient to update the variables in each iteration (order-$1$ methods). But we also have order-$0$ methods (which use only information about the function values themselves) and order-$2$ methods (in which information from the second derivatives is also included).
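As an illustration of an order-$1$ method, here is a minimal gradient-descent sketch (the function and step size are made up for the example):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, iters=200):
    # Order-1 method: each update uses only first-derivative information.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x0 - 3)^2 + 2*(x1 + 1)^2 using only its gradient.
xmin = gradient_descent(lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)]),
                        [0.0, 0.0])
```

An order-$0$ method would probe only function values (e.g. Nelder-Mead), while an order-$2$ method would additionally use the Hessian to shape each step.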


The most common approach to non-convex optimization problems is to solve them through Sequential Convex Programming, or Sequential Linear Programming (a special case of Sequential Convex Programming). These algorithms are quite efficient, because very good convex solvers are available both in the literature and in computational packages.


The main idea of using less accurate approximations is to speed up each iteration and to generate a sequence of ever-improving candidate solutions. In your case, it seems that you are not solving the original non-convex problem at all; you are solving only the linearized subproblem, right? This can be a valid approach if you can guarantee that all values of $\Delta \theta$ are sufficiently small (which you can check after solving the linearized subproblem).
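The iterated version of this scheme (re-linearize at each new estimate and stop once $\Delta\theta$ is small) is just Gauss-Newton. A hedged sketch, where `residual_jacobian` is a hypothetical user-supplied function returning the stacked residuals $\Delta r$ and the Jacobian $\partial X/\partial\theta$ at the current $\theta$:

```python
import numpy as np

def calibrate(residual_jacobian, theta0, tol=1e-10, max_iter=100):
    # Gauss-Newton: solve the linearized least-squares subproblem,
    # update theta, and re-linearize until the step is negligible.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r, J = residual_jacobian(theta)          # Delta r and dX/dtheta
        dtheta, *_ = np.linalg.lstsq(J, r, rcond=None)
        theta = theta + dtheta
        if np.linalg.norm(dtheta) < tol:         # Delta theta small enough?
            break
    return theta
```

The stopping test on $\|\Delta\theta\|$ is precisely the "check this after solving the linearized subproblem" step: if the very first $\Delta\theta$ is already tiny, a single linearized solve was enough.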


In some cases, we may have a nonlinear problem that is convex, but we may still want to linearize it, either because we can then exploit linearity properties, or because we could then use a linear programming solver.


There is no rule of thumb; these decisions are highly problem-dependent. All of the following questions are relevant when deciding how to approach an optimization problem: How much time do we have to solve it? How many variables are there? How expensive is it to evaluate the original function? How expensive is it to compute its gradient? How expensive is it to compute or estimate its second derivatives? Can small variations be assumed? What are our computational resources: will we solve it analytically, on a CPU, on a GPU? Is an approximate result enough for the application? How close is the approximation to the true solution?


  1. We linearize the problem when the gains from linearization (very nice and useful properties, very quick solves) compensate for the accuracy we lose. Even with linearization, exact solutions can be obtained through iterative methods that re-linearize at each new estimate.

  2. If we solve a single linearized subproblem, the solution will only be approximate; it is up to the human designer to evaluate whether this is acceptable for the application at hand.

  3. We don't have to use such approaches, but they are used because they work very well. There are many methods: some perform linearizations, some generate nonlinear convex subproblems, some are simple gradient descents, some are order-$0$ heuristic algorithms... But, in my opinion, linearization approaches will never fall out of use, since, whenever they can be applied, they are probably the most efficient ones.