What is truncated Levenberg-Marquardt and how does it work?


I am currently trying to understand the Levenberg-Marquardt implementation that (Fetzer et al. 2020) used in their solution. (The paper is publicly accessible here). After reading it, I'm stuck at the part where they talk about their Levenberg-Marquardt implementation in (Section 4.2). The reason for bringing up the paper here is to provide more context for my actual question. This is NOT meant as an "explain-this-paper-to-me" kind of post. Instead, I'll try to rephrase that section, focusing on the Levenberg-Marquardt approach they used:

The basic idea is to have an objective function that consists of a bunch of summed up functions $f_i$ which just return a scalar value each:

$$ \underset{\Theta}{\operatorname{argmin}} \sum_i f_i(\Theta)^2 $$

As you can see, $\Theta$ is the vector that contains the parameters that I want to optimize, and it is the input for all the summed up functions. In the paper, this objective function is a bit more complicated and uses naming conventions that are specific to the problem of calibrating cameras. However, in this question, I'll just focus on this simplified version here to make things easier.

With Levenberg-Marquardt, I want to iteratively minimize this objective function by linearizing it at the current position $\Theta_k$ and walking down the slope (staying within a trust region) to arrive at the next iterate $\Theta_{k+1}$, until I'm close enough to the optimum. To find out in which direction to step and how far, Lagrange multipliers are used together with the linearized objective function to solve a linear system of equations in each step. So all you really need for this optimization process is the objective function and the ability to evaluate it at arbitrary positions to compute the necessary linearization. Since the paper provides all that, I could just implement my own solution right away, but I want to understand what they are doing in their work specifically.
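To make the question concrete, here is a minimal sketch of the iteration I have in mind, assuming a residual function that returns the vector of scalar values $f_i(\Theta)$ and a finite-difference Jacobian. All names here are my own; this is just my current understanding of plain Levenberg-Marquardt, not what the paper implements.

```python
import numpy as np

def numerical_jacobian(f, theta, eps=1e-6):
    """Forward-difference Jacobian of the residual vector f at theta."""
    f0 = f(theta)
    J = np.zeros((f0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        J[:, j] = (f(theta + step) - f0) / eps
    return J

def levenberg_marquardt(f, theta0, lam=1e-3, iters=50):
    theta = theta0.astype(float)
    for _ in range(iters):
        r = f(theta)                      # residual vector (f_1, ..., f_n)
        J = numerical_jacobian(f, theta)  # linearization at theta_k
        A = J.T @ J                       # Gauss-Newton Hessian approximation
        b = J.T @ r
        # Damped normal equations: (A + lam * I) delta = -b
        delta = np.linalg.solve(A + lam * np.eye(theta.size), -b)
        candidate = theta + delta
        if np.sum(f(candidate) ** 2) < np.sum(r ** 2):
            theta, lam = candidate, lam * 0.5  # accept step, relax damping
        else:
            lam *= 2.0                         # reject step, increase damping
    return theta

# Usage: fit a line, minimizing sum_i (theta[0]*x_i + theta[1] - y_i)^2
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
residuals = lambda th: th[0] * x + th[1] - y
theta_hat = levenberg_marquardt(residuals, np.array([0.0, 0.0]))
```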

Unfortunately, the only details they give in the paper are that they used "Truncated Levenberg-Marquardt" with a "system matrix":

$$ A = \sum_i J_i^T J_i $$

and "inhomogeneity":

$$ b = \sum_i J_i^T \cdot \text{some-scalar-value}_i $$
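As far as I can tell, these sums are just the usual normal-equation quantities written per residual: if $J_i$ is the row Jacobian of the scalar function $f_i$, then $\sum_i J_i^T J_i$ equals the $J^T J$ of the stacked Jacobian, and $\sum_i J_i^T r_i$ equals $J^T r$. A quick numerical check of that equivalence (names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((5, 3))  # stacked Jacobian: 5 residuals, 3 parameters
r = rng.standard_normal(5)       # the per-residual "some-scalar-value"

# Accumulate the sums exactly as written in the paper's notation
A_sum = sum(np.outer(J[i], J[i]) for i in range(5))  # sum_i J_i^T J_i
b_sum = sum(J[i] * r[i] for i in range(5))           # sum_i J_i^T * r_i

# Both agree with the stacked-Jacobian forms
assert np.allclose(A_sum, J.T @ J)
assert np.allclose(b_sum, J.T @ r)
```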

I'm struggling to understand this part for a few reasons:

  1. What exactly is Truncated Levenberg-Marquardt? (I couldn't find any good resources online that explain this terminology.)
  2. What exactly is a "system matrix" and an "inhomogeneity" in the context of Levenberg-Marquardt here? I usually use the term "system matrix" when I want to solve a system of equations $A \cdot x = b$ for $x$, but $b$ seems to mean something completely different here, and solving for $x$ using this particular $A$ is not a part of Levenberg-Marquardt as far as I know.

Since only a small section of the paper covers this part, I assume these few terms should be enough for someone who knows more about the topic, but I can't wrap my head around them.

Maybe Truncated Levenberg-Marquardt means that a truncated SVD is used to compute the step in each iteration? Or maybe they refer to the truncated Hessian matrix $\nabla^2 f(x) \approx J^T(x) J(x)$? I'm not knowledgeable enough to connect the dots here and would be thankful for any pointers or material that can help me understand this issue a bit better.
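To clarify my first guess: by "truncated SVD" I mean something like the following, where singular values below a threshold are dropped before inverting, so that near-singular directions of $A = J^T J$ don't blow up the step. This is purely my speculation about what "truncated" could mean, not something the paper states.

```python
import numpy as np

def truncated_svd_solve(A, b, rel_tol=1e-6):
    """Solve A x = b, discarding singular values below rel_tol * s_max."""
    U, s, Vt = np.linalg.svd(A)
    keep = s > rel_tol * s[0]  # s is sorted in descending order
    # Invert only the kept singular values; zero out the rest
    s_inv = np.where(keep, 1.0 / np.where(keep, s, 1.0), 0.0)
    return Vt.T @ (s_inv * (U.T @ b))

# A nearly rank-deficient system matrix: a plain solve would return
# a huge second component, the truncated solve zeroes it out instead
A = np.array([[4.0, 0.0], [0.0, 1e-12]])
b = np.array([2.0, 1.0])
x = truncated_svd_solve(A, b)  # -> [0.5, 0.0]
```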