An iterative method for minimizing squared errors between nonlinear functions


Suppose that $f$ and $g$ are two vector-valued functions, where $f$ may have a highly complex nonlinear form and $g$ is much simpler (e.g., even linear). Consider the nonlinear least squares problem $$ \min_x \quad \|f(x) - g(x)\|^2. \qquad (P)$$ Say that directly minimizing this objective with conventional numerical methods (e.g., the Gauss-Newton method) is very computationally demanding, while minimizing $\|f(x_0) - g(x)\|^2$ for a given $x_0$ is quite straightforward. A hypothetical solution in this case is the following:

  1. Initialize $x_0$

  2. Solve $\hat x = {\arg \min}_x \|f(x_0) - g(x)\|^2$

  3. Let $x_0 \gets \hat x$

  4. Repeat 2. and 3.
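
When $g$ is linear, $g(x)=Ax$ (the case raised at the end of the question), step 2 reduces to an ordinary linear least-squares solve. A minimal NumPy sketch of the iteration, with a toy $f$ standing in for the expensive model (the particular $f$, $A$, and tolerances are illustrative assumptions, chosen so that this example happens to converge):

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 5, 3

# g(x) = A x; orthonormal columns keep the toy problem well conditioned
A, _ = np.linalg.qr(rng.standard_normal((p, d)))
b = rng.standard_normal(p)

def f(x):
    # toy nonlinear stand-in for the expensive model; the 0.5 factor makes the
    # fixed-point map x -> A^+ f(x) a contraction, so this example converges
    return 0.5 * np.tanh(A @ x) + b

x = np.zeros(d)                               # step 1: initialize x_0
for _ in range(200):
    # step 2: argmin_x ||f(x_0) - A x||^2 is a plain linear least-squares solve
    x_new = np.linalg.lstsq(A, f(x), rcond=None)[0]
    if np.linalg.norm(x_new - x) < 1e-12:     # stop once the iterates settle
        x = x_new
        break
    x = x_new                                 # step 3: x_0 <- x_hat
```

At a fixed point of this loop, $\hat x$ minimizes $\|f(x_0)-Ax\|^2$ with $x_0=\hat x$, i.e. $A^\top(A\hat x - f(\hat x))=0$; nothing about the sketch, however, guarantees convergence for a general $f$.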

Is the sequence of iterates $\hat x$ produced by such a method guaranteed to converge (possibly only to a local minimum)? Can convergence be guaranteed under additional assumptions?

More specifically, say that $f$ is a highly flexible nonlinear model such as a neural network that outputs a $p$-dimensional vector, while $g(x) = Ax$ with $A\in \mathbb{R}^{p\times d}$ and $x\in \mathbb{R}^d$.



Best answer:

Unfortunately, it is not guaranteed to converge. Consider the following simple counterexample: $f(x)=2x$ and $g(x)=x$. Each inner solve of $\min_x (2x_0 - x)^2$ returns $\hat x = 2x_0$, so if you initialize at $x_0=3$, the method traverses the sequence $3,6,12,24,48,\dots$ and never converges. The global minimum lies at $x=0$, but it will never be found by this method.
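
The divergence is easy to reproduce numerically:

```python
# f(x) = 2x, g(x) = x: each inner problem min_x (2*x0 - x)**2 is a
# one-dimensional quadratic, minimized exactly by x_hat = 2*x0
x0 = 3
seq = [x0]
for _ in range(4):
    x0 = 2 * x0        # argmin_x (2*x0 - x)**2
    seq.append(x0)
# seq == [3, 6, 12, 24, 48]
```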

Bummer.

I'm not aware of any method that avoids dealing with the hard-to-optimize function $f$.

Another answer:

The following is in the spirit of your proposed solution method.

Introduce a homotopy parameter $\lambda$ such that $\lambda=0$ represents an easy problem and $\lambda=1$ recovers the original (hard) problem.

Then solve a sequence of sub-problems $$ x_\lambda = \arg\min_x \big\|\,f\big((1-\lambda)\,x_0+\lambda x\big) - g(x)\,\big\|^2 $$ starting at $\lambda=0$ and slowly increasing $\lambda\to1$, using the previously calculated solution as the initial guess for the next sub-problem.

There are many different ways to introduce the homotopy parameter (Newton homotopy, affine homotopy, pseudo-arclength continuation), and many schemes for incrementing $\lambda$ have been proposed.
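
A rough numerical sketch of this continuation scheme for the linear case $g(x)=Ax$, using SciPy's `least_squares` for each warm-started sub-problem (the toy $f$, the random $A$, and the uniform $\lambda$ schedule are illustrative assumptions, not part of the answer above):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
p, d = 5, 3
A = rng.standard_normal((p, d))          # g(x) = A x
c = rng.standard_normal(p)

def f(z):
    # toy stand-in for the hard-to-optimize model
    return np.tanh(A @ z) + c

x0 = rng.standard_normal(d)              # fixed reference point of the homotopy
# lambda = 0: f((1-0)*x0) is a constant, so this is plain linear least squares
x = np.linalg.lstsq(A, f(x0), rcond=None)[0]

# slowly increase lambda toward 1, warm-starting each sub-problem
# at the previous solution
for lam in np.linspace(0.1, 1.0, 10):
    res = lambda z, lam=lam: f((1 - lam) * x0 + lam * z) - A @ z
    x = least_squares(res, x, xtol=1e-12, ftol=1e-12, gtol=1e-12).x

# at lam = 1 the sub-problem is the original problem (P)
```

Each sub-problem is still nonlinear in $x$ for $\lambda>0$, but the warm start keeps every solve short; only the $\lambda=0$ problem is solved from scratch.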