Properties of the Solution of Non Linear Least Squares


What are the guaranteed properties of the Non Linear Least Squares solution?

On Non Linear Least Squares in Wikipedia they mention that there might be many Local Extreme Points (which is to be expected, since the Cost Function isn't guaranteed to be Convex).
Yet the Least Absolute Deviation article in Wikipedia states in the table:

Properties of the Least Squares Solution and Least Absolute Deviation solution

that there is always exactly one solution to the Least Squares Problem.
Could it be they mean "Linear Least Squares"?

In general, what is guaranteed for the Global Solution of the Non Linear Least Squares?
When the Cost Function is Coercive a Global Minimum is guaranteed to exist; is that true in general? Is it unique (namely, is the cost function value there strictly lower than at any other local minimum)?

To keep things simple, for Cyclic Functions (e.g., when the parameter is the phase of a Sine / Cosine Function) we'll limit ourselves to one cycle of the values.
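The multimodality of nonlinear least squares can be illustrated with a small sketch (assuming NumPy; the model and data are hypothetical, and a frequency parameter is used rather than a phase, since fitting only the phase of a fixed-frequency sinusoid actually gives a single minimum per cycle). Scanning the cost over a grid of frequencies shows several interior local minima:

```python
import numpy as np

# Hypothetical sketch: fit the frequency w in f(x) = sin(w*x) by least
# squares. The cost J(w) = sum_i (y_i - sin(w*x_i))^2 is non-convex in w,
# and the sampled cost curve has several interior local minima.
x = np.linspace(0.0, 10.0, 500)
y = np.sin(2.0 * x)  # data generated with true frequency w = 2

ws = np.linspace(0.1, 6.0, 2000)
cost = np.array([np.sum((y - np.sin(w * x)) ** 2) for w in ws])

# Count interior local minima of the sampled cost curve.
is_min = (cost[1:-1] < cost[:-2]) & (cost[1:-1] < cost[2:])
print(int(np.sum(is_min)))  # several local minima; global one near w = 2
```

The global minimizer (cost zero at the true frequency) coexists with spurious local minima, which is exactly why local descent methods for nonlinear least squares can get stuck.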

Best Answer

The table is wrong in general, for the reasoning you present. The article is also poorly written and could use a lot of editing. However, least squares fitting does have a unique solution if the parameters being fit enter the function $f$ linearly (even if the function is nonlinear in the data variable), and the data matrix defined below has full column rank, i.e., its rank equals the number of parameters.

Specifically, suppose that $f$ can be written as a linear combination of $n$ (possibly nonlinear) functions $g_i$ as follows*, $$f(x) = c_1 g_1(x) + c_2 g_2(x) + \dots + c_n g_n(x),$$ and our goal is to find the coefficients $c_i$ based on $m$ known pairs of data, $(x_i,y_i)_{i=1}^m$, so that $f(x_i) \approx y_i$. Let us define the "data matrix" $X$, data vector $y$, and parameter vector $c$ as: $$X := \begin{bmatrix} g_1(x_1) & g_2(x_1) & \dots & g_n(x_1) \\ g_1(x_2) & g_2(x_2) & \dots & g_n(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ g_1(x_m) & g_2(x_m) & \dots & g_n(x_m) \end{bmatrix} \quad y:=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \quad c:=\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$ (column $j$ of $X$ holds one function $g_j$ evaluated at all the data points, and row $i$ holds all the functions evaluated at one data point $x_i$). Then the least squares objective function can be written as follows: $$\sum_i (y_i - f(x_i))^2 = ||y - Xc||^2.$$

This is a nonnegative quadratic function of $c$, so it always has at least one minimizer, and that minimizer is unique if $X$ has full column rank. In fact, any minimizer $c^*$ satisfies the normal equations $X^TX c = X^T y$; when $X^TX$ is invertible, $$c^* = (X^TX)^{-1}X^T y.$$
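A minimal sketch of this construction (assuming NumPy, with the hypothetical basis $g_1(x)=1$, $g_2(x)=x$, $g_3(x)=\sin x$) forms $X$ and solves the normal equations; in practice `np.linalg.lstsq` is preferred over forming $(X^TX)^{-1}$ explicitly, and both give the same unique minimizer when $X$ has full column rank:

```python
import numpy as np

# Hypothetical basis: g1(x) = 1, g2(x) = x, g3(x) = sin(x).
# The model f(x) = c1 + c2*x + c3*sin(x) is nonlinear in x but
# linear in the parameters c.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 50)
c_true = np.array([1.0, -0.5, 2.0])
X = np.column_stack([np.ones_like(x), x, np.sin(x)])  # columns are g_j(x_i)
y = X @ c_true + 0.01 * rng.standard_normal(x.size)

# Solve the normal equations X^T X c = X^T y (unique: X has rank 3).
c_normal = np.linalg.solve(X.T @ X, X.T @ y)
# Numerically preferable equivalent:
c_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(c_normal, c_lstsq))  # True: same unique minimizer
```

If a column of $X$ were a linear combination of the others, the minimum value would still exist but infinitely many $c$ would attain it.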


*In general, any function of two variables $f$ that is linear in one and nonlinear in the other can be written in this form. Specifically, suppose that $f$ is a general function of $c$ and $x$ that is linear in $c$ and nonlinear in $x$. Then, expressing $c$ in a given basis $e_i$ and using the linearity of $f$ in $c$, we can convert $f(c,x)$ into the form considered above: \begin{align} f(c,x) &= f(c_1 e_1 + c_2 e_2 + \dots + c_n e_n, x) \\ &= c_1 \underbrace{f(e_1,x)}_{g_1(x)} + c_2 \underbrace{f(e_2,x)}_{g_2(x)} + \dots + c_n \underbrace{f(e_n, x)}_{g_n(x)}. \end{align}
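This basis-vector trick can be sketched concretely (assuming NumPy; the function $f$ below is hypothetical): plugging the standard basis vectors into the parameter slot extracts the $g_i$, and linearity in $c$ recovers $f$ as their combination.

```python
import numpy as np

# Hypothetical f, linear in c, nonlinear in x:
# f(c, x) = (c[0] + c[1]*x) * exp(-x)
def f(c, x):
    return (c[0] + c[1] * x) * np.exp(-x)

# Plug the standard basis vectors e_i into the parameter slot:
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
g1 = lambda x: f(e1, x)   # g1(x) = exp(-x)
g2 = lambda x: f(e2, x)   # g2(x) = x * exp(-x)

# Linearity check: f(c, x) == c1*g1(x) + c2*g2(x) for any c.
c = np.array([3.0, -2.0])
x = np.linspace(0.0, 4.0, 10)
print(np.allclose(f(c, x), c[0] * g1(x) + c[1] * g2(x)))  # True
```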