linear solution of curve fitting on multiple linear functions differing by a multiplier


I recently posted this question here, but I thought it could also be of interest in mathematics, given that I found a partially related question here.

I am facing the following problem. I know nonlinear least squares can provide a solution, but I am wondering if a linear way to solve this data fitting problem exists.

This is my input: I've got three different datasets composed of scattered points.

Input dataset

I know that an equation of the form $$ y(x) = bx^2 + cx $$ (linear in the parameters $b$ and $c$)

can be used to explain any of my datasets. I know how to fit this function to a given dataset (e.g. the blue one) using linear least squares, so I could fit the above function separately to each dataset, but I am looking for something different.
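For reference, fitting $y(x) = bx^2 + cx$ to a single dataset by linear least squares is one solve with the design matrix $[x^2 \; x]$. A minimal NumPy sketch with synthetic data (the true values $b=2$, $c=5$ are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "blue" dataset with assumed true parameters b = 2, c = 5.
x = np.linspace(0, 10, 50)
y = 2 * x**2 + 5 * x + rng.normal(0, 10, 50)

# Design matrix [x^2  x]; one linear least squares solve gives b and c.
A = np.column_stack([x**2, x])
b_hat, c_hat = np.linalg.lstsq(A, y, rcond=None)[0]
```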

In my specific case I have an additional constraint: I also know that the three functions describing the three datasets share the $b$ and $c$ parameters, while they differ by a multiplier, something like this:

$$ \begin{cases} y(x) = (bx^2 + cx) \; \text{explains the blue data}\\ y(x) = a'(bx^2 + cx) \; \text{for red data}\\ y(x) = a''(bx^2 + cx) \; \text{for green data} \end{cases} $$

I am looking for a linear way (if one exists) to solve this problem globally, obtaining $a'$, $a''$, $b$ and $c$.

Maybe I am missing something, so a different point of view on how to correctly formulate the problem could also help.

Approximate solutions are also welcome.


There are 3 answers below.

Answer 1

I haven't tried this to see how it looks, but you could probably view it as a two-stage linear least squares problem. I mean the following:

  1. Stage 1: write down the system of equations $y_i=Px_i$, where $y_i \in R^{3\times 1}$ is a 3-tuple observation across the three data sets, $x_i = [x \ \ x^2]^T$ is the input corresponding to $y_i$ and $P = \left(\begin{array}{cc}p_{0,0} & p_{0,1} \\ p_{1,0} & p_{1,1} \\ p_{2,0} & p_{2,1} \end{array}\right)$, and $i=0, 1, 2, \ldots, $ indicates the observation index. So, denoting $K_{yx} = \sum_i y_ix_i^T$ and $K_{xx} = \sum_i x_ix_i^T$, we can solve for $P$ as $P = K_{yx}K_{xx}^{-1}$.

  2. Stage 2: Now we can take $p_{0,0}$ and $p_{0,1}$ as the inputs to a system $y = a'x$, with $p_{1,0}$ and $p_{1,1}$ as the outputs. Solve for $a'$. Repeat this for $a''$. One can probably do this iteratively taking the other rows of $P$ as inputs and solve for either $a'$ or $1/a'$ or $a''$ or $1/a''$ or $a'/a''$ accordingly.

Not sure if this will result in good solutions for $a',a'', b, c$.
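A rough NumPy sketch of this two-stage idea, under the assumption (implicit in the 3-tuple observations $y_i$) that all three datasets share the same inputs $x_i$; all data and parameter values ($b=2$, $c=5$, $a'=2$, $a''=4$) are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true parameters: b = 2, c = 5, multipliers a' = 2, a'' = 4.
x = np.linspace(0.1, 10, 50)
base = 2 * x**2 + 5 * x
Y = np.column_stack([base, 2 * base, 4 * base]) + rng.normal(0, 1, (50, 3))
X = np.column_stack([x, x**2])               # x_i = [x  x^2]^T

# Stage 1: P = K_yx K_xx^{-1}, with K_yx = sum_i y_i x_i^T, K_xx = sum_i x_i x_i^T.
K_yx = Y.T @ X
K_xx = X.T @ X
P = np.linalg.solve(K_xx, K_yx.T).T          # valid since K_xx is symmetric

# Stage 2: fit each remaining row of P against row 0 via a scalar y = a*x fit.
p0, p1, p2 = P
a1 = (p0 @ p1) / (p0 @ p0)                   # estimate of a'
a2 = (p0 @ p2) / (p0 @ p0)                   # estimate of a''
c_hat, b_hat = p0                            # row 0 is the blue model's [c  b]
```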

Answer 2

I think it is not possible (but I may be wrong) to solve this directly, but I would do the following:

  • Calculate $b,c$ via least squares with the blue dataset only.

Then iterate over those two steps:

  1. Take $b,c$ from the previous step and calculate $a',a''$ via least squares.
  2. Take $a',a''$ from the previous step and calculate $b,c$ via least squares.

I think this will converge pretty quickly when the data points are not scattered too much.
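A sketch of this alternating scheme in NumPy (synthetic data; the true values $b=2$, $c=5$, $a'=2$, $a''=4$ are assumptions for illustration). Fixing the blue multiplier at 1 removes the scale ambiguity between the multipliers and $(b,c)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three datasets on different x grids; assumed b = 2, c = 5, multipliers 1, 2, 4.
xs = [np.linspace(0.1, 10, 50), np.linspace(0.1, 7, 50), np.linspace(5, 10, 50)]
ys = [a * (2 * x**2 + 5 * x) + rng.normal(0, 1, x.size)
      for a, x in zip([1.0, 2.0, 4.0], xs)]

# Initial b, c from the blue dataset only.
b, c = np.linalg.lstsq(np.column_stack([xs[0]**2, xs[0]]), ys[0], rcond=None)[0]

a = [1.0, 1.0, 1.0]
for _ in range(20):
    # Step 1: with b, c fixed, each multiplier is a scalar least squares fit.
    f = [b * x**2 + c * x for x in xs]
    a = [1.0] + [(y @ fi) / (fi @ fi) for y, fi in zip(ys[1:], f[1:])]
    # Step 2: with the multipliers fixed, b, c is a stacked linear LS fit.
    A = np.vstack([ai * np.column_stack([x**2, x]) for ai, x in zip(a, xs)])
    b, c = np.linalg.lstsq(A, np.concatenate(ys), rcond=None)[0]
```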

Answer 3

This is going to be a pretty long-winded answer, as it was a fun problem to play with. The short story is that the problem is not convex and most likely cannot be written as a linear/QP/SOCP/SDP problem, but local solvers typically find the global optimum, and if written in the right form, it is solved to global optimality rather easily using a deterministic global solver (i.e., one with guaranteed lower and upper bounds). Additionally, since the model has only two complicating variables, the problem can alternatively be solved rather easily by just gridding.

I am going to play around with the model using the MATLAB Toolbox YALMIP (developed by me). The nonlinear programs are solved using fmincon, and the LPs/QPs are solved using Gurobi.

Create some data resembling your plot

N = 50;
x1 = linspace(0,10,N)';
x2 = linspace(0,7,N)';
x3 = linspace(5,10,N)';
y1 = 2*x1.^2+5*x1 + randn(N,1)*10;
y2 = 2*(2*x2.^2+5*x2) + randn(N,1)*10;
y3 = 4*(2*x3.^2+5*x3) + randn(N,1)*10;
plot(x1,y1,'ob',x2,y2,'or',x3,y3,'og');

As a first try, we simply write down the problem in the most obvious form and solve it using a local nonlinear solver such as fmincon (i.e., we essentially do local nonlinear least squares):

sdpvar b c a2 a3
err1 = y1 - [x1.^2 x1]*[b;c];
err2 = y2 - [x2.^2 x2]*[a2*b;a2*c];
err3 = y3 - [x3.^2 x3]*[a3*b;a3*c];
objective = err1'*err1+err2'*err2+err3'*err3;
solvesdp([],objective)

For all instances I have tried, with various numbers of data points, noise levels and parameter values, the solution appears to be the global one. Hence, the problem is pretty easy.

However, we only know that the solution is global because we designed the problem. Hence, we switch to the built-in global solver in YALMIP. The global solver, based on branch & bound and bound propagation using linear programming, requires bounds on all variables involved in nonlinear expressions, so we introduce some:

bounds = [-100 <= [b c] <= 100, 0 <= [a2 a3] <= 10]
ops = sdpsettings('solver','bmibnb');
solvesdp(bounds,objective,ops)

This does not work well. The globally optimal solution is found directly (as it simply solves the same problem as above), but the lower bound on the objective is stuck at 0 and never improves. Hence, there is no progress in judging optimality.

The culprit appears to be the fact that we are squaring the bilinear residuals, thus creating quartic terms, which perform badly during bound propagation. Hence, we introduce auxiliary variables to obtain bilinear constraints and a quadratic objective.

err1 = sdpvar(N,1);
err2 = sdpvar(N,1);
err3 = sdpvar(N,1);
residuals = [err1 == y1 - [x1.^2 x1]*[b;c];
             err2 == y2 - [x2.^2 x2]*[a2*b;a2*c];
             err3 == y3 - [x3.^2 x3]*[a3*b;a3*c]];
objective = err1'*err1+err2'*err2+err3'*err3;
solvesdp([bounds,residuals],objective,ops)

This performs better in the sense that the lower bounds are better, and the problem is solved to global optimality (within 1%) in one node (i.e., one call to fmincon and a (large) bunch of LP solves), which takes some seconds. However, the bound propagation performed to improve the lower bound is pretty slow, and the reason is the large number of bilinear equalities. A somewhat better model is obtained by introducing separate parameters for each quadratic model and then connecting these with bilinear equalities. With this formulation, we can go back to a model where the data is collected in the quadratic objective instead, leading to a much smaller model. Importantly, the size of this model does not depend on the number of data points, as the two models above do.

sdpvar b c a2 a3 b2 c2 b3 c3
err1 = y1 - [x1.^2 x1]*[b;c];
err2 = y2 - [x2.^2 x2]*[b2;c2];
err3 = y3 - [x3.^2 x3]*[b3;c3];
bounds = [-100 <= [b c b2 c2 b3 c3] <= 100, 0 <= [a2 a3] <= 10];
scale = [[b c]*a2 == [b2 c2], [b c]*a3 == [b3 c3]]
objective = err1'*err1+err2'*err2+err3'*err3;
solvesdp([bounds, scale],objective,ops)

This model has slightly weaker lower bounds and requires a couple of nodes to be solved, but each node is solved much faster, as the bound propagation is faster since there are very few constraints and bilinear products (on my machine it takes 1 s to 3 s to prove optimality within 1%).

Finally, sometimes the easiest solution is the best. We only have two complicating parameters, and when those are fixed, the optimal solution is given by a simple back-solve (or unconstrained quadratic program if you wish to put it in an optimization context). Hence, a trivial approach for a global solution is to simply grid the parameter space in a2 and a3.
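Outside of YALMIP, this gridding idea can be sketched in a few lines of NumPy: for each candidate pair (a2, a3), the inner problem in $(b,c)$ is an ordinary linear least squares solve on the stacked, scaled design matrices (synthetic data; the true values $b=2$, $c=5$, $a_2=2$, $a_3=4$ are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data with assumed true b = 2, c = 5 and multipliers 1, 2, 4.
xs = [np.linspace(0, 10, 50), np.linspace(0, 7, 50), np.linspace(5, 10, 50)]
ys = [a * (2 * x**2 + 5 * x) + rng.normal(0, 10, 50)
      for a, x in zip([1.0, 2.0, 4.0], xs)]
yall = np.concatenate(ys)

def inner_cost(a2, a3):
    """With a2, a3 fixed, the optimal b, c is a plain linear LS solve."""
    A = np.vstack([a * np.column_stack([x**2, x])
                   for a, x in zip([1.0, a2, a3], xs)])
    (b, c), res, *_ = np.linalg.lstsq(A, yall, rcond=None)
    return res[0], b, c

# Grid a2 and a3 and keep the pair with the smallest residual.
grid = np.linspace(0.5, 5, 40)
cost, a2_hat, a3_hat = min((inner_cost(a2, a3)[0], a2, a3)
                           for a2 in grid for a3 in grid)
_, b_hat, c_hat = inner_cost(a2_hat, a3_hat)
```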

Here, I will use YALMIP to solve the problem for a large number of values of a2 and a3. A bit overkill to do it like this, but since all the code is ready from above...

First, a parameterized optimization model. This is an object which is ready to be solved for given values of a2 and a3, returning the optimal objective

ops = sdpsettings('solver','gurobi');
P = optimizer([bounds,scale],objective,ops,[a2;a3],objective)

For instance, the optimal objective for a2=2, a3=3 is given by

P{[2;3]}

If we want to compute the value for a list of parameters, we simply input those in a matrix. Hence, I create a 20-by-20 grid (400 points) of values between 0 and 5, compute all the optimal objectives, plot the optimal objective surface, and pick out the best choice of a2 and a3. This takes a second or so on my machine.

M = 20;
a2range = linspace(0,5,M);
a3range = linspace(0,5,M);
agrid = [kron(a2range,ones(1,M));kron(ones(1,M),a3range)];
optimalobjectives = P{agrid};
[best,loc] = min(optimalobjectives);
best_a = agrid(:,loc)
mesh(reshape(agrid(1,:),M,M),reshape(agrid(2,:),M,M),reshape(optimalobjectives,M,M))

The plot will show you that the optimal objective typically is non-convex, but rather nicely behaved.

Optimal objective in a-space

Finally, an alternating coordinates approach, where we alternately solve a problem for fixed $(a_2,a_3)$ and for fixed $(b,c)$, is also easily implemented using two optimizer objects, parameterized in the two groups respectively. Global optimality guarantees are out the window, but it is simple and seems to converge to the global optimum.

P1 = optimizer([bounds,scale],objective,ops,[a2;a3],[b;c]);
P2 = optimizer([bounds,scale],objective,ops,[b;c],[a2;a3]);

bc = P1{[1;1]}
for i = 1:100
    a2a3 = P2{bc}
    bc = P1{a2a3}
end

As I said, fun problem. I probably went a bit overboard here with some overkill...