Linear regression for polynomial fitting


I am doing some curve fitting. The theoretical curve is hyperbolic and has the form $(x-x_0)(y-y_0)=c$. This is not linear, so ordinary linear least-squares regression does not apply immediately.

However I noticed if I transform it to

$$xy - y_0x - x_0y + x_0y_0 = c$$

and let $z=xy$; then this is a linear equation in the variables $x$, $y$, and $z$. So I can then do ordinary linear regression to find $y$ from $x$ and $z$.

My question is: are there any risks in doing this? Will I miss some "better" fitting curve in any sense? Otherwise, this approach seems to work for fitting any multivariate polynomial.
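For concreteness, the transformation can be sketched numerically like this (a minimal illustration, assuming NumPy; the parameter values are made up for the demo, and on noiseless data the linear step recovers them exactly):

```python
import numpy as np

# True hyperbola (x - x0)*(y - y0) = c, with illustrative parameter values.
x0_true, y0_true, c_true = 2.0, 3.0, 5.0
x = np.linspace(3.0, 10.0, 20)
y = y0_true + c_true / (x - x0_true)

# Expand: x*y = y0*x + x0*y + (c - x0*y0), i.e. z = a*x + b*y + g.
z = x * y
A = np.column_stack([x, y, np.ones_like(x)])
(a, b, g), *_ = np.linalg.lstsq(A, z, rcond=None)

# Map the linear coefficients back to the hyperbola parameters.
x0_hat, y0_hat = b, a
c_hat = g + x0_hat * y0_hat
```

Here `x0_hat`, `y0_hat`, `c_hat` come out equal to the generating values because no noise was added; the question of what happens with noisy data is exactly the issue raised above.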

Best Answer

I suppose that you have $n$ data points $(x_i,y_i)$ and that you want to fit the model $$y=y_0+\frac c{x-x_0}$$ where the parameters to be adjusted are $c,x_0,y_0$.

As you did, you could, in a first step, rewrite $$xy - y_0x - x_0y + x_0y_0 = c$$ and then run a regression for $$xy=y_0x+x_0y+(c-x_0y_0)$$ that is to say $$z=\alpha x+ \beta y+\gamma$$ which is a multilinear regression. From the parameters $\alpha, \beta, \gamma$ so obtained, you get $x_0,y_0,c$, but these are only estimates, to be used now as starting values for a nonlinear regression, since what has been measured are the $y_i$'s and not the $x_iy_i$'s. Starting with these estimates, I suppose that the convergence would be very fast. But I insist on the fact that this nonlinear regression step must be done if you do not want to introduce a bias in your results.
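The two-step procedure can be sketched as follows (a hedged illustration, assuming NumPy and SciPy; the data, noise level, and variable names are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, x0, y0, c):
    # The hyperbolic model y = y0 + c/(x - x0).
    return y0 + c / (x - x0)

# Synthetic noisy data with illustrative true parameters (2.0, 3.0, 5.0).
rng = np.random.default_rng(0)
x = np.linspace(3.0, 12.0, 50)
y = model(x, 2.0, 3.0, 5.0) + 0.05 * rng.standard_normal(x.size)

# Step 1: multilinear regression  x*y = alpha*x + beta*y + gamma.
A = np.column_stack([x, y, np.ones_like(x)])
(alpha, beta, gamma), *_ = np.linalg.lstsq(A, x * y, rcond=None)
x0_init, y0_init = beta, alpha
c_init = gamma + x0_init * y0_init      # since gamma = c - x0*y0

# Step 2: nonlinear least squares in y itself, seeded with the step-1 estimates.
(x0, y0, c), _ = curve_fit(model, x, y, p0=[x0_init, y0_init, c_init])
```

The nonlinear step minimizes the residuals in $y$, which is what was actually measured, while step 1 minimizes residuals in $z=xy$; this is why the two answers differ in general.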

For illustration purposes, I generated $10$ data points ($i=1$ to $10$), $x_i=i$, $$y_i=12.34 +\frac{23.45}{x_i-5.67}+(-1)^i$$ and applied the procedure. The linear regression leads to $\alpha=12.3391$, $\beta=5.66684$, $\gamma=-45.9608$, that is to say $x_0=5.66684$, $y_0=12.3391$, $c=23.9631$, which are effectively close to the values used for the generation of the data points. Using these values as starting guesses for the nonlinear regression leads to $$y=12.2873 +\frac{23.759}{x-5.66991}$$ As you can observe, the results are not the same, and the latter are consistent with the fact that only the $y_i$'s were measured.

If, after the linear regression, we compute the predicted values of $y$, we obtain a sum of squares equal to $9.07$, while, after the nonlinear regression step, this sum of squares is equal to $8.82$. Not very different (because of the small noise), but different.
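The example above can be reproduced roughly as follows (a sketch assuming NumPy and SciPy; the exact digits may differ slightly from those quoted, depending on the solver):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, x0, y0, c):
    return y0 + c / (x - x0)

# The answer's synthetic data: x_i = i, deterministic +/-1 "noise".
i = np.arange(1, 11)
x = i.astype(float)
y = 12.34 + 23.45 / (x - 5.67) + (-1.0) ** i

# Linear step: regress z = x*y on x, y, 1.
A = np.column_stack([x, y, np.ones_like(x)])
(alpha, beta, gamma), *_ = np.linalg.lstsq(A, x * y, rcond=None)
x0_lin, y0_lin = beta, alpha
c_lin = gamma + x0_lin * y0_lin

# Nonlinear step, seeded with the linear estimates.
(x0, y0, c), _ = curve_fit(model, x, y, p0=[x0_lin, y0_lin, c_lin])

# Sums of squared residuals in y for the two parameter sets.
sse_lin = float(np.sum((y - model(x, x0_lin, y0_lin, c_lin)) ** 2))
sse_nl = float(np.sum((y - model(x, x0, y0, c)) ** 2))
```

Since the nonlinear fit starts from the linear-step parameters and minimizes the residuals in $y$ directly, `sse_nl` can only be smaller than or equal to `sse_lin`.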