Polynomial best fit line for very large values


Not only are the x values large, but the gap between their magnitude and that of the y values is huge. My data points ($x$, $y$):

| $x$ | $y$ |
|---:|---:|
| 22353120 | 720 |
| 24448725 | 671.427053270323 |
| 26544330 | 634.312274868634 |
| 28639935 | 566.291966792026 |
| 30735540 | 488.299713935616 |
| 32831145 | 390.448846935 |
| 34926750 | 290.41154091049 |
| 37022355 | 204.641148591763 |
| 39117960 | 134.468462627021 |
| 41213565 | 86.405526235728 |
| 43309170 | 51.28276608 |
| 45404775 | 34.1174965089024 |
| 47500380 | 21.4393552344576 |
| 49595985 | 16.058562926011 |
| 51691590 | 10.32615461376 |
| 55882800 | 0.461961425407946 |

The only nice pair there is the first one, and it MUST be reproduced exactly by the equation, whatever it is. I've tried using Wolfram Alpha and Excel to plot the data and create a regression line, but neither of them seems able to handle the large numbers (or something else goes wrong). Wolfram Alpha just says it can't do anything with the data, and Excel only generates a binomial equation (even when I select a higher order) that isn't anywhere near correct. Is there any way to do this?


---

If the first point must lie exactly on the curve, that removes one degree of freedom from the standard fits. The large numbers are not a problem in themselves; they can be when the values span a range that is small compared with their magnitude. I got Excel to do a third-order polynomial fit, but it doesn't fit very well; a fifth-order fit looks decent by eye. If you want to extract the coefficients, it helps to scale your first column by dividing it by $10^5$ or so. I have done so in the image below.

For a rough-and-ready approach, I would take this fit and adjust its constant term so that the first point is matched exactly. Another way to pull the curve closer to the first point is simply to duplicate that point many times. The image uses $18$ copies of the first point.

[Image: Excel polynomial fit with the first column scaled by $10^5$ and $18$ copies of the first point]
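A minimal sketch of the rough-and-ready approach above, assuming Python with NumPy rather than the Excel workflow the answer used. It scales $x$ by $10^5$, fits a fifth-order polynomial with `np.polyfit`, then shifts the constant term so the curve passes through the first point. It also shows the duplicate-the-first-point trick, which in least squares is equivalent to weighting that point. The names `SCALE`, `DEGREE` and `N_COPIES` are illustrative choices, not taken from the answer.

```python
import numpy as np

# Data points from the question.
x = np.array([22353120, 24448725, 26544330, 28639935, 30735540, 32831145,
              34926750, 37022355, 39117960, 41213565, 43309170, 45404775,
              47500380, 49595985, 51691590, 55882800], dtype=float)
y = np.array([720, 671.427053270323, 634.312274868634, 566.291966792026,
              488.299713935616, 390.448846935, 290.41154091049,
              204.641148591763, 134.468462627021, 86.405526235728,
              51.28276608, 34.1174965089024, 21.4393552344576,
              16.058562926011, 10.32615461376, 0.461961425407946])

SCALE = 1e5     # divide the first column by 10^5, as the answer suggests
DEGREE = 5      # fifth-order polynomial
N_COPIES = 18   # total copies of the first point in the duplication trick
xs = x / SCALE

# 1) Ordinary least-squares fit; coefficients come back highest power first.
coeffs = np.polyfit(xs, y, DEGREE)

# 2) Rough-and-ready fix: add a constant so the curve hits (x0, y0) exactly.
shifted = coeffs.copy()
shifted[-1] += y[0] - np.polyval(coeffs, xs[0])

# 3) Duplicate the first point so it appears N_COPIES times in total.
xs_dup = np.concatenate([np.full(N_COPIES - 1, xs[0]), xs])
y_dup = np.concatenate([np.full(N_COPIES - 1, y[0]), y])
dup = np.polyfit(xs_dup, y_dup, DEGREE)

# Same fit via weights: polyfit's w multiplies the unsquared residual,
# so N_COPIES copies correspond to w = sqrt(N_COPIES) on that point.
w = np.ones_like(xs)
w[0] = np.sqrt(N_COPIES)
weighted = np.polyfit(xs, y, DEGREE, w=w)
```

Each coefficient vector lives on the scaled axis, so evaluate it as, e.g., `np.polyval(shifted, 30_000_000 / SCALE)`. Only the constant-shift version reproduces $720$ exactly at the first point; duplication (or weighting) just pulls the curve toward it.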

---

If I understand correctly, you want to fit a polynomial model such as

$$Y = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + \cdots$$

but you want the first point $(X_0,Y_0)$ to be matched exactly. That constraint removes one free parameter, so rewrite your equation as

$$Y - Y_0 = a_1 (X - X_0) + a_2 (X - X_0)^2 + a_3 (X - X_0)^3 +\cdots$$
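To see how this shifted form relates to the standard one, take the quadratic case and expand it. The $a_k$ in the shifted form are in general not the same numbers as in the first model; only the highest-degree coefficient carries over unchanged:

$$\begin{aligned}
Y &= Y_0 + a_1 (X - X_0) + a_2 (X - X_0)^2 \\
  &= \left(Y_0 - a_1 X_0 + a_2 X_0^2\right) + \left(a_1 - 2 a_2 X_0\right) X + a_2 X^2 .
\end{aligned}$$

Substituting $X = X_0$ into the second line gives $Y_0$ back. With $X_0 \approx 2.2 \times 10^7$, the expanded constant term contains $a_2 X_0^2$ and is prone to cancellation, so it is usually safer to keep and evaluate the model in the shifted variable $X - X_0$.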

In this form the first point is matched exactly, since every term on the right-hand side vanishes at $X = X_0$. So, define the new variables

$$Z_i = Y_i - Y_0, \qquad T_i = X_i - X_0$$

and perform your regression as

$$Z = a_1 T + a_2 T^2 + a_3 T^3 + \cdots$$

But do not forget to exclude the intercept (an option such as "nointercept" is available in almost any regression tool). If your tool does not have this capability, let me know.
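A minimal sketch of this constrained regression, assuming Python with NumPy: `np.linalg.lstsq` on a design matrix with no column of ones plays the role of the no-intercept option. The helper name `fit_through_first_point` and the rescaling of $T$ by the data range are illustrative additions, not part of the answer; the rescaling keeps the powers of $T$ well conditioned and does not affect the constraint, because $T = 0$ at the first point either way. (In Excel, `LINEST` with its `const` argument set to `FALSE` likewise forces the intercept to zero.)

```python
import numpy as np

# Data points from the question; the first one must be reproduced exactly.
x = np.array([22353120, 24448725, 26544330, 28639935, 30735540, 32831145,
              34926750, 37022355, 39117960, 41213565, 43309170, 45404775,
              47500380, 49595985, 51691590, 55882800], dtype=float)
y = np.array([720, 671.427053270323, 634.312274868634, 566.291966792026,
              488.299713935616, 390.448846935, 290.41154091049,
              204.641148591763, 134.468462627021, 86.405526235728,
              51.28276608, 34.1174965089024, 21.4393552344576,
              16.058562926011, 10.32615461376, 0.461961425407946])


def fit_through_first_point(x, y, degree):
    """Fit Z = a_1 T + ... + a_degree T^degree with no intercept,
    where T = (x - x0) / scale and Z = y - y0 (hypothetical helper)."""
    x0, y0 = x[0], y[0]
    # Illustrative choice: divide T by the data range so its powers stay
    # roughly within [0, 1]; T is still exactly 0 at the first point.
    scale = x.max() - x.min()
    t = (x - x0) / scale
    z = y - y0
    # Columns T, T^2, ..., T^degree; deliberately no column of ones.
    A = np.column_stack([t**k for k in range(1, degree + 1)])
    a, *_ = np.linalg.lstsq(A, z, rcond=None)

    def model(xq):
        tq = (np.asarray(xq, dtype=float) - x0) / scale
        return y0 + sum(a[k - 1] * tq**k for k in range(1, degree + 1))

    return a, scale, model


a, scale, model = fit_through_first_point(x, y, degree=5)
print(model(x[0]) == y[0])  # the constraint holds exactly at the first point
print(model(x))             # fitted values at all data points
```

Because $T$ was divided by `scale`, the coefficients $a_k$ of the unscaled model $Z = a_1 T + a_2 T^2 + \cdots$ are `a[k - 1] / scale**k`.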