non-linear fit problem with high Pr

I would like to find an appropriate model for my data:
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462, 36.257500, 62.077895, 109.726818)
I used symbolic regression to search for an analytic formula with a good balance of fitness and complexity, and a*x**b looked the best.
I then ran the non-linear fit:
nr <- nls(y ~ a * x**b, start = list(a = 0.001, b = 2), trace=TRUE)
which results in:
Formula: y ~ a * x^b
Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a 1.408e-06  1.132e-06   1.244     0.26    
b 3.366e+00  1.510e-01  22.283 5.34e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.493 on 6 degrees of freedom
Number of iterations to convergence: 35 
Achieved convergence tolerance: 3.87e-06
This is probably the best fit I can reach:
cor(y,predict(nr))
0.9984649

a <- coefficients(nr)[1]
b <- coefficients(nr)[2]
t <- seq(from = 0, to = 300, length = 50)
plot(x,y)
lines(t, a * t**b, lty = 2, col = "red")

Plot

But the 'Pr' value of the 'a' parameter looks high (0.26).
Can I accept this model, or is it problematic?
Can you suggest a better way to do this fit?
Thanks

There are 2 best solutions below

2
On

If you plot the data on a log-log scale, the function $y=ax^b$ should give a straight line, since $y=ax^b \quad\to\quad \log(y)=\log(a)+b\log(x)$.
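The linearization can also be checked numerically with an ordinary least-squares fit on the log-transformed data. This is a minimal sketch using the data from the question; the names `loglog_fit`, `a_hat`, `b_hat` and `loglog_res` are illustrative and do not come from the thread. Least squares on the log scale weights the points differently from `nls` on the original scale, so the estimates are probably best treated as a diagnostic or as starting values.

```r
# Data from the question
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)

# Straight-line fit in log-log space: log(y) = log(a) + b * log(x)
loglog_fit <- lm(log(y) ~ log(x))
a_hat <- exp(coef(loglog_fit)[[1]])  # the intercept estimates log(a)
b_hat <- coef(loglog_fit)[[2]]       # the slope estimates b

# Residuals on the log scale; a systematic pattern suggests the points
# do not lie on a single straight line
loglog_res <- residuals(loglog_fit)
print(c(a = a_hat, b = b_hat))
print(round(loglog_res, 3))
```

If the residuals show a clear trend rather than scatter around zero, the pure power law is questionable over the full range of `x`.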

The first figure shows that the points are well aligned, except for the first point (10, 0.19).

enter image description here

This suggests that there may be a typo in the data.

In the second figure the first point is (10, 0.019) instead of (10, 0.19). The alignment of the points is much better, which strengthens the typo hypothesis:

enter image description here
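One way to put a number on this comparison is to fit the log-log line twice, once with the data as posted and once with the hypothetical corrected first point, and compare the coefficients of determination. This is only a sketch: the value 0.019 is the guess made in this answer, not a confirmed data point, and `y_corrected`, `r2_original` and `r2_corrected` are illustrative names.

```r
# Data from the question
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)

# Hypothetical correction: first point (10, 0.019) instead of (10, 0.19)
y_corrected <- y
y_corrected[1] <- 0.019

# R^2 of the straight-line fit in log-log space for both data sets
r2_original  <- summary(lm(log(y) ~ log(x)))$r.squared
r2_corrected <- summary(lm(log(y_corrected) ~ log(x)))$r.squared
print(c(original = r2_original, corrected = r2_corrected))
```

A higher value for the corrected data is consistent with the typo hypothesis, but it does not prove it; only the original source of the data can settle that.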

If there is no typo, the poor alignment of the points means that the function $y=ax^b$ is not suitable.

A better result can be obtained with the function $$y=ax^b+c$$ which requires a regression with three parameters instead of two:

enter image description here

This is drawn with the uncorrected data, i.e. with the first point at (10, 0.19).
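The answer does not say how the three-parameter regression was carried out. One possible approach, sketched here as an assumption rather than as the method actually used, exploits the fact that for a fixed exponent $b$ the model is linear in $a$ and $c$: `lm` solves for $a$ and $c$, and `optimize` does a one-dimensional search over $b$. The helper names `rss_power` and `rss_power_offset` and the search interval `c(1, 6)` are illustrative choices.

```r
# Data from the question (uncorrected)
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)

# Residual sum of squares of y = a * x^b for a fixed b (a by least squares)
rss_power <- function(b) sum(residuals(lm(y ~ 0 + I(x^b)))^2)

# Residual sum of squares of y = a * x^b + c for a fixed b (a, c by least squares)
rss_power_offset <- function(b) sum(residuals(lm(y ~ I(x^b)))^2)

# One-dimensional search over the exponent; the interval is an assumption
b2 <- optimize(rss_power, interval = c(1, 6))$minimum
b3 <- optimize(rss_power_offset, interval = c(1, 6))$minimum

# Recover a and c at the selected exponent
fit3  <- lm(y ~ I(x^b3))
c_hat <- coef(fit3)[[1]]
a_hat <- coef(fit3)[[2]]

print(c(a = a_hat, b = b3, c = c_hat))
print(c(rss_two_param = rss_power(b2), rss_three_param = rss_power_offset(b3)))
```

Splitting the problem this way avoids having to guess starting values for the badly scaled parameter $a$. The extra parameter can only lower the residual sum of squares at a given $b$, so a smaller value alone does not show that the offset is meaningful.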

0
On

The first answer considers models of the kind $y\simeq ax^b$ as suggested by pnz.

Meanwhile, pnz observed that models of the kind $y\simeq ae^{bx}$ are better. That is true to some extent.

This leads to a different answer.

$$y=ae^{bx} \quad\to\quad \ln(y)=A+bx \qquad A=\ln(a)$$

So, the relationship should be linear in the $(x,\ln(y))$ graph. This is far from being the case in the range of small values of $y$, as the next figure shows.

As a consequence, the fit cannot be accurate over the whole range.

enter image description here
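The same check can be done numerically by regressing $\ln(y)$ on $x$ and looking at the residuals. This is a minimal sketch with the data from the question; `semilog_fit`, `a_hat`, `b_hat` and `semilog_res` are illustrative names.

```r
# Data from the question
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)

# Straight-line fit in semi-log space: log(y) = A + b * x, with A = log(a)
semilog_fit <- lm(log(y) ~ x)
a_hat <- exp(coef(semilog_fit)[[1]])  # the intercept estimates A = log(a)
b_hat <- coef(semilog_fit)[[2]]       # the slope estimates b

# Residuals on the log scale, listed in order of increasing x
semilog_res <- residuals(semilog_fit)
print(c(a = a_hat, b = b_hat))
print(round(semilog_res, 3))
```

Residuals that change sign in a systematic way along $x$, instead of scattering around zero, indicate curvature that a pure exponential cannot capture.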

To try to correct the discrepancy, a constant term $c$ (which turns out to be negative) can be introduced in the model: $$y\simeq ae^{bx}+c$$

The fit is improved in the range of small $y$, as shown in the next figure:

enter image description here
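As with the power law, the fitting procedure is not given in the answer. A possible sketch, under the assumption that a profile search is acceptable, uses the fact that $y\simeq ae^{bx}+c$ is linear in $a$ and $c$ once $b$ is fixed. The helper names `rss_exp` and `rss_exp_offset` and the search interval `c(0.001, 0.1)` are illustrative choices, not values from the thread.

```r
# Data from the question
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)

# Residual sum of squares of y = a * exp(b * x) for a fixed b
rss_exp <- function(b) sum(residuals(lm(y ~ 0 + I(exp(b * x))))^2)

# Residual sum of squares of y = a * exp(b * x) + c for a fixed b
rss_exp_offset <- function(b) sum(residuals(lm(y ~ I(exp(b * x))))^2)

# One-dimensional search over the rate b; the interval is an assumption
b2 <- optimize(rss_exp, interval = c(0.001, 0.1))$minimum
b3 <- optimize(rss_exp_offset, interval = c(0.001, 0.1))$minimum

# Recover a and c at the selected rate
fit3  <- lm(y ~ I(exp(b3 * x)))
c_hat <- coef(fit3)[[1]]
a_hat <- coef(fit3)[[2]]

print(c(a = a_hat, b = b3, c = c_hat))
print(c(rss_two_param = rss_exp(b2), rss_three_param = rss_exp_offset(b3)))
```

The printed value of `c` can be compared with the negative constant mentioned above, and the two residual sums of squares show how much the offset changes the fit on the original scale.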