I would like to find an appropriate model for my data:
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462, 36.257500, 62.077895, 109.726818)
I used symbolic regression to search for an analytic formula that balances fitness and complexity, and a*x**b looked the best.
I then did the non-linear fit:
nr <- nls(y ~ a * x**b, start = list(a = 0.001, b = 2), trace=TRUE)
Which results in:
Formula: y ~ a * x^b
Parameters:
   Estimate  Std. Error  t value  Pr(>|t|)
a  1.408e-06  1.132e-06    1.244  0.26
b  3.366e+00  1.510e-01   22.283  5.34e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.493 on 6 degrees of freedom
Number of iterations to convergence: 35
Achieved convergence tolerance: 3.87e-06
This is probably the best fit I can reach:
cor(y,predict(nr))
0.9984649
a <- coefficients(nr)[1]
b <- coefficients(nr)[2]
t <- seq(from = 0, to = 300, length = 50)
plot(x,y)
lines(t, a * t**b, lty = 2, col = "red")
[Plot: the data points with the fitted curve a * t**b overlaid as a dashed red line]
But the 'Pr' value of the 'a' parameter looks high (0.26),
so can I accept this model, or is it problematic?
Can you suggest a better way for this fitting?
Thanks
If you draw the data on a log-log scale, you should obtain a straight line, since $y=ax^b\quad\to\quad\log(y)=\log(a)+b\log(x)$.
In the first figure one can see that the points do fall on a line, except for the first point (10, 0.19).
This suggests that there is a typo in the data.
In the second figure the first point is (10, 0.019) instead of (10, 0.19). The alignment of the points is much better, which strengthens the hypothesis of a typo.
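The log-log check described above can be sketched in R as follows. This is only an illustration: the corrected value 0.019 is the hypothesis discussed here, not a confirmed correction, and the variable names `y_corrected`, `fit_original` and `fit_corrected` are made up for this example. It fits a straight line to log(y) against log(x) for both versions of the data and compares how well the points line up.

```r
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)

# Hypothetical correction of the first point (an assumption, not confirmed)
y_corrected <- y
y_corrected[1] <- 0.019

# On a log-log scale y = a * x^b becomes log(y) = log(a) + b * log(x),
# so an ordinary linear regression estimates log(a) and b.
fit_original  <- lm(log(y) ~ log(x))
fit_corrected <- lm(log(y_corrected) ~ log(x))

# R^2 measures how close the log-log points are to a straight line
r2_original  <- summary(fit_original)$r.squared
r2_corrected <- summary(fit_corrected)$r.squared

print(c(original = r2_original, corrected = r2_corrected))
print(rbind(original = coef(fit_original), corrected = coef(fit_corrected)))
```

Note that a fit on the log scale weights relative errors, whereas `nls` on the raw data weights absolute errors, so the exponent estimated this way need not match the one from the non-linear fit.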
If there is no typo, the bad alignment of the points means that the function $y=ax^b$ isn't suitable.
A better result can be obtained with the function $$y=ax^b+c$$ which requires a regression with three parameters instead of two.
This is drawn with the uncorrected data, i.e. with the first point (10, 0.19).
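One possible way to fit the three-parameter function $y=ax^b+c$ in R is sketched below. It is not necessarily the method used for the figure: it assumes the partially linear algorithm of `nls` (`algorithm = "plinear"`), chosen here because $a$ and $c$ enter the model linearly, so only $b$ needs a starting value. The starting value `b = 3` and the name `fit3` are assumptions made for this example.

```r
x <- c(10, 40, 70, 100, 130, 160, 190, 220)
y <- c(0.190000, 0.857500, 3.845714, 11.194000, 20.208462,
       36.257500, 62.077895, 109.726818)   # uncorrected data

# Model: y = a * x^b + c.
# With algorithm = "plinear" the right-hand side is a matrix whose columns
# are multiplied by the linear coefficients: column 1 (x^b) gets a,
# column 2 (all ones) gets c. Only b is iterated on.
fit3 <- nls(y ~ cbind(x^b, 1), start = list(b = 3), algorithm = "plinear")

# Estimated b followed by the two linear coefficients (a and c)
print(coef(fit3))

# Residual sum of squares, comparable with that of the two-parameter fit
print(deviance(fit3))
```

Since the two-parameter model is the special case $c=0$, the residual sum of squares of this fit should not be larger than that of `y ~ a * x^b` on the same data.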