How to fit an equation to a dataset produced by simulation software


I have a dataset produced by simulation software. I will attach it to this post via a Dropbox link, as it is too long to include here. I want to find an equation whose curve passes through most of the points of this dataset, for the x values given in the left column. The accuracy of the fit is very important to me. So far I have been using a regression-based tool for this work; it is linked below.

http://arachnoid.com/polysolve/

However, for this dataset that tool performs very poorly. As a result, the model that uses the fit to produce other results is not accurate. I am using MATLAB Simulink to input this dataset. Basically, I find the equation with the above website and paste it into a MATLAB Simulink function block, so that for a given x input the block outputs the corresponding y value.

I would highly appreciate a better, more accurate solution for my problem if it is possible.

On BEST ANSWER

The data set you sent me contains a lot of noise; this is obvious from a simple scatter plot, and I do not think there is any way to do a good job with it using polynomial regression.

I reproduce below the sum of squares as a function of the degree of the polynomial as well as the corresponding adjusted $R^2$.
$$\begin{array}{ccc}
\text{degree} & \text{SSQ} & \text{adj. } R^2 \\
\hline
1 & 109.1 & 0.494292 \\
2 & 44.4392 & 0.793105 \\
3 & 25.4521 & 0.880979 \\
4 & 12.554 & 0.941033 \\
5 & 10.471 & 0.950597 \\
6 & 10.0505 & 0.952369 \\
7 & 9.9772 & 0.952503 \\
8 & 9.54288 & 0.954365
\end{array}$$
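For readers who want to compute the same two quantities themselves, here is a minimal sketch. It assumes that SSQ means the residual sum of squares and that the adjusted $R^2$ follows the usual definition $1 - \frac{\text{SSQ}/(n-p-1)}{\text{SST}/(n-1)}$, with $n$ data points and a polynomial of degree $p$ plus an intercept. The function names and the sample numbers are made up for illustration; they are not the data from the question.

```python
def sum_of_squares(y, y_hat):
    """Residual sum of squares between observations and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def adjusted_r2(y, y_hat, degree):
    """Adjusted R^2 for a polynomial fit of the given degree (intercept included)."""
    n = len(y)
    if n - degree - 1 <= 0:
        raise ValueError("need more data points than fitted coefficients")
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    if sst == 0.0:
        raise ValueError("observations are constant; R^2 is undefined")
    ssq = sum_of_squares(y, y_hat)
    return 1.0 - (ssq / (n - degree - 1)) / (sst / (n - 1))


# Made-up observations and fitted values, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.9, 4.9]
print(sum_of_squares(y_obs, y_fit), adjusted_r2(y_obs, y_fit, degree=1))
```

Because the penalty term grows with the degree, the adjusted $R^2$ stops improving once extra coefficients only chase noise, which is consistent with the plateau after degree 5 or 6 in the table.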

Visually, it seems that degree $6$ would not be "too" bad (but most of the coefficients are not statistically significant). It reads $$y=6.16364\times 10^7 x^6-165894\, x^5-99058.5\, x^4+1975.35\, x^3+149.302\, x^2-9.00162\, x+0.0553619$$
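Since the fitted equation is meant to be pasted into a function block that maps x to y, here is a small sketch of how the degree-6 polynomial can be evaluated. It is written in Python rather than MATLAB purely for illustration, and it uses Horner's scheme, which avoids computing high powers of x explicitly. The coefficients are copied from the equation above; the function names and the sample x values are assumptions, since the range of x in the data set is not stated in the post.

```python
# Coefficients of the degree-6 fit quoted above, highest power first.
COEFFS = [6.16364e7, -165894.0, -99058.5, 1975.35, 149.302, -9.00162, 0.0553619]


def poly_eval(x, coeffs=COEFFS):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def poly_eval_naive(x, coeffs=COEFFS):
    """Same polynomial, summed term by term (used only as a cross-check)."""
    degree = len(coeffs) - 1
    return sum(c * x ** (degree - k) for k, c in enumerate(coeffs))


# Hypothetical sample inputs; replace them with x values from the data set.
for x in (0.0, 0.01, 0.02):
    print(x, poly_eval(x))
```

The same loop translates directly into a MATLAB function body, where the built-in `polyval` does the equivalent job given the coefficient vector in the same order.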

Let us see what you can do with that.