How to fit a power function to data with noise


I have multiple data-sets of Fourier coefficients of a function $f(t,x)$ (the data-sets were obtained by varying $x$), so $A_n=\frac{2}{T}\int_0^T{f(t,x)\sin{\frac{2\pi nt}{T}}dt}$, and the coefficients seem to follow an equation of the form $$ A=an^b $$ I know that $b$ will be negative, since $A$ is approximately equal to $0$ for large $n$. However, $A$ also has noise, which means that for certain data-sets $A$ sometimes takes a few negative values.

This means that I will not be able to use a linear fit based on $$ \log(A)=\log(a)+b\log(n) $$

This might be a related question. It has an answer which avoids this problem, but the secant method or Newton's method can still take some time, and I wonder if there might be a faster method?
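One possible faster route (not from the question itself, just a sketch): fit $A=an^b$ directly with damped nonlinear least squares instead of iterating a root-finder. The data below are synthetic stand-ins for the real $A_n$, and the use of SciPy's `curve_fit` (Levenberg–Marquardt) is my assumption, not something the asker used.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical noisy data following A = a * n^b; replace with the real A_n.
rng = np.random.default_rng(1)
n = np.arange(1, 41, dtype=float)
A = 2.0 * n**-1.5 + rng.normal(0.0, 0.05, n.size)  # some entries may go negative

# Levenberg-Marquardt handles negative A_n directly, no log transform needed.
popt, pcov = curve_fit(lambda n, a, b: a * n**b, n, A, p0=[1.0, -1.0])
a_fit, b_fit = popt
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties on (a, b)
```

The covariance matrix returned alongside the parameters also gives error bars on $a$ and $b$ for free, which is one way to propagate the noise in $A$.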

A possibly related question: how would I determine the error ($\sigma_A$) of $A_n$?

Edit:
I was trying to find the Fourier series of the true anomaly of closed Kepler orbits (so $0\leq e<1$) as a function of the eccentricity $e$, since there is no explicit solution for the position as a function of time. There is, however, an explicit solution the other way around (note that $a$ here denotes the semi-major axis, not the fit parameter): $$ t(\theta)=\sqrt{\frac{a^3}{\mu}}\left(2\tan^{-1}\left(\frac{\sqrt{1-e^2}\tan{\frac{\theta}{2}}}{1+e}\right)-\frac{e\sqrt{1-e^2}\sin{\theta}}{1+e\cos{\theta}}\right) $$

The function from which I would like to get the Fourier series is defined as $$ f(e,t)=\theta(t)-\frac{2\pi t}{T} $$

It is possible to find the time derivative of the true anomaly, $\dot{\theta}$, as a function of $\theta$ itself: $$ \dot{\theta}(\theta)=\sqrt{\frac{\mu}{a^3\left(1-e^2\right)^3}}\left(1+e\cos{\theta}\right)^2 $$

This allowed me to rewrite the integral as an integral over $\theta$, since $dt=\frac{d\theta}{\dot{\theta}}$: $$ A_n=\frac{2}{T}\int_{-\pi}^{\pi}{\frac{\left(\theta-\frac{2\pi t(\theta)}{T}\right)\sin\left(\frac{2\pi nt(\theta)}{T}\right)}{\dot{\theta}(\theta)}d\theta} $$

And since $T=2\pi\sqrt{\frac{a^3}{\mu}}$, this can be simplified further to a function of only $n$ and $e$, since $$ \frac{2\pi t(\theta)}{T}=2\tan^{-1}\left(\frac{\sqrt{1-e^2}\tan{\frac{\theta}{2}}}{1+e}\right)-\frac{e\sqrt{1-e^2}\sin{\theta}}{1+e\cos{\theta}} $$ $$ \frac{2}{T}\frac{1}{\dot{\theta}(\theta)}=\frac{\sqrt{\left(1-e^2\right)^3}}{\pi\left(1+e\cos{\theta}\right)^2} $$

However, according to MATLAB this integral does not have an explicit solution, so I still had to evaluate it numerically. Fortunately, MATLAB does have functionality to evaluate the integral numerically within small error bounds, which is better and faster than my initial Riemann-sum approach. After this I decided to implement an iterative method to solve for $b$, since I still got quite a lot of negative values.
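For concreteness, here is a minimal sketch of the same quadrature in Python, using SciPy's `quad` in place of MATLAB's numerical integrator (my substitution; the function names `mean_anomaly` and `fourier_coefficient` are made up). It implements exactly the $\theta$-integral above.

```python
import numpy as np
from scipy.integrate import quad

def mean_anomaly(theta, e):
    """2*pi*t(theta)/T for a Kepler orbit of eccentricity e."""
    s = np.sqrt(1.0 - e**2)
    return (2.0 * np.arctan2(s * np.tan(theta / 2.0), 1.0 + e)
            - e * s * np.sin(theta) / (1.0 + e * np.cos(theta)))

def fourier_coefficient(n, e):
    """A_n = (2/T) * integral of f(e,t)*sin(2*pi*n*t/T) dt, rewritten over theta."""
    s3 = (1.0 - e**2) ** 1.5
    def integrand(theta):
        tau = mean_anomaly(theta, e)  # 2*pi*t(theta)/T
        return ((theta - tau) * np.sin(n * tau)
                * s3 / (np.pi * (1.0 + e * np.cos(theta)) ** 2))
    val, abserr = quad(integrand, -np.pi, np.pi, limit=200)
    return val
```

As a sanity check, for small $e$ the classical expansion of the true anomaly gives $A_1\approx 2e$ and $A_2\approx \tfrac{5}{4}e^2$. Note also that `quad` returns an absolute-error estimate alongside the value, which gives at least the quadrature contribution to $\sigma_A$.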

However, my actual goal was to find expressions for $a$ and $b$ as functions of $e$. After some trial and error I found test functions which fit the data quite well:

$$ a(e)=\frac{1.887 e^{0.9591}}{1-0.04663 e^{64.38}} $$ $$ b(e)=\frac{e^4 - 2.61 e^3 + 0.7813 e^2 + 0.8935 e + 2.014 \times 10^{-3}}{1.152 e^3 - 1.08 e^2 - 0.139 e - 6.703 \times 10^{-5}} $$ But when using this to approximate $\theta(t)$, the result often seems quite off. Does someone know what better expressions for $a$ and $b$ would be?


There are 2 answers below.


The traditional way is to take the log of each side. The equation becomes $\log A=\log a + b \log n$. Your data change from $(A_i,n_i)$ to $(\log A_i, \log n_i)$, and you can do a linear fit in $\log a$ and $b$. There are issues with how the errors are transformed, which will change the best-fit parameters. You are correct that if the measurement errors make some $A_i$ negative, you have to decide what to do about that.
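A minimal sketch of this log-log fit, on synthetic data I made up for illustration (the answer itself gives no code). The negative samples are simply dropped before taking logs, which is the issue the answer warns about:

```python
import numpy as np

# Hypothetical data: a power law plus additive noise, so some A_i come out negative.
rng = np.random.default_rng(2)
n = np.arange(1, 41, dtype=float)
A = 2.0 * n**-1.5 + rng.normal(0.0, 0.05, n.size)

# Keep only the positive values, then fit log A = log a + b log n.
mask = A > 0
b_fit, log_a = np.polyfit(np.log(n[mask]), np.log(A[mask]), 1)
a_fit = np.exp(log_a)
```

Because the surviving large-$n$ points are mostly upward noise fluctuations, this estimate of $b$ tends to be biased toward shallower slopes, illustrating why the transformed errors matter.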


As you know, the key problem in nonlinear regression is the starting point (the initial values of the model parameters). When the problem can be linearized (such as via a logarithmic transform), it is simpler, since the solution of the linearized model will give you good (or at least reasonable) starting values for the nonlinear model.

However, there are situations such as the one you describe which can be problematic. In your case, you could certainly, as a first step, exclude all the data points corresponding to negative values of $A_i$ and proceed as Ross Millikan suggests in the above answer.

But let me suppose the following general situation: you have to fit a model which is nonlinear because of one single parameter ($b$ in your case). If this parameter is given a fixed value, the model is linear and the problem is easy to solve. So what I suggest you do (I have done this hundreds of times) is to fix $b$ at a given value and solve the resulting linear problem; the result is a sum of squared residuals. Now change the value of $b$ and repeat. This way you can plot the sum of squared residuals as a function of $b$ and locate its minimum. Since you also know the values of the other parameters at the best value of $b$, you can then start the nonlinear regression from there.
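The procedure above can be sketched as follows (my own Python rendering, on synthetic data). For fixed $b$, the model $A=an^b$ is linear in $a$, so the least-squares $a$ has the closed form $a(b)=\sum A_i n_i^b / \sum n_i^{2b}$:

```python
import numpy as np

# Hypothetical noisy data following A = a * n^b, including negative samples.
rng = np.random.default_rng(3)
n = np.arange(1, 41, dtype=float)
A = 2.0 * n**-1.5 + rng.normal(0.0, 0.05, n.size)

def profile(b, n, A):
    """For fixed b the model a*n^b is linear in a: solve a in closed form."""
    x = n**b
    a = np.dot(A, x) / np.dot(x, x)
    resid = A - a * x
    return a, np.dot(resid, resid)

# Scan b, record the sum of squared residuals, and take the minimum.
bs = np.linspace(-3.0, -0.1, 300)
sse = np.array([profile(b, n, A)[1] for b in bs])
b_best = bs[np.argmin(sse)]
a_best = profile(b_best, n, A)[0]
```

The pair `(a_best, b_best)` then serves as the starting point for a full nonlinear regression, exactly as the answer describes.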

If you had two parameters (say $b$ and $c$) making the model nonlinear, I would have suggested building a grid in order to look at the sum of squares as a function of $b$ and $c$ (with a contour plot, for example).
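For the two-parameter case, a sketch of such a grid scan (the model $a\,n^b e^{cn}$ and the data are hypothetical examples of mine, chosen only because they are nonlinear in $b$ and $c$ but still linear in $a$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = np.arange(1, 31, dtype=float)
# Hypothetical model a * n^b * exp(c*n): nonlinear in b and c, linear in a.
A = 1.5 * n**-1.0 * np.exp(-0.05 * n) + rng.normal(0.0, 0.005, n.size)

bs = np.linspace(-2.0, 0.0, 81)
cs = np.linspace(-0.2, 0.0, 81)
sse = np.empty((bs.size, cs.size))
for i, b in enumerate(bs):
    for j, c in enumerate(cs):
        x = n**b * np.exp(c * n)          # basis for the linear parameter a
        a = np.dot(A, x) / np.dot(x, x)   # closed-form least-squares a
        sse[i, j] = np.sum((A - a * x) ** 2)

# Location of the minimum over the (b, c) grid; sse itself can be contour-plotted.
i, j = np.unravel_index(np.argmin(sse), sse.shape)
b_best, c_best = bs[i], cs[j]
```

Plotting `sse` as a contour map over `(bs, cs)` shows the valley the answer refers to, and its bottom gives the starting values for the nonlinear regression.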

I hope this is clear. If not, please post and I shall continue.

Happy New Year