I am trying to fit a general sinusoidal curve to a set of data points with the following features:
- Data covers only part of the period
- Data points are not equally spaced
What would be the best way to approximate the frequency in this specific case? In all the cases I have seen, the data covered several periods. For example, I tried the method from https://www.scribd.com/doc/14674814/Regressions-et-equations-integrales# (page 24), which linearises the problem of finding the frequency by using integrals. However, it does not give me good results. And neither does ChatGPT lol.
What is the best strategy for approximating the frequency in this particular case? I would also like to avoid iterative approaches (e.g. gradient descent).
Thank you very much for your answers and help!
EDIT: As asked, the data set can be found in this Desmos graph.
Personal edit by Jean Marie: in case it helps, here is the data array:
151.73378 -7.66837
151.71739 -7.6667
151.69138 -7.65908
151.62711 -7.6494
151.57689 -7.64485
151.52066 -7.63923
151.46477 -7.63805
151.42001 -7.63621
151.35583 -7.63503
151.30501 -7.63482
151.26636 -7.63555
151.19527 -7.63998
151.14746 -7.64203
151.08684 -7.64768
151.03366 -7.65429
150.99374 -7.65947
150.9238 -7.66734
150.87398 -7.67575
150.81732 -7.68646
150.76454 -7.69704
150.7218 -7.70754
150.66092 -7.72259
150.60757 -7.73582
150.55692 -7.74901
150.50587 -7.76589
150.45662 -7.78243
150.39748 -7.80429
150.34641 -7.82432
150.30864 -7.83916
150.24689 -7.86442
150.19942 -7.8852
150.14586 -7.9092
150.09666 -7.93266
150.05682 -7.95329
149.99934 -7.98415
149.95216 -8.0112
149.903 -8.04024
149.85516 -8.07024
149.81479 -8.09681
149.7623 -8.12986
149.71567 -8.16173
149.68178 -8.18646
149.6265 -8.22563
149.58386 -8.25824
149.53705 -8.29439
149.49318 -8.32941
149.45874 -8.36022
149.40773 -8.40409
149.36754 -8.44096
149.326 -8.48096
149.28476 -8.52128
149.25078 -8.55716
149.207 -8.60217
149.1693 -8.64357
149.13908 -8.67568
149.0923 -8.72745
149.05728 -8.76954
149.02003 -8.81639
148.98415 -8.86092
148.95917 -8.89317
148.91348 -8.95282
148.88315 -8.99699
148.8477 -9.04725
148.81738 -9.09368
148.78664 -9.14241
148.75612 -9.19128
148.72556 -9.2432
148.6987 -9.2938
148.67188 -9.3461
148.64496 -9.3996
148.61932 -9.45188
148.59395 -9.50262
148.57225 -9.54007
148.55563 -9.56308
148.53592 -9.56818
You have $n$ data points $(x_i,y_i)$ and you want to fit the model $$y=a\sin(bx+c)+d$$ which is highly nonlinear.
Expand the sine and let $$\alpha=a\sin(c) \qquad \text{and} \qquad \beta=a \cos(c)$$ to make $$y=\alpha \cos(bx)+\beta \sin(bx)+d$$
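For reference, the expansion is just the angle-addition formula, and it can be inverted once $\alpha$ and $\beta$ are estimated. The second line assumes the convention $a>0$, which the original model does not fix:
$$a\sin(bx+c)=a\sin(c)\cos(bx)+a\cos(c)\sin(bx)=\alpha\cos(bx)+\beta\sin(bx)$$
$$a=\sqrt{\alpha^2+\beta^2} \qquad \text{and} \qquad c=\operatorname{atan2}(\alpha,\beta)$$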
If $b$ were known, the problem would just be a linear regression with two regressors, $t_i=\cos(bx_i)$ and $u_i=\sin(bx_i)$, and the intercept $d$.
So, consider the sum of squares as a function of $b$. Try different values until you see more or less a minimum. At this point, you have good estimates of $(b,\alpha,\beta,d)$ and you can safely run a nonlinear regression or even a Newton-Raphson procedure for two variables only since $d$ is implicitly known.
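The scan over $b$ could be sketched in Python with NumPy roughly as follows. This is only an illustration under stated assumptions: the function names `fit_for_frequency` and `scan_frequency`, the grid of trial values and the noise-free synthetic data are all made up for the example and are not the data from the question.

```python
import numpy as np

def fit_for_frequency(x, y, b):
    """Linear least squares for (alpha, beta, d) at a fixed trial frequency b."""
    # Columns: cos(b x), sin(b x) and a constant for the offset d.
    A = np.column_stack([np.cos(b * x), np.sin(b * x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def scan_frequency(x, y, b_grid):
    """Return (b, alpha, beta, d, sse) for the grid value of b with the smallest SSE."""
    best = None
    for b in b_grid:
        (alpha, beta, d), sse = fit_for_frequency(x, y, b)
        if best is None or sse < best[4]:
            best = (float(b), float(alpha), float(beta), float(d), sse)
    return best

# Synthetic, noise-free demo data (an assumption for illustration only):
# unevenly spaced x covering well under half a period.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 3.0, 60))
a_true, b_true, c_true, d_true = 2.0, 0.8, 0.5, -8.0
y = a_true * np.sin(b_true * x + c_true) + d_true

# Hypothetical grid of trial frequencies with step 0.01.
b_hat, alpha, beta, d_hat, sse = scan_frequency(x, y, np.linspace(0.05, 3.0, 296))

# Recover amplitude and phase from alpha = a sin(c), beta = a cos(c), taking a > 0.
a_hat = np.hypot(alpha, beta)
c_hat = np.arctan2(alpha, beta)
print(b_hat, a_hat, c_hat, d_hat, sse)
```

With real, noisy data covering only part of a period, the sum of squares is typically quite flat around its minimum, so the grid value should be treated as a starting estimate for the refinement step rather than as a final answer.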