Extrapolation and Splines


Suppose you have a smooth curve and, at a certain point in time, you want to predict its next turning point. Assuming the curve comes from a non-periodic, stationary, smooth process, what would be the best way to do so?

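[Figure: the smooth curve, with a red arrow marking the current point in time and a green arrow marking the next turning point to be predicted]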

Say the current point in time is the red arrow and you want to predict where the green arrow will be.


Here's what I'd try:

Since the curve clearly contains a sine wave, Fourier analysis comes to mind. The problem is that the "randomness" here is not high-frequency.

The curve keeps its apparent sinusoidal shape; it just drifts "further up" or "down", by $F-A$. The noise is very subtle and doesn't distort the overall shape of the wave.

Of course, over a longer period, this might turn out to be some other function. Say, for example, the data represents page visits to a website. The obvious sine wave comes from the day/night cycle, and the difference between two days ($A$ vs. $F$) might be because $F$ is a weekend day. The difference between days would then have a period of one week, which is hard to see if you only look at two days.

For a Fourier analysis to work, you need several periods of the entire curve. If you have a lot of data, you could try to find those low-frequency components.
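A minimal sketch of that idea with NumPy, assuming hourly samples of a hypothetical page-visit signal made of a daily sine plus a weaker weekly sine (the signal and the helpers `synthetic_visits` and `dominant_periods` are made up for illustration). With four weeks of data the FFT finds both periods. With only two days, the lowest nonzero frequency the FFT can represent is $1/48$ per hour, i.e. a 48-hour period, so a weekly component cannot appear as its own peak.

```python
import numpy as np

def dominant_periods(y, dt=1.0, k=2):
    """Periods (in units of dt) of the k strongest nonzero-frequency
    components of y, found with a real FFT after removing the mean."""
    y = np.asarray(y, dtype=float)
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    freqs = np.fft.rfftfreq(len(y), d=dt)
    # Skip the zero-frequency bin (index 0) and sort the rest by magnitude.
    order = np.argsort(spectrum[1:])[::-1] + 1
    return [1.0 / freqs[i] for i in order[:k]]

def synthetic_visits(days):
    """Hypothetical hourly signal: daily cycle plus a weaker weekly cycle."""
    t = np.arange(24 * days)                    # time in hours
    daily = np.sin(2 * np.pi * t / 24)
    weekly = 0.5 * np.sin(2 * np.pi * t / (24 * 7))
    return daily + weekly

# Four weeks of data: both the 24 h and the 168 h periods are resolved.
print(dominant_periods(synthetic_visits(28)))

# Two days of data: the longest period the FFT can represent (besides the
# mean) is the window length itself, 48 h, so 168 h cannot be identified.
print(1.0 / np.fft.rfftfreq(24 * 2, d=1.0)[1])
```

The frequency resolution is $1/(N\,\Delta t)$, which is why the amount of data, not the sampling rate, decides whether slow components can be found.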


Some speculation:

Within the given data, the term added to the wave looks like something simple, such as $ax+b$. Whether that is actually part of some lower-frequency component is an open question.

Let's pretend that, within the window your image shows, the curve really has the shape $\sin(x)+ax+b$. Given a window of appropriate size (that is, a little longer than the period of the obvious sine component), you could fit the data points within that window to the assumed formula via regression and get values for $a$ and $b$.
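Under that assumption (unit amplitude, period $2\pi$, and only the drift $ax+b$ unknown), the regression reduces to fitting a straight line to $y-\sin(x)$. The sketch below does that with NumPy's `polyfit` and then, to connect back to the question, finds the next turning point of the fitted model from $f'(x)=\cos(x)+a=0$, which gives genuine turning points only when $|a|<1$. The helper names and the synthetic data are hypothetical, and using the fitted model beyond the window is itself the extrapolation step.

```python
import numpy as np

def fit_linear_drift(x, y):
    """Least-squares a, b in y ≈ sin(x) + a*x + b.

    Assumes the sine has unit amplitude and period 2*pi, so only the
    linear drift is unknown: fit a straight line to y - sin(x)."""
    a, b = np.polyfit(x, y - np.sin(x), 1)
    return a, b

def next_turning_point(x0, a):
    """Smallest x > x0 where f(x) = sin(x) + a*x + b has a turning point.

    f'(x) = cos(x) + a changes sign only if |a| < 1; otherwise f is
    monotonic and None is returned."""
    if abs(a) >= 1:
        return None
    theta = np.arccos(-a)              # cos(theta) = -a, theta in (0, pi)
    # Roots of f' are theta + 2*pi*k and -theta + 2*pi*k for integer k;
    # take the first root after x0 on each branch.
    k1 = np.floor((x0 - theta) / (2 * np.pi)) + 1
    k2 = np.floor((x0 + theta) / (2 * np.pi)) + 1
    return min(theta + 2 * np.pi * k1, -theta + 2 * np.pi * k2)

# Hypothetical noisy data over a window a bit longer than 2*pi.
rng = np.random.default_rng(0)
x = np.linspace(0, 8, 200)
y = np.sin(x) + 0.3 * x + 1.0 + rng.normal(scale=0.05, size=x.size)
a, b = fit_linear_drift(x, y)
print(a, b)                         # estimates close to 0.3 and 1.0
print(next_turning_point(8.0, a))   # predicted next turning point after x = 8
```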

Place the window at the origin, then move it to the right, finding the values of $a$ and $b$ for each window position: $a(x)$ and $b(x)$, so to speak. This should give some idea of how these two unknown terms behave. Can you identify a pattern in them? Is it something that can easily be extrapolated? If you try different window sizes and different "error" terms, you should be able to obtain something you can extrapolate into the unknown regions of $x$.
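A rough sketch of the sliding-window procedure, again assuming the $\sin(x)+ax+b$ model inside each window. How to extrapolate $a(x)$ and $b(x)$ depends on whatever pattern you actually find; fitting a low-degree polynomial, as the hypothetical `extrapolate` helper does, is just one simple choice, and the quadratic drift in the example data is made up. Because the model uses absolute $x$, $b$ is the fitted line's value at $x=0$, so it can vary even when the drift is smooth.

```python
import numpy as np

def sliding_drift(x, y, width, step):
    """Fit y ≈ sin(x) + a*x + b in each window [s, s + width), sliding s
    from x[0] in increments of step; return window centres, a(x), b(x)."""
    centres, a_vals, b_vals = [], [], []
    s = x[0]
    while s + width <= x[-1]:
        mask = (x >= s) & (x < s + width)
        a, b = np.polyfit(x[mask], y[mask] - np.sin(x[mask]), 1)
        centres.append(s + width / 2)
        a_vals.append(a)
        b_vals.append(b)
        s += step
    return np.array(centres), np.array(a_vals), np.array(b_vals)

def extrapolate(centres, values, x_new, degree=1):
    """Extrapolate a(x) or b(x) with a low-degree polynomial fit
    (one possible choice; pick it from the pattern you observe)."""
    coeffs = np.polyfit(centres, values, degree)
    return np.polyval(coeffs, x_new)

# Hypothetical data with a slowly growing (quadratic) drift.
x = np.linspace(0, 30, 1500)
y = np.sin(x) + 0.01 * x**2 + 2.0
centres, a_x, b_x = sliding_drift(x, y, width=7.0, step=0.5)
# For this drift a(x) comes out roughly linear and b(x) roughly quadratic.
a_future = extrapolate(centres, a_x, 35.0)
b_future = extrapolate(centres, b_x, 35.0, degree=2)
print(np.sin(35.0) + a_future * 35.0 + b_future)   # predicted y at x = 35
```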