I'm a programmer, not a mathematician, but I've got a real-world problem I'm trying to solve that's out of my league, and my Google skills so far have failed me.
I have an analog waveform that's been sampled, part of which contains a sine wave, as shown below:
The sample rate of the data and the frequency of the sine wave are both constant, but unrelated. I need to take this sampled waveform, and "upsample" it, IE, generate more samples between the existing sample points. To do this, I've been using Cubic Spline Interpolation. I've taken an existing C++ library from the following location: http://kluge.in-chemnitz.de/opensource/spline/. The formulas are given on the page. I pass in the existing sampled data as control points, with "X" set to the sample number, and "Y" set to the sample value. Given an input value of X, this library generates me the corresponding "Y" value, so passing in an X of 2.5 would generate a value on the generated spline halfway between samples 2 and 3. All of this is working great so far, and it yields a waveform like the one shown above, but I now need to do one extra thing that I'm a bit stuck on.
What I need to be able to do now is "synchronize" with the the sine wave in the sample data. To do that, I want to detect the point at which the sine wave crosses a given threshold, as shown below:
In other words, given a starting value of X, I want to calculate the next X value at which Y equals a given value. Currently I "brute force" an approximate solution, by incrementing by a given step size between two X values until I cross the threshold point. There are two problems with this:
- Accuracy. The solution is only approximate, and I need a highly accurate result. This leads to a very small step size, while still having a margin of error.
- Performance. I have literally billions of these operations to perform, and I want something that can evaluate in a reasonable period of time. Right now I have to trade off accuracy to improve performance.
Instead of using an imperfect brute force search like I currently do, I was wondering if there's a mathematical solution for this problem. I basically want a function that given an X start position and a target Y sample value, will return the next X where Y is equal to the target Y value. I believe this should be possible to achieve, but I lack the math skills to derive the formula myself. Can anyone tell me if such a function is theoretically possible to write, and if so, what a possible implementation would be?
Here are some potentially useful constraints on my input data, which any acceptable solution can assume:
- The X coordinates of control points are linearly increasing values (IE, all evenly spaced and never "doubling back")
- I know the upper and lower bounds for the X and Y coordinates of the control points when generating the spline
Please let me know if you can see a solution to this problem.
I have no idea what the best way is for this problem so I hope someone comes along and gives you some magic whereupon I will happily delete this.
I have thought a little bit about the advantages a bisection search has but I have convinced myself, perhaps erroneously, that it is not going to outperform the cubic formula due to some tricky cases where the threshold is intersecting the same spline piece twice. Since this is a pretty plausible case I feel like the cubic formula will outperform a bisection search. If this is not true, a bisection search is going to be easier to write and probably cheaper on average than the cubic formula. Newton-Raphson has the best convergence but I don't think it's really great in this case since you already lack a good estimate. However an initial bisection search followed by Newton-Raphson would be dynamite for convergence, but I think in the case of a cubic you're going to be performing very close to the same number of operations, so again I'm guessing the cubic formula will win. The cubic formula requires a lot of multiplications and some complex arithmetic so it's really not clear to me which is best, and from a numerical standpoint I definitely have no idea. I am posting this mostly because no one else has responded and maybe a little answer is better than nothing.
But, I'm not 100% clear if you have to find all the intersecting points, or just one intersecting point. If you only need to find one intersecting point for purposes of synchronizing, please use bisection and Newton's method. It's going to win.
If you want to give a shot at the cubic formula, you'll want to get your hands on the spline coefficients by modifying the library to make the coefficient vectors public and in the meantime get a solid complex number library if you don't already have it. You'll be taking the cube root of a negative number.
If your spline has coefficients $ax^3 + bx^2 + cx + D$ and your threshold is at $y = Y$ then you want to find the zeros of $ax^3 + bx^2 + cx + d$ where $d = D-Y$.
Calculate the discriminant $\Delta = 18abcd - 4b^3d + b^2c^2 - 4ac^3 -27a^2d^2$. You want this to know whether your cubic has one or three real roots. I guess it is theoretically possible you could get some floating point magic to have a double root ($\Delta = 0$).
Then use the form $x_k = -\frac{1}{3a}(b + \zeta^kC + \frac{\Delta_0}{\zeta^k C})$ for $k = 0,~ 1,~ 2$. Note wiki's comment about simplifying the calculation of $C$ since you calculated the discriminant.
So your formula gives you three roots, and your discriminant says you have
A double root.
a. The two roots which are very close in magnitude are in your range. This means your threshold is right on the tip of a peak or trough. I think there's absolutely no harm in ignoring this case.
b. The two roots which are very close in magnitude, only one is in your range. You can probably ignore this case as well.
c. The third root not close in magnitude is in your range. Well, you've got it, discard the imaginary part as floating point error.
Just in case this wasn't enough for you, I have to note that the library stores its coefficients in the form $a(x-x_i)^3 + b(x-x_i)^2 + c(x-x_i) + y_i$. So to know whether the root is "in range" you have to translate it by the window you're currently in.