I am interested in using natural cubic splines to generate possible replacement values in the quality control of data. I would like to do this as close to real-time as I can. That is, I would like to use only one point (today's value) on the right of the value I wish to predict (yesterday's value) and more points (the past) on the left. My question is: how far into the past should I go? I know that a natural cubic spline takes into account every data point that you feed it. I just wonder how sensitive it is to, say, 40 points versus maybe 15 or 20, when the point I am evaluating is so far to the "right".
If anybody has knowledge of this or could at least point me to further reading, I would appreciate it. Thanks.
To understand the sensitivity to far-away data points, you should look at the graphs of the cardinal basis functions for the space of natural cubic splines. See the second set of pictures in this question.
As you can see, these functions decay to almost zero quite rapidly (though they are never exactly zero, except at knots). So, in your kind of application, I would say that the difference between using 40 points and using 15 or 20 would be negligible. In fact, if it were me, I'd probably choose 6 to 8 points.
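To see this decay in action, here is a small sketch (using scipy's `CubicSpline` with natural boundary conditions on synthetic data; the signal, noise level, and point counts are made up for illustration). It fits natural cubic splines through different amounts of history, always leaving out "yesterday's" value and keeping "today's" point on the right, and compares the spline's estimate at the left-out point:

```python
# Sketch: how much does the natural-spline estimate of the second-to-last
# point change as we vary the amount of history? (Synthetic data.)
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)
x = np.arange(43.0)                      # hypothetical daily time steps
y = np.sin(0.3 * x) + 0.05 * rng.standard_normal(x.size)

target = x[-2]                           # "yesterday": the value under QC
estimates = {}
for n_past in (40, 20, 15, 8):
    # n_past points to the left of the target, plus today's value on the
    # right, with the target itself excluded so the spline must estimate it
    xs = np.concatenate([x[-2 - n_past:-2], x[-1:]])
    ys = np.concatenate([y[-2 - n_past:-2], y[-1:]])
    cs = CubicSpline(xs, ys, bc_type='natural')
    estimates[n_past] = float(cs(target))
    print(n_past, estimates[n_past])
```

On data like this, the printed estimates agree to several decimal places: the influence of the far-left boundary has decayed away long before it reaches the evaluation interval.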
Also, you might consider scrapping the spline idea altogether. I'd suggest just interpolating a few nearby points with a low-degree polynomial. You can write down a closed-form formula for the interpolant, which makes it easy to do the computations in real time.
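For instance, the closed-form interpolant can be written directly with the Lagrange formula. Here is a minimal sketch (the day offsets and values are hypothetical): it fits a quadratic through two past days and today, skipping day −1, the value under quality control, and evaluates there:

```python
# Sketch: closed-form Lagrange interpolation through a few nearby points.
def lagrange_estimate(xs, ys, x):
    """Evaluate at x the polynomial interpolating the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # cardinal basis factor
        total += term
    return total

# days -3, -2, and 0 (today); day -1 is the value being checked
xs = [-3.0, -2.0, 0.0]
ys = [2.1, 2.4, 3.0]
print(lagrange_estimate(xs, ys, -1.0))  # ≈ 2.7
```

No linear system needs to be solved, so each replacement value costs only a handful of multiplications, which is well suited to real-time use.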