Let's say that I need a computer program to generate a function that fits a data set which has $(x,y)$ data points.
Let's say that I get the data set one point at a time, I don't know how many data points there are, and I have a finite amount of memory for storage as I build up this function.
Are there any good ways to fit a function to a data set as it comes in, with limited storage space? I think I'm looking for an "online" algorithm for fitting a function to a data set.
My best guess at an algorithm would be to decide on a fixed amount of storage to use, and have that define the degree of a polynomial. For instance, I might decide on a cubic curve and store the coefficients of the power series of $x$; $A,B,C,D$ in the equation $y = Ax^3+Bx^2+Cx+D$.
The next part of the problem would be to start the coefficients at some value, and then as data points come in, I would account for their influence in the equation some how.
Making polynomials from data points is fairly easy, using something like Lagrange Interpolation, but unfortunately I'm not sure how to turn that into an "online" algorithm where I don't know all of the data points in advance.
With fixed memory constraints, and a fixed degree polynomial (when i may get any number of points), I know that my resulting equation will only be an approximation, and won't actually necessarily be a true representation of my data set, but getting as close as reasonably possible is a desired result.
Does anyone know any techniques or methods for doing this?
Suppose you fix a degree $d$ in advance for your polynomial, and ask for a linear least-squares fit to your polynomial. With data $(x_j, y_j)$, $j = 1\ldots n$, the coefficients $c_0, \ldots, c_d$ give the least-squares solution to the system $Y = A c$ where $Y = [y_1, \ldots, y_n]^T$, and $A_{jk} = x_j^k$, $j=1\ldots, n$, $k = 0\ldots d$. This least-squares solution is given by solving the normal equations $$ A^T A c = A^T Y$$ assuming $A^T A$ has rank $d+1$, which it does as soon as there are at least $d+1$ distinct $x_j$, $c = (A^T A)^{-1} A^T Y$. Note that $(A^T A)_{ik} = \sum_j x_j^{i+k}$, so $A^T A$ is constant on antidiagonals and requires only $O(d)$ storage, and is easily updated. For each new data point $(x_j, y_j)$ you add $x_j^{i+k}$ to $(A^T A)_{ik}$, and $x_j^i y_j$ to $(A^T Y)_i$, $i=0 \ldots, d$, $k = 0 \ldots d$.