Curve Fitting to Represent Any Data

1.5k Views Asked by At

I'm a programmer seeking to take a bunch of data and represent it as a curve. Specifically, I want to take several hundred/thousand (floating) points and represent those points to a specified level of accuracy (e.g. at least to the nearest 0.01) as a curve. Note: this data will not pass the vertical line test (it won't be representable as a function), and I ideally do not want to split it up into different curves.

(Forgive me if "curve" is the wrong word to use here. Should I refer to it as something else, e.g. polynomial? I'm not sure.)

Here is my reasoning: I would like to be able to store all of this data in as little space as possible at the cost of a little computation. I figure the only way to do that - to compress data that may not have very much redundancy (and even if it did, the compression wouldn't be as significant as I am hoping for) - is to represent the data algorithmically somehow. (I actually feel kind of smart for coming up with this idea, even if it already is in practice and I don't understand the math behind it. :)

Is this doable, and if so, how? How would the math behind this work?

If what I am wanting to achieve is absolutely impossible or would take far too much computation to be practical, that's okay. I have a contingency list of things I would sacrifice if necessary in order to have a next-best attempt:

  1. I could try sorting the data according to X positions in order to make the result a function.
  2. I could curve-fit only, say, a dozen points at a time.

but of course I would prefer not to have to do either of these.

Also, if it makes much of a difference, these points are 3 dimensional. (2 dimensional answers are welcome though, if that makes it easier to answer or simpler to explain.)

Thanks,

2

There are 2 best solutions below

1
On BEST ANSWER

I could be wrong but I'm pretty sure this is alot like how JPEG image compression works. They take in a bunch of data points (pixels) and using the principles of Fourier transforms figure out which combination of Cosine curves when added will produce the original data.

Here is a nice link

http://nautil.us/blog/the-math-trick-behind-mp3s-jpegs-and-homer-simpsons-face

The more modern compressors use "Wavelets" instead of Cosine but the basic idea still sounds really similar to your idea.

6
On

You're probably looking for Gaussian Processes. Which are totally awesome.

Gaussian Process Example Plot

To be frank, it's been many years since I did it, but here's the python code that I wrote years ago, in the hope that it will help you.

#This uses Gaussian processes algorithm from page 19 of CW2

from scipy import *
from bisect import *
from scitools.all import *  #imports numpy & scitools
import scipy.linalg as la
from scipy import interpolate

#optimal adjustment for residual is given by sum(ydata-yCS)/4n

#generate data set to practive with (will be array of x & y)
n=1000
x=linspace(0,1,n)
y=sin(10*x**2)*exp(1/(x+0.1))*(x-0.3)*(x+0.1)*(x+0.8)

random.seed(12341)
xnoise=random.normal(0,0.01,n)
ynoise=random.normal(0,0.01,n)

xdata=x+xnoise
ydata=y+ynoise

#covariance function
sigmaf=5   #sigma_f  (noise in f)
sigman=0.01       #sigma_n   (noise in x)
l=0.1           #length scale
def k(x1,x2,sigmaf,l):
    "Covariance function"
    return (sigmaf**2)*exp(-((x1-x2)**2)/(2*(l**2)))

K=zeros((len(xdata),len(xdata)))
for i,x1 in enumerate(xdata):
    for j,x2 in enumerate(xdata):
        K[i][j]=k(x1,x2,sigmaf,l)
K=matrix(K)

K2=K+(sigman**2)*eye(len(xdata))
L=la.cholesky(K2,lower=True)
y2=la.solve(L,ydata)
alpha=la.solve(L.T,y2)

def fstar(xstar,xdata,alpha,sigmaf,l):
    """calculate the expected value of f for a given x"""
    kstar=zeros(len(xdata))
    for i,x in enumerate(xdata):
        kstar[i]=k(xstar,x,sigmaf,l)
    return dot(kstar.T,alpha)


ycalc=[]
for xstar in x:
    ycalc.append(fstar(xstar,xdata,alpha,sigmaf,l))


plot(x,ycalc-y)
#plot(x,y,'-',xdata,ydata,"+",x,ycalc,title="sigma_f=%f, sigma_n=%f, and l=%f"%(sigmaf,sigman,l),legend=("actual","data","GP"),hardcopy="plot.png")

In brief, you're figuring out a matrix (alpha in the code), and using that to generate your estimate (fstar in the code is the function that does that for a given x).