Which functional form best fits the Pearson correlation coefficient of noisy linear data as the noise scale increases?


I've recently developed an interest in statistics, so I wrote a program that generates noisy linear data sets and records the Pearson correlation coefficient of each one against the scale of the random noise added to it. Here is the function I wrote to do this:

def generateData(nLines, nPoints, gradient, yIntercept, xScaleFactor, randomCoefficientScaleFactor):
    Rs = []       # Pearson coefficient of each generated line
    xValues = []  # noise scale used for each generated line

    for i in range(nLines):
        x = []
        y = []

        # Line i gets a noise scale of i*randomCoefficientScaleFactor
        for j in range(nPoints):
            x.append(j*xScaleFactor)
            y.append(fLinear(x[j], i*randomCoefficientScaleFactor, gradient, yIntercept))

        dataSet = Data(x, y)

        xValues.append(i*randomCoefficientScaleFactor)
        Rs.append(linearCorrelationCoefficient(dataSet.x, dataSet.y))

    # Return R as a function of the noise scale
    dataSet = Data(xValues, Rs)
    return dataSet

And here is the function fLinear():

import random

def fLinear(x, randomCoefficient, gradient, yIntercept):
    # Straight line plus uniform noise in [0, randomCoefficient]
    return gradient*x + yIntercept + randomCoefficient*random.uniform(0,1)

I ran this with 200 values of R (nLines = 200), 1000 points per line (nPoints = 1000), and a scale factor on the random coefficient of 75 (randomCoefficientScaleFactor = 75).
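In case anyone wants to reproduce the data, here is a minimal self-contained sketch of the whole run. The `Data` container and `linearCorrelationCoefficient()` below are simplified stand-ins I'm assuming for this sketch, not the exact code from my program. The values `gradient = 1.0`, `yIntercept = 0.0`, `xScaleFactor = 1.0` and the fixed seed are also placeholders, since I haven't listed the ones I actually used.

```python
import math
import random
from collections import namedtuple

# Stand-in for the Data class (assumption): it only needs .x and .y here.
Data = namedtuple("Data", ["x", "y"])

def linearCorrelationCoefficient(x, y):
    """Pearson's r computed directly from its definition (stand-in)."""
    n = len(x)
    meanX = sum(x) / n
    meanY = sum(y) / n
    sxy = sum((a - meanX) * (b - meanY) for a, b in zip(x, y))
    sxx = sum((a - meanX) ** 2 for a in x)
    syy = sum((b - meanY) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fLinear(x, randomCoefficient, gradient, yIntercept):
    # Straight line plus uniform noise in [0, randomCoefficient]
    return gradient*x + yIntercept + randomCoefficient*random.uniform(0, 1)

def generateData(nLines, nPoints, gradient, yIntercept, xScaleFactor,
                 randomCoefficientScaleFactor):
    Rs = []
    xValues = []
    for i in range(nLines):
        noiseScale = i*randomCoefficientScaleFactor
        x = [j*xScaleFactor for j in range(nPoints)]
        y = [fLinear(xj, noiseScale, gradient, yIntercept) for xj in x]
        xValues.append(noiseScale)
        Rs.append(linearCorrelationCoefficient(x, y))
    return Data(xValues, Rs)

random.seed(0)  # assumption: fixed seed so the run is repeatable

# 200 lines, 1000 points each, noise scale step of 75.
# gradient, yIntercept and xScaleFactor are placeholder values.
result = generateData(200, 1000, 1.0, 0.0, 1.0, 75)

# Print the first few (noise scale, R) pairs, e.g. for pasting into Desmos.
for noiseScale, r in list(zip(result.x, result.y))[:5]:
    print(noiseScale, r)
```

`result.x` holds the noise scales and `result.y` the matching values of R, so these are the two columns that get plotted against each other.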

Outputting this data to a Desmos graph gives this result.

My first guess for a function with this shape was a normal distribution curve, but as you can see here, it does not fit. My next thought was a sigmoid, so I tried three different functions:

None of these fit the data particularly well, so I'm hoping someone here knows of a function that fits this data set better. Thanks in advance.