How many points are necessary for a standard deviation?


First off, I'm not a mathematician and never took statistics in college; what I know about standard deviation I've learned in the past few weeks, so be gentle...

I'm working on software that calculates the standard deviation of the oxygen concentration in a water sample over time. I'm using a rolling array of 600 doubles sampled at 1-second intervals (10 minutes of data). By "rolling array" I mean that, as readings come in, I increment an index until it reaches the end of the array (600 elements), then reset it to zero and start overwriting the oldest elements. In this way, as the percent oxygen settles out, the standard deviation drops over time. When σ reaches the required level, the oxygen sensor takes a reading of the oxygen concentration, and then the process moves on to the next gas point.
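The rolling array described above is a ring (circular) buffer. Below is a minimal sketch in the same VB dialect as the routine further down. The names `WINDOW_SIZE`, `buffer`, `writeIndex`, `filled` and `AddReading` are hypothetical and do not come from the original program. The sketch shows the wrap-around done with `Mod`, and it also keeps a count of how many slots hold real readings.

```vb
' Sketch of a rolling (ring) buffer for 1-second readings.
' WINDOW_SIZE, buffer, writeIndex, filled, AddReading and WindowIsFull are hypothetical names.
Private Const WINDOW_SIZE As Long = 600   ' 600 samples x 1 s = 10 minutes

Private buffer(0 To WINDOW_SIZE - 1) As Double
Private writeIndex As Long                ' slot that receives the next reading
Private filled As Long                    ' slots holding real readings (max WINDOW_SIZE)

' Store one reading; once the buffer is full, this overwrites the oldest reading.
Public Sub AddReading(ByVal reading As Double)
    buffer(writeIndex) = reading
    writeIndex = (writeIndex + 1) Mod WINDOW_SIZE   ' wrap from 599 back to 0
    If filled < WINDOW_SIZE Then filled = filled + 1
End Sub

' True once every slot holds a real reading, so sigma covers the full window.
Public Function WindowIsFull() As Boolean
    WindowIsFull = (filled = WINDOW_SIZE)
End Function
```

Because `filled` is tracked separately from the data, unfilled slots never need a special marker value such as 0. While the buffer is still filling, only slots 0 to `filled - 1` contain readings.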

My question is: am I taking too many points? It can take a very long time for σ to drop to the level the experiment requires before a calibration reading is taken at that gas point. Because of this, I'm currently letting the software move on to the next point when σ is an order of magnitude larger than requested (σ = 0.02 instead of σ = 0.002).
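The stopping rule described above can be written as a small check. Here is a minimal VB sketch that assumes a hypothetical helper named `ReadyForCalibration`. It also assumes the caller tracks how many readings the rolling array currently holds. Neither of these comes from the original program. The check only trusts σ once the window is full. It also explicitly rejects the -9.99999 value that ArrayStdDev returns when it has fewer than two values.

```vb
' Hypothetical settle check; the name and parameters are illustrative only.
' sigma           : latest standard deviation of the rolling window
' samplesInWindow : readings stored so far (at most windowSize)
' windowSize      : capacity of the rolling array, e.g. 600
' sigmaLimit      : required level, e.g. 0.002 (or the relaxed 0.02)
Public Function ReadyForCalibration(ByVal sigma As Double, _
                                    ByVal samplesInWindow As Long, _
                                    ByVal windowSize As Long, _
                                    ByVal sigmaLimit As Double) As Boolean
    If samplesInWindow < windowSize Then
        ReadyForCalibration = False        ' window still filling: sigma covers too little time
    ElseIf sigma < 0 Then
        ReadyForCalibration = False        ' negative values are error sentinels (e.g. -9.99999)
    Else
        ReadyForCalibration = (sigma <= sigmaLimit)
    End If
End Function
```

For example, `ReadyForCalibration(0.0015, 600, 600, 0.002)` returns True. The same σ with only 100 stored readings, `ReadyForCalibration(0.0015, 100, 600, 0.002)`, returns False.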

If I reduce the number of elements in the array, I expect the calculation to run much faster, because there are simply fewer elements to process. The routine doesn't care about the array size; it works on whatever array is passed to it. Would reducing the number of elements significantly reduce the accuracy of the calculation? In other words, would I be trading accuracy for speed?
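One way to put a rough number on the accuracy side of that trade-off is the approximate standard error of a sample standard deviation s computed from n points. This assumes the readings in the window are independent and normally distributed around a steady value. Real sensor data drifts and is autocorrelated, so treat the result as a guide only.

$$
\operatorname{SE}(s) \approx \frac{\sigma}{\sqrt{2(n-1)}},
\qquad
\frac{1}{\sqrt{2(600-1)}} \approx 0.029,\quad
\frac{1}{\sqrt{2(200-1)}} \approx 0.050,\quad
\frac{1}{\sqrt{2(60-1)}} \approx 0.092
$$

Under those assumptions, going from 600 to 200 points raises the relative uncertainty of the σ estimate from about 3% to about 5%. That is small next to the factor-of-ten gap between 0.02 and 0.002. The calculation itself is a single loop over n values, so 600 versus 200 elements makes no practical difference to run time. A shorter window mainly changes how much history σ describes.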

The routine I use to calculate the standard deviation comes from here: http://www.devx.com/vb2themax/Tip/19007. My version is only slightly modified:

```vb
Function ArrayStdDev(arr As Variant, Optional SampleStdDev As Boolean = True, _
    Optional IgnoreEmpty As Boolean = True) As Double
    ' Note: IgnoreEmpty is not used in this modified version.
    Dim sum As Double
    Dim sumSquare As Double
    Dim value As Double
    Dim variance As Double
    Dim count As Long
    Dim Index As Long

    ' evaluate the sum and the sum of squares
    ' if arr isn't an array, the following statement raises an error
    For Index = LBound(arr) To UBound(arr)
        value = arr(Index)
        ' skip zero values (such as rolling-array slots that have not been filled yet);
        ' note that a genuine reading of exactly 0 is skipped too
        If value <> 0 Then
            ' add to the running totals
            count = count + 1
            sum = sum + value
            sumSquare = sumSquare + value * value
        End If
    Next

    ' evaluate the result
    ' use (count - 1) if evaluating the standard deviation of a sample
    If count < 2 Then
        ArrayStdDev = -9.99999   ' sentinel: fewer than two usable values
    Else
        If SampleStdDev Then
            variance = (sumSquare - (sum * sum / count)) / (count - 1)
        Else
            variance = (sumSquare - (sum * sum / count)) / count
        End If
        ' rounding can leave a near-zero variance slightly negative, which would make Sqr raise an error
        If variance < 0 Then variance = 0
        ArrayStdDev = Sqr(variance)
    End If
End Function
```
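When σ is tiny compared with the readings themselves, the one-pass formula in ArrayStdDev subtracts two large, nearly equal numbers (`sumSquare` and `sum * sum / count`). At these magnitudes, double precision still leaves many significant digits. However, this subtraction is why a run of identical readings can produce a slightly negative variance. A common alternative is the two-pass method, which is swapped in here rather than taken from the original post. It computes the mean first and then sums the squared deviations from that mean, and that sum can never be negative. In the sketch below, `TwoPassStdDev` and its `n` parameter are hypothetical names. `n` is the number of elements to use, counted from `LBound`.

```vb
' Two-pass standard deviation of the first n elements of arr (a sketch).
' TwoPassStdDev and n are hypothetical names, not part of the devx routine.
' Pass 1 computes the mean; pass 2 sums squared deviations from that mean,
' so the total can never be negative and Sqr never sees a negative argument.
' Assumes n does not exceed the number of elements in arr.
' Returns -1 when fewer than two values are supplied.
Public Function TwoPassStdDev(arr As Variant, ByVal n As Long, _
                              Optional ByVal SampleStdDev As Boolean = True) As Double
    Dim i As Long
    Dim first As Long
    Dim mean As Double
    Dim d As Double
    Dim sumSqDev As Double

    If n < 2 Then
        TwoPassStdDev = -1
        Exit Function
    End If
    first = LBound(arr)

    ' pass 1: mean
    For i = first To first + n - 1
        mean = mean + arr(i)
    Next i
    mean = mean / n

    ' pass 2: sum of squared deviations from the mean
    For i = first To first + n - 1
        d = arr(i) - mean
        sumSqDev = sumSqDev + d * d
    Next i

    If SampleStdDev Then
        TwoPassStdDev = Sqr(sumSqDev / (n - 1))   ' sample standard deviation
    Else
        TwoPassStdDev = Sqr(sumSqDev / n)         ' population standard deviation
    End If
End Function
```

Because it takes an explicit count, this version can be given the number of readings stored so far while a rolling array is still filling. It therefore does not rely on zeros to mark empty slots.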

I hope I've asked an answerable question and appreciate any insight offered.

Best answer:

Based on the various suggestions in the comments, I reduced the number of elements in the array from 600 to 200. Empirical testing showed that with anything below 200, the calculation spanned too short a sample, and the accurate sensor would be triggered too soon (before the σ of the measured oxygen, in ml/L, had settled enough for the reading to be accurate). This cascaded into changes to a number of other water-bath parameters to accommodate the new window. That wasn't unexpected; we are, after all, in the experimental stages of this new type of calibration. Thanks to the suggestions you guys provided, we're now in the tweaking part of the experiment instead of the "why isn't this working right" part.

So the answer to my question seems to be "200". I was hoping for 42...