Measuring (in)accuracy of an average?


Preface

I've been logging the execution time of functions in my web application and I'm trying to develop a query that'll help give insight to the (in)accuracy of the average execution time.

To do this, I'm calculating the average execution time for each function (e.g. Render: Sessions 1995.90ms) then calculating the standard deviation (e.g. Render: Sessions 2676.82ms).

Question

Is it correct to say that the ratio standard deviation / average is a measure of the inaccuracy of the average? A ratio of 0% would imply that all of the execution times are the same, right?

Something is off about that statement; I'm just not sure what. Could someone help me understand this further? I can't seem to find anything that explains it, only resources that tell me how to calculate it.
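For concreteness, here is a minimal sketch (in Python, with made-up timing values) of the calculation being described: the mean and standard deviation of a function's logged execution times, and their ratio. This ratio is known as the coefficient of variation.

```python
import statistics

# Hypothetical logged execution times for one function, in ms
# (illustrative values, not real data from the application)
times_ms = [1200.0, 450.5, 6100.2, 1995.9, 300.3]

mean = statistics.mean(times_ms)
# Population standard deviation; use statistics.stdev for the sample version
sd = statistics.pstdev(times_ms)

# The ratio in question: standard deviation / mean,
# conventionally called the coefficient of variation (CV)
cv = sd / mean

print(f"mean = {mean:.2f} ms, sd = {sd:.2f} ms, cv = {cv:.2%}")
```

If every execution time were identical, `sd` would be 0 and the ratio would be 0%, which is the intuition behind the question.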

1 Answer

Best answer

Your statement is incorrect. The standard deviation is a measure of variability, and using it in a ratio with a measure of center is like comparing apples and oranges. Variability describes how spread out the data are, that is, how closely the data points cluster around the middle.

One of the most common descriptors of variability is the standard deviation. For a normally distributed data set (the meaning of which is beyond the scope of this answer), approximately 68% of all observations (in your case, execution times) will lie within one standard deviation of the median (which, for a normal distribution, coincides with the mean).

Therefore, the "average" you are calculating should be the median (the middle number), rather than the mean (the sum of all observations divided by the number of observations).

Once you have your median and standard deviation values, there is no need to create a ratio. For example, if the median is 1900 ms and the standard deviation is 25 ms, then you can assume that roughly 68% of all execution times are between 1875 ms and 1925 ms. The (in)accuracy of the median is described by the standard deviation and the interval in the example above. If the standard deviation is small, the data are clustered and not very spread out; if the standard deviation is large, the data are more spread out.
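The interval described above can be sketched as follows (a minimal Python example with hypothetical timing values, chosen only to illustrate the arithmetic):

```python
import statistics

# Hypothetical execution times in ms (illustrative values only)
times_ms = [1850.0, 1875.0, 1900.0, 1910.0, 1925.0, 1950.0]

median = statistics.median(times_ms)
sd = statistics.stdev(times_ms)  # sample standard deviation

# If the data were approximately normal, about 68% of observations
# would fall within one standard deviation of the center
low, high = median - sd, median + sd
print(f"median = {median:.1f} ms, sd = {sd:.1f} ms, "
      f"~68% interval = [{low:.1f}, {high:.1f}] ms")
```

A small `sd` yields a narrow interval (tightly clustered execution times); a large `sd` yields a wide one.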