Average of several points after non-linear function


The application of this problem is in bioinformatics, but I think the problem itself is a general math one. I have a function that converts a sequence score to a value in time units. This function involves a logarithm, so it is of course non-linear. I also have scores for several different sequences, and I want to find the average.

I don't know whether to apply the function to each individual score and then average the resulting time values, or to average the sequence scores and then apply my function to that average. The two approaches give different numbers, and I was hoping someone could tell me why one of them is obviously wrong.

Cheers, Dave
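Why the two orders give different numbers can be written down with Jensen's inequality (here equivalent to the AM-GM inequality). This is a sketch under an assumption: the question does not state the conversion function, so take a hypothetical $t(s) = a\log s + b$ with $a > 0$ and positive scores $s_1,\dots,s_n$. Then

$$\frac{1}{n}\sum_{i=1}^{n} t(s_i) = a\log\Big(\prod_{i=1}^{n} s_i\Big)^{1/n} + b \;\le\; a\log\Big(\frac{1}{n}\sum_{i=1}^{n} s_i\Big) + b = t\Big(\frac{1}{n}\sum_{i=1}^{n} s_i\Big),$$

with equality only when all the $s_i$ are equal (for $a < 0$ the inequality reverses). The left-hand side is $t$ applied to the geometric mean of the scores, so the two approaches differ exactly as the geometric and arithmetic means differ. This explains why the numbers disagree, not which one to report.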

Best answer

This is a difficult question, and not a mathematical one. When you measure something, the number you get often needs to be transformed to get the number you want. Imagine weighing an object on a scale with a nonlinear spring: there you would want to apply the function to each measurement and then average the results. In another case, we know that hearing response is logarithmic: doubling the sound pressure produces a fixed increment in perceived volume. Whether you want to average the sound pressure or its log depends on whether you are interested in perceived volume or in how strong something has to be to withstand the pressure. You have to think clearly about what the average is for before you can know how to compute it.
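To make the sound-pressure point concrete, here is a small Python sketch. It assumes the standard sound-pressure-level formula $L = 20\log_{10}(p/p_{\text{ref}})$ in decibels; the two pressures are made-up illustrative values and `spl_db` is a hypothetical helper name. Averaging the levels (the logs) and averaging the pressures give different answers:

```python
import math
from statistics import mean


def spl_db(p, p_ref=1.0):
    """Sound pressure level in decibels: 20 * log10(p / p_ref)."""
    return 20 * math.log10(p / p_ref)


# Two illustrative pressures (arbitrary units, relative to p_ref = 1).
pressures = [1.0, 100.0]

levels = [spl_db(p) for p in pressures]   # 0 dB and 40 dB
mean_level = mean(levels)                 # average of the logs
level_of_mean = spl_db(mean(pressures))   # log of the average pressure

# Converting the mean level back to a pressure gives the geometric mean.
pressure_at_mean_level = 10 ** (mean_level / 20)

print(f"mean of levels:         {mean_level:.2f} dB")
print(f"level of mean pressure: {level_of_mean:.2f} dB")
print(f"pressure at mean level: {pressure_at_mean_level:.2f}")
```

Averaging in the log domain corresponds to the geometric mean of the pressures (10 here), while averaging the pressures themselves is dominated by the larger one (50.5, about 34 dB).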

Answer

Usually one uses an average to estimate a parameter of the probability distribution of the variable being measured. In particular, the arithmetic mean is often used as the "best" estimator of the mean of a Gaussian distribution. If $x$ and $y=f(x)$ are random variables and $f$ is not linear, at most one of $x$ and $y$ can be normally distributed. You should average the one you believe is normally distributed.
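A small simulation illustrates this answer, under the assumption that $y = \log x$ is the normally distributed variable (so $x$ is log-normal). The parameters `MU`, `SIGMA` and `N` are arbitrary illustrative choices. Averaging $y$ recovers $\mu$, whereas averaging $x$ first and then taking the log targets $\mu + \sigma^2/2$, because the mean of a log-normal variable is $e^{\mu+\sigma^2/2}$:

```python
import math
import random
from statistics import fmean

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA = 1.0, 1.0  # illustrative parameters of the normal variable y
N = 100_000

# Assume y = log(x) is normal, i.e. x is log-normally distributed.
ys = [random.gauss(MU, SIGMA) for _ in range(N)]
xs = [math.exp(y) for y in ys]

est_from_y = fmean(ys)            # average y: estimates MU
est_from_x = math.log(fmean(xs))  # average x, then log: estimates MU + SIGMA**2 / 2

print(f"mean of log(x): {est_from_y:.3f}  (MU = {MU})")
print(f"log of mean(x): {est_from_x:.3f}  (MU + SIGMA^2/2 = {MU + SIGMA**2 / 2})")
```

Both estimates are computed from the same samples; only the order of "average" and "transform" differs, and only the first one estimates the parameter of the normal variable.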

Answer

Maybe a numerical example will clarify the issues. Suppose your sequence scores are 5, 50, 500, 5000, and 50000, and your time values, which measure how long it takes to type the sequence scores, are then 1, 2, 3, 4, and 5 (in some units), i.e. one unit per digit. The average time it takes to write down a sequence score is 3; the average sequence score is 11111, which has five digits and hence a time value of 5. Is it better to report 3 as the average, or 5?
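The numerical example can be reproduced in a few lines of Python. As an assumption read off the example, the time value is taken to be the number of decimal digits in the score, and `time_value` is a hypothetical name for it:

```python
from statistics import mean


def time_value(score):
    """Time to type a score, taken here as one unit per decimal digit."""
    return len(str(score))


scores = [5, 50, 500, 5000, 50000]

avg_of_times = mean(time_value(s) for s in scores)  # apply f, then average
avg_score = mean(scores)                            # average, then apply f
time_of_avg = time_value(int(avg_score))

print(avg_of_times, avg_score, time_of_avg)
```

The first and last numbers printed are the two candidate answers, 3 and 5.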