Can Pythagoras' theorem be used to average a set of values for comparison?


My basic math training tells me that you can apply Pythagoras' theorem in N-dimensional space, allowing you to measure the length of a vector from the origin to a point.

If you do this for multiple data points, you can compare the lengths of the resulting vectors to work out which one is larger.

Is this a form of averaging? Am I missing something from a mathematical theory standpoint, or is it actually commonplace? Are there benefits of using this approach vs. geometric means or arithmetic means?

You can assume this is for positive numbers only.


There is 1 best solution below

Best answer

As a form of averaging, given $n$ values $x_1, x_2, \dots, x_n$, we can compute their quadratic mean, also called the root mean square: $$ QM = \sqrt{\frac{x_1^2 + x_2^2 + \dots + x_n^2}{n}}. $$ This is not quite your idea of comparing the lengths of the vectors, because it divides the length by $\sqrt{n}$, but it's close. In some cases, the difference won't matter: for example, if you are comparing one category with values $(x_1, x_2, \dots, x_5)$ to another category with values $(y_1, y_2, \dots, y_5)$, dividing both lengths by $\sqrt5$ won't change which one is larger. But if one category has more values than another and you don't divide by $\sqrt n$, the comparison will largely reflect which category has more values.
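To make the effect of the $\sqrt n$ factor concrete, here is a minimal Python sketch. The data and the helper names `vector_length` and `quadratic_mean` are made up for illustration; they are not from any particular library.

```python
import math


def vector_length(values):
    """Euclidean length of the vector whose coordinates are `values`."""
    return math.sqrt(sum(v * v for v in values))


def quadratic_mean(values):
    """Root mean square: the vector length divided by sqrt(n)."""
    return vector_length(values) / math.sqrt(len(values))


# Hypothetical data: category b has more values, but each one is smaller.
a = [3.0, 4.0]
b = [2.0] * 9

# Raw lengths: b comes out larger only because it has more coordinates.
print(vector_length(a), vector_length(b))

# Quadratic means: a comes out larger once the count is divided out.
print(quadratic_mean(a), quadratic_mean(b))
```

With these numbers the raw vector length ranks `b` above `a`, while the quadratic mean ranks `a` above `b`, which is the distortion described above.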

The quadratic mean $QM$ is always at least the arithmetic mean $AM$. In fact, we have $QM^2 = AM^2 + \sigma^2$, where $\sigma^2$ is the variance of $x_1, \dots, x_n$. (This is the variance of the sample itself, computed by dividing by $n$; the estimator for the population variance would divide by $n-1$ instead.)
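For completeness, the identity follows by expanding the square in the definition of the variance (a short sketch, writing $AM = \frac1n \sum_{i=1}^n x_i$ and dividing by $n$ throughout): $$ \sigma^2 = \frac1n \sum_{i=1}^n (x_i - AM)^2 = \frac1n \sum_{i=1}^n x_i^2 - 2\,AM \cdot \frac1n \sum_{i=1}^n x_i + AM^2 = QM^2 - 2\,AM^2 + AM^2 = QM^2 - AM^2. $$ Since $\sigma^2 \ge 0$, this gives $QM \ge AM$, with equality exactly when all the $x_i$ are equal.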

So one way to interpret the quadratic mean is that when comparing two samples with similar arithmetic means, it rewards the sample that's more widely dispersed around its (arithmetic) mean. In cases where the arithmetic mean should be $0$ (for example, if you're measuring the error from an unbiased prediction), the quadratic mean essentially reduces to the standard deviation, and equals it exactly when the arithmetic mean is exactly $0$.
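A small numerical check of this interpretation, as a sketch with made-up samples: `tight` and `spread` share the same arithmetic mean, and `errors` has mean zero. It uses `statistics.pvariance` and `statistics.pstdev`, which divide by $n$, matching the convention above.

```python
import math
import statistics


def quadratic_mean(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical samples with the same arithmetic mean (10) but different spread.
tight = [9.0, 10.0, 11.0]
spread = [2.0, 10.0, 18.0]

for sample in (tight, spread):
    am = statistics.fmean(sample)
    var = statistics.pvariance(sample)  # variance dividing by n, not n - 1
    qm = quadratic_mean(sample)
    # Check QM^2 = AM^2 + sigma^2 up to floating-point rounding.
    print(am, qm, math.isclose(qm ** 2, am ** 2 + var))

# Hypothetical prediction errors whose arithmetic mean is exactly 0:
# here the quadratic mean coincides with the standard deviation.
errors = [-2.0, 1.0, 1.0]
print(math.isclose(quadratic_mean(errors), statistics.pstdev(errors)))
```

The more dispersed sample gets the larger quadratic mean even though both arithmetic means are equal.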

It's also typical to use the quadratic mean in cases where the $x_i$ measure diameters, side lengths, or similar linear dimensions, because then their squares are proportional to the areas of the objects we're looking at. For example, if you sample $n$ pizzas with radii $r_1, r_2, \dots, r_n$ and areas $S_1, S_2, \dots, S_n$, it makes sense to use the quadratic mean of the radii $QM(r)$ together with the arithmetic mean $AM(S)$ of the areas, because these are related by $AM(S) = \pi QM(r)^2$, just as $S_i = \pi r_i^2$.
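The pizza relation can be checked numerically. This is only a sketch: the radii are invented, and `arithmetic_mean` and `quadratic_mean` are illustrative helpers.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def quadratic_mean(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical pizza radii in centimetres, and the corresponding areas.
radii = [10.0, 12.0, 15.0]
areas = [math.pi * r ** 2 for r in radii]

mean_area = arithmetic_mean(areas)
area_from_qm = math.pi * quadratic_mean(radii) ** 2
area_from_am = math.pi * arithmetic_mean(radii) ** 2

# AM(S) matches pi * QM(r)^2 up to floating-point rounding ...
print(math.isclose(mean_area, area_from_qm))
# ... whereas pi * AM(r)^2 underestimates the mean area for unequal radii.
print(area_from_am < mean_area)
```

In other words, a pizza whose radius is the quadratic mean of the radii has exactly the average area, while one whose radius is the arithmetic mean is smaller than average.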