I have a large dataset where many values are decimals (results of Euclidean distance calculations).
I was computing means over this dataset (which is in a spreadsheet) when I noticed the following: if I split the dataset into dataset1 and dataset2, then
mean(dataset) != mean(mean(dataset1),mean(dataset2))
I was using the AVERAGE function in Google Sheets, but the same thing happened when I transferred the data to Excel. My guess was that there were too many decimal places for Excel and Google Sheets to handle, but the problem persisted even after processing the dataset in Python with the values explicitly cast to floats.
Even when rounding the results to two decimal places: mean(dataset) = 23.76 != mean(mean(dataset1), mean(dataset2)) = 23.75
So my question, specifically: is there a common practice for overcoming this problem when averaging numbers with many decimal places? Here are links to the datasets:
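For reference, here is a minimal Python reproduction of the check I ran. The values below are made-up placeholders standing in for the real datasets:

```python
from statistics import mean

# Hypothetical placeholder values; the real datasets are linked above.
dataset1 = [23.7612, 23.7598, 23.7505]
dataset2 = [23.7401, 23.7633]
dataset = dataset1 + dataset2

print(mean(dataset))                            # mean over all values
print(mean([mean(dataset1), mean(dataset2)]))   # mean of the two sub-means
```

The two printed values differ even though every number is a plain Python float.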
If the two datasets have different sizes, then of course you should expect
mean(dataset) != mean(mean(dataset1),mean(dataset2))
For example:
mean(5,5,5,5,5) = 25/5 = 5
mean(0,0) = 0/2 = 0
mean(5,5,5,5,5,0,0) = 25/7 ≈ 3.57
which is not equal to
mean(5,0) = 2.5
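The remedy is to weight each sub-mean by the size of its dataset, which recovers the overall mean exactly. A quick sketch using the example above:

```python
from statistics import mean

dataset1 = [5, 5, 5, 5, 5]
dataset2 = [0, 0]

m1, m2 = mean(dataset1), mean(dataset2)
n1, n2 = len(dataset1), len(dataset2)

# Unweighted mean of the sub-means -- wrong when sizes differ:
print(mean([m1, m2]))                     # 2.5

# Size-weighted combination -- matches the overall mean:
weighted = (n1 * m1 + n2 * m2) / (n1 + n2)
print(weighted)                           # 3.571...
print(mean(dataset1 + dataset2))          # same value
```

This is just the definition of the mean applied in two stages: n1*m1 and n2*m2 restore the two partial sums before dividing by the total count.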
See also the closely related Simpson's paradox.