How do I calculate the correct number of significant digits in the mean of a large data set?


I've recently become curious about calculating the mean, standard deviation, and uncertainty in the mean of relatively large data sets (>10,000 data points). My question is about the way to express the mean and the uncertainty when the uncertainty has more significant digits than does the mean.

For example, let's say I have a data set comprised of 20,000 measurements of X, and I want to calculate the mean, the standard deviation, and the uncertainty in the mean.

Without getting into the calculations themselves, suppose I generate the following statistics: the mean is 55.3 cm (calculated as 55.3456), the standard deviation is 6.2 cm (calculated as 6.1678), and the uncertainty in the mean is 0.005 cm (calculated as 0.00543).

Since the uncertainty in the mean has three significant digits, would the mean be expressed as 55.30 cm +/- 0.005 cm, or would it be 55.35 cm +/- 0.005 cm? In other words, do I use the calculated mean out to two significant digits, or do I use the mean as rounded and add a zero to pad out the significant digits?


There are 2 best solutions below


Quote the answer as
$$ \bar{X} = 55.3 \pm 6.2 \text{ cm} $$
or
$$ \bar{X} = 55.3(6.2) \text{ cm} $$
That is, the uncertainty is quoted to two significant digits: the first digit (6) of the standard deviation is in the units place, and the second digit (2) is carried as a guard digit.

The standard deviation determines how many digits of the mean are significant.

See, for example, Representing Significant Digits.

Below are simulations of $10{,}000$ measurements. The first has $\sigma = 6.2$, as you reported; the second is a factor of 10 better, with $\sigma = 0.62$. Hopefully, the first more accurately represents your data.

[Figure: first simulation, labeled 6.2]

[Figure: second simulation, labeled 6.1]
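A simulation of this kind can be sketched in a few lines. The version below is only an illustration, not the code behind the original figures: it assumes Gaussian-distributed, independent measurements and a true mean of 55.3, neither of which is stated in the answer, and the helper name `simulate` is made up for the example. It also reports the standard error of the mean, computed as the sample standard deviation divided by $\sqrt{N}$.

```python
import math
import random
import statistics


def simulate(n, mu, sigma, seed=0):
    """Draw n Gaussian 'measurements'; return (mean, sample std dev, standard error)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(data)
    std = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
    sem = std / math.sqrt(n)       # standard error of the mean, assuming independent draws
    return mean, std, sem


# Assumed true mean of 55.3; the two sigmas are the ones quoted in the answer.
for sigma in (6.2, 0.62):
    mean, std, sem = simulate(10_000, 55.3, sigma)
    print(f"sigma={sigma}: mean={mean:.4f}  std={std:.4f}  sem={sem:.4f}")
```

With 10,000 points the standard error is 100 times smaller than the standard deviation, which is why the two quantities suggest very different numbers of digits for the mean.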


For addition, you don't worry about significant digits; you worry about the least precise decimal place. If you calculate the mean as $55.3456$ with an uncertainty of $\pm 0.00543$, you could honestly quote just that: $55.3456 \pm 0.00543$, which would claim that the actual number is between $55.34017$ and $55.35103$. It is probably more useful to round off at least the $3$ in your error. You might even say $55.346 \pm 0.006$, rounding the error up to cover the range.
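The rounding described here can be written as a small helper. This is a sketch under stated assumptions, not a standard routine: the function name `quote` and its `err_digits` parameter are hypothetical, the error is always rounded up (as suggested above), and the mean is rounded to the nearest value at the same decimal place.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_EVEN


def quote(mean, err, err_digits=1):
    """Round err up to err_digits significant digits and the mean to the same decimal place.

    Returns (mean, err) as strings so trailing zeros are preserved.
    Assumes err is positive.
    """
    m = Decimal(str(mean))
    e = Decimal(str(err))
    # Exponent of the error's leading digit, e.g. 0.00543 -> -3.
    lead = e.adjusted()
    # Decimal place of the last digit to keep, e.g. 0.001 for one digit of 0.00543.
    quantum = Decimal(1).scaleb(lead - (err_digits - 1))
    e_rounded = e.quantize(quantum, rounding=ROUND_CEILING)    # round the error up
    m_rounded = m.quantize(quantum, rounding=ROUND_HALF_EVEN)  # round the mean to nearest
    return str(m_rounded), str(e_rounded)


mean_str, err_str = quote(55.3456, 0.00543)
print(f"{mean_str} +/- {err_str}")  # 55.346 +/- 0.006
```

Working in `Decimal` rather than binary floats keeps the decimal place exact and lets the result carry trailing zeros when the error calls for them.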

Note that if you had another data set that was just the same but had $10{,}000.000000$ added to each value, your mean would now be $10{,}055.3456$ with the same calculated error of the mean, because that error depends only on the scatter of the data. You would not want to round the mean off to three significant digits, as that would give you $10{,}100$.
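A quick numerical check of this point, as a sketch only: the data are simulated (Gaussian, with parameters borrowed from the question purely for illustration), and `sem` and `round_sig` are hypothetical helper names.

```python
import math
import random
import statistics


def sem(data):
    """Standard error of the mean, assuming independent measurements."""
    return statistics.stdev(data) / math.sqrt(len(data))


def round_sig(x, sig):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


rng = random.Random(1)
data = [rng.gauss(55.3456, 6.1678) for _ in range(20_000)]  # simulated stand-in data
shifted = [x + 10_000 for x in data]                        # same scatter, offset mean

# The two standard errors agree up to floating-point rounding.
print(sem(data), sem(shifted))

# Rounding by significant digits throws away everything the error says is known.
print(round_sig(55.3456, 3))      # 55.3
print(round_sig(10_055.3456, 3))  # 10100.0
```

The offset changes how many significant digits the mean needs, but not the decimal place at which it becomes uncertain, which is why the decimal place of the error is the quantity to round to.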