I have a relatively large database of 40,000+ entries, each with several data points. The data is collected on an ongoing basis. I would like to establish a baseline mean and standard deviation for several data sets within the database so that I can identify outliers as part of a QA/QC process.
However, as far as I can tell, most recommendations for identifying outliers assume you can already establish a reasonable mean and standard deviation to begin with. This seems circular when dealing with an empirical data set: won't the mean and standard deviation you produce be tainted by the outliers themselves? And if so, won't some outliers escape detection by a method such as flagging values n standard deviations from the mean?
There are values I have identified as clear outliers and corrected or thrown out, but I'm concerned that I'm still missing large numbers of outliers that sit on the edge of what seems reasonable.
I don't have a background in statistics or data management/analysis, but it's fallen on me to handle this database, so I would greatly appreciate any insights or responses on this matter. It may simply be that I'm missing some essential conceptual piece here.
A common rule of thumb (Tukey's fences) flags as an outlier any point more than $1.5$ times the interquartile range below the first quartile or above the third quartile. Use all the data points to determine the quartiles, eliminate the points those fences flag, and then compute the mean and standard deviation from the remaining data. Because quartiles are robust to extreme values, this avoids the circularity you describe. "Large quantities of outliers" is a bit of a contradiction in terms, though, and may simply mean that the standard deviation of your data is large.
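As a minimal sketch of that procedure, assuming your data points can be pulled out of the database into a flat list of numbers (function names here are illustrative; quartile conventions also vary slightly between implementations, so results near the fences may differ by a small amount):

```python
import statistics

def iqr_fences(values):
    """Return Tukey's fences: (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

    The quartiles are computed from ALL the data, including any
    outliers -- quartiles are robust, so extreme values barely move them.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def baseline_stats(values):
    """Mean and standard deviation after dropping points outside the fences."""
    lo, hi = iqr_fences(values)
    kept = [v for v in values if lo <= v <= hi]
    return statistics.mean(kept), statistics.stdev(kept)

# Example: one wild value barely shifts the quartiles, so it gets
# flagged and the baseline is computed from the remaining points.
data = [1, 2, 3, 4, 5, 100]
mean, sd = baseline_stats(data)  # 100 is excluded; mean == 3.0
```

You can then use the resulting mean and standard deviation as the baseline for your QA/QC checks, and recompute the fences periodically as new data arrives.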