Looking for an equation that reflects the majority of a collection


I'm working on a computer program, part of which involves sifting through large collections of numbers (by large, I mean 16 million+ numbers). I expect these collections to be composed almost entirely of small values, with a comparatively small number of large values.

I'm after an equation that gives me a figure reflecting the "baseline" of that collection: more or less the average of the low-value portion of the collection, which, as I say, ought to be the majority of it.

By way of example, say I have a number collection consisting of 33 1's, 33 2's, 33 3's and a single 10,002. These total 10,200, so the average of the 100 values is 102. But the average of the 1's, 2's and 3's, the lower-end values that make up 99% of the collection, is only 2! I'm looking for an equation that can process a collection of values like this and (in the case of this specific example) produce a result of 2 or close to 2, to reflect what the "baseline" of the number collection is.
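As a quick check of the arithmetic in this example, here is a minimal Python sketch. The variable names are purely illustrative, and the cutoff of 3 for "low-end" values is hard-coded just for this example; it is not a general method.

```python
from statistics import mean

# The example collection: 33 ones, 33 twos, 33 threes and one 10,002.
values = [1] * 33 + [2] * 33 + [3] * 33 + [10_002]

total = sum(values)        # sum of all 100 values
overall = mean(values)     # plain average, dominated by the one large value

# Average of only the low-end values (cutoff of 3 assumed for this example).
low_end = mean([v for v in values if v <= 3])
```

The plain average is pulled far away from the bulk of the data by one value, which is the behaviour I want to avoid.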


There is 1 best solution below


Not an explicit answer but too long for a comment.

I doubt that there's an "equation" for what you want.

One possibility is to decide in advance on a threshold percentile - say 90%. Then just find the mean of the lowest 90% of the numbers. Experiment with sample data to adjust the threshold appropriately.
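The threshold idea above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name `baseline_mean` and the parameter `keep_fraction` are made up here, the whole collection is assumed to fit in memory, and a full sort is used for simplicity even though a partial selection would likely be cheaper for 16 million+ values.

```python
import math


def baseline_mean(values, keep_fraction=0.9):
    """Mean of the lowest `keep_fraction` of `values` (hypothetical helper)."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    ordered = sorted(values)  # ascending, so the low values come first
    if not ordered:
        raise ValueError("values must not be empty")

    # Number of values to keep, rounded down, but always at least one.
    keep = max(1, math.floor(len(ordered) * keep_fraction))
    kept = ordered[:keep]
    return sum(kept) / len(kept)


# The collection from the question.
sample = [1] * 33 + [2] * 33 + [3] * 33 + [10_002]
result_90 = baseline_mean(sample, keep_fraction=0.9)
result_99 = baseline_mean(sample, keep_fraction=0.99)
```

On the question's sample, keeping the lowest 90% averages 33 ones, 33 twos and 24 threes, which gives 1.9; keeping the lowest 99% drops only the 10,002 and gives 2. The right fraction depends on how many large values you expect, so it is worth trying a few on real data.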