Is there an algorithm to divide a list of numbers into multiple ranges?


(Moving question from stackoverflow)

SOLR does not calculate ranges for numeric facets. We have to supply the ranges when querying so that SOLR returns the count for each range. But we cannot guess the ranges at query time, because the results matching the query can fall in any range. For instance, if I search for 'Android', ranges such as these should be calculated automatically:

  • 2000 to 10000
  • 10001 to 15000
  • 15001 to 25000
  • 25001 to 40000
  • 40001 to 60000

As you can see, we cannot guess these ranges in advance.

SOLR stats give us the min, max, average, and standard deviation. Can we apply some algorithm to these values to guess the ranges? Is there a formula in statistics for this?

For instance, if the standard deviation is low, we can assume that most of the numbers are near the average.

If the average is closer to the min than to the max, then most of the numbers are probably between the min and the average, so we can define multiple ranges between the min and the average.

I hope this is possible, but I cannot find a generic way to define the rules.

*SOLR is an indexing application


There is 1 best solution below


If you know how the values are distributed, then, given the appropriate statistics, you can evaluate the cumulative distribution function to estimate the counts in given ranges, or interpolate it inversely to find the range boundaries that hold given counts.
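As a rough illustration of the counting direction, here is a minimal Python sketch. It assumes a normal distribution purely for the sake of the example, and the mean, standard deviation, and total (20000, 8000, 1000) are made-up numbers, not values from the question. The function name `estimated_counts` is hypothetical.

```python
from statistics import NormalDist

def estimated_counts(edges, total, dist):
    """Estimate how many of `total` values fall between consecutive edges.

    The fraction of values in [a, b) is approximated by CDF(b) - CDF(a).
    """
    cdf_values = [dist.cdf(edge) for edge in edges]
    return [total * (hi - lo) for lo, hi in zip(cdf_values, cdf_values[1:])]

# Assumed (made-up) statistics; in practice they would come from the stats query.
dist = NormalDist(mu=20000, sigma=8000)
edges = [2000, 10000, 15000, 25000, 40000, 60000]
counts = estimated_counts(edges, total=1000, dist=dist)

for lo, hi, count in zip(edges, edges[1:], counts):
    print(f"{lo} to {hi}: about {count:.0f}")
```

The estimates are only as good as the assumed distribution; if the real data are strongly skewed, the counts can be far off.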

If the data are normally distributed, the mean and standard deviation you mention are enough to do what you want.
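One way to sketch this in Python, assuming the values really are close to normal: take evenly spaced quantiles of the fitted normal distribution as range boundaries, so that each range should hold roughly the same number of results, and clamp them to the observed min and max. The function name `normal_range_edges`, the equal-count choice, and the sample statistics are assumptions made for illustration, not something given in the question.

```python
from statistics import NormalDist

def normal_range_edges(minimum, maximum, mean, stddev, buckets):
    """Return range boundaries that split a normal distribution into
    `buckets` parts of roughly equal expected count."""
    if stddev <= 0 or buckets < 2:
        # Degenerate case: no spread (or a single bucket), so one range only.
        return [minimum, maximum]
    dist = NormalDist(mu=mean, sigma=stddev)
    # Inverse CDF at 1/buckets, 2/buckets, ... gives the inner boundaries.
    inner = [dist.inv_cdf(i / buckets) for i in range(1, buckets)]
    # Clamp to the observed min and max, since a normal curve has no hard limits.
    inner = [min(max(x, minimum), maximum) for x in inner]
    edges = [minimum]
    for edge in inner + [maximum]:
        if edge > edges[-1]:  # skip duplicates produced by clamping
            edges.append(edge)
    return edges

# Made-up statistics standing in for the min, max, average, and standard deviation.
edges = normal_range_edges(2000, 60000, mean=20000, stddev=8000, buckets=5)
print([round(e) for e in edges])
```

The resulting boundaries can then be rounded to friendlier numbers before being passed to the facet query. If the data are not close to normal, these boundaries will not give balanced counts.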