How to calculate top percentage of a dataset?


I'm no math expert, quite the contrary. I have a huge dataset, think 100,000 records. Each contains a value between 235 and 2689. Let's say I have a score of 1860. How do I find out how I rank against other people?

A simple linear rescaling of my score within that range,

$$100 - \frac{1860 - 235}{2689 - 235} \times 100$$

says I'm in the top 34%. But this ignores how the scores are actually distributed: if 80% of the users have 2000 points or more, I'm really near the bottom. I don't know what this kind of ranking is called or which function would be useful.
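To see the problem concretely, here is a minimal Python sketch with a made-up, deliberately skewed dataset (the scores are hypothetical, not real data). The min-max formula only looks at the range, so it gives the same answer however the scores are distributed, while counting the people above you reflects the actual data.

```python
# Hypothetical, skewed scores: 20 people at 1000, 80 people at 2200.
scores = [1000] * 20 + [2200] * 80
my_score = 1860
low, high = 235, 2689  # the range stated in the question

# Min-max formula: depends only on low/high, not on the scores themselves.
naive_top = 100 - (my_score - low) / (high - low) * 100

# Rank-based: percentage of people who scored strictly higher than me.
higher = sum(1 for s in scores if s > my_score)
rank_top = 100 * higher / len(scores)

print(f"naive: top {naive_top:.0f}%")   # naive: top 34%
print(f"by rank: top {rank_top:.0f}%")  # by rank: top 80%
```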


If you only need to do this once, you can simply loop over all records: count how many records are lower than your score, then divide that count by the total number of records. The result is the fraction of people with a lower score; multiplied by 100, it is your percentile rank.
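A minimal Python version of that loop, assuming the records are already in a plain list (the function name `fraction_below` and the sample values are illustrative, not from the original):

```python
def fraction_below(records, score):
    """Return the fraction (0.0 to 1.0) of records strictly lower than score."""
    lower = 0
    for value in records:
        if score > value:  # this record is beaten by my score
            lower += 1
    return lower / len(records)


records = [235, 900, 1500, 1860, 2100, 2689]  # hypothetical sample
print(fraction_below(records, 1860))  # 0.5
```

Multiply by 100 for a percentage; 100 minus that percentage is the share of records at or above your score.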

Some languages make this even shorter. In Python with NumPy, if the records are in an array `data` and your score is `score`, the percentage of records at or below your score is:

100.0 * numpy.count_nonzero(data <= score) / len(data)

Unlike the loop above, this also counts records equal to your score, including your own record if it is in `data`.
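A self-contained NumPy sketch of the same idea. The random `data` is only a stand-in for the real records (an assumption for illustration), and `percent_at_or_below` is a hypothetical helper name. If you need many lookups against the same dataset, sorting once and using `numpy.searchsorted` avoids rescanning every record per query; with `side="right"` it returns how many values are less than or equal to the score, matching the comparison above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real records: 100,000 integer scores in [235, 2689].
data = rng.integers(235, 2690, size=100_000)
score = 1860

# One-off query: percentage of records at or below the score.
pct_at_or_below = 100.0 * np.count_nonzero(data <= score) / data.size
print(f"at or below: {pct_at_or_below:.1f}%, top {100 - pct_at_or_below:.1f}%")

# Many queries: sort once, then each lookup is a binary search.
sorted_data = np.sort(data)


def percent_at_or_below(s):
    # side="right" places s after any equal values, so the index
    # equals the number of elements <= s.
    return 100.0 * np.searchsorted(sorted_data, s, side="right") / sorted_data.size


print(percent_at_or_below(1860))
```

The sort costs O(n log n) once; each later query is O(log n) instead of O(n).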