What algorithm or statistical technique can help me recognize the most "relevant" groups based on the size of their populations?


Let's assume I have a number of groups, for example groups 0 to 999. Each group has a population that ranges randomly from 1 to 1 million members. I want to recognize the most statistically "relevant" groups in the whole universe based on their number of members. For example, if group 0 has 1 million members and group 144 has 10 members, you can say group 0 is more relevant than group 144. In the same context, if group 88 has 150,000 members, you could say it is also relevant, even though it is not close to 1 million. In another example, if group 0 has 1 million members and each of the other 999 groups has fewer than 10, you can say the only relevant group is group 0.

I don't want to force a specific threshold on the algorithm (for example, "more than 1,000 members is relevant"), because I don't know in advance the distribution or the range of my groups' sizes. Instead, I want it to recognize which groups I should consider, namely those with larger "sampling" characteristics over the entire universe.

I have the feeling that some kind of threshold needs to be calculated from the data, and then all groups above that threshold are considered relevant, while all groups below it are irrelevant and can be disregarded.
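One way to compute such a threshold from the data itself (a swapped-in technique, not something the question prescribes) is Otsu's method, borrowed from image thresholding: try every cut between two consecutive group sizes and keep the one that maximizes the between-class variance w0·w1·(μ0 − μ1)². For two classes this is the same criterion as a two-class Jenks natural-breaks split. The sketch works on log10(population), which is an assumption motivated by sizes spanning 1 to 1 million; the function name `log_otsu_threshold` and the dict input shape are hypothetical.

```python
import math


def log_otsu_threshold(populations):
    """Two-class split of group sizes that maximizes the between-class
    variance of log10(population) (Otsu's criterion).

    populations: dict mapping group id -> member count (assumed >= 1).
    Returns (threshold, relevant_ids); relevant groups have size >= threshold.
    """
    # Sort by size so every candidate split is a cut between neighbours.
    items = sorted(populations.items(), key=lambda kv: kv[1])
    logs = [math.log10(size) for _, size in items]
    n = len(logs)
    total = sum(logs)

    best_score, best_k = -1.0, None
    left_sum = 0.0
    for k in range(1, n):  # k = number of groups in the 'small' class
        left_sum += logs[k - 1]
        if items[k - 1][1] == items[k][1]:
            continue  # never separate two groups of equal size
        w0, w1 = k / n, (n - k) / n
        mu0 = left_sum / k
        mu1 = (total - left_sum) / (n - k)
        score = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if score > best_score:
            best_score, best_k = score, k

    if best_k is None:
        # Fewer than two distinct sizes: nothing to separate.
        return (items[0][1] if items else None), sorted(populations)
    threshold = items[best_k][1]
    relevant = sorted(group for group, _ in items[best_k:])
    return threshold, relevant


# Sizes for groups 0, 88 and 144 come from the question's first example;
# setting every other group to 10 members is an assumption.
pops = {g: 10 for g in range(1000)}
pops[0] = 1_000_000
pops[88] = 150_000
print(log_otsu_threshold(pops))  # (150000, [0, 88])
```

The cost is one sort plus one linear pass, O(n log n) for n groups. The logarithm matters here: on the raw sizes of this same example, the maximizing cut falls between 150,000 and 1 million and keeps only group 0.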

The technique should be as arithmetically simple as possible.
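If arithmetic simplicity is the main constraint, a minimal sketch is to cut at the arithmetic mean size: one sum and one division, then one comparison per group. This is the first step of the head/tail breaks idea (split at the mean, keep the "head" above it); it is offered here as an assumption, not as an established answer, and the function name `mean_threshold` and the dict input are hypothetical.

```python
def mean_threshold(populations):
    """Cut at the arithmetic mean size; groups strictly above it are 'relevant'.

    populations: dict mapping group id -> member count (hypothetical input shape).
    Returns (threshold, relevant_ids).
    """
    if not populations:
        return None, []
    threshold = sum(populations.values()) / len(populations)
    relevant = sorted(g for g, size in populations.items() if size > threshold)
    return threshold, relevant


# Second example from the question: group 0 has 1 million members and the
# other 999 groups have fewer than 10 (set to 5 here, an assumption).
pops = {g: 5 for g in range(1, 1000)}
pops[0] = 1_000_000
threshold, relevant = mean_threshold(pops)
print(relevant)  # [0]
```

Because a few very large groups pull the mean up, on heavy-tailed sizes this cut tends to keep only the largest groups, while on evenly spread sizes (e.g. uniform between 1 and 1 million) it keeps roughly half of them.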