Statistical Metric for Dominance of the Largest Subset in a Partition of a Set


Given a set and a partition of it, I want to calculate a score (between 0 and 1) reflecting how strongly the set is "dominated" by the largest subset of the partition. The intuitive idea I'm trying to capture is this: given a set of answers to a question, what is the probability that the most common answer is correct? The score might take into account the ratio between the size of the largest subset and the size of the entire set, the ratio between the size of the largest subset and the sizes of the other subsets, etc.

Examples: (The numbers refer to the size of each subset in the partition)

  • The score of 60,40 should be lower than that of 80,20 because the largest subset is smaller relative to the entire set.
  • The score of 70,30 should be lower than that of 70,5,5,5,5,5,5 because in 70,30 there is a second subset large enough to challenge the dominance of the largest subset.
  • The score of 90,5,5 should be lower than that of 90,5,1,1,1,1,1 because a single answer given by 5 people challenges the dominance of the most common answer more than 5 different answers given by 5 different people.
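
To make the examples concrete, here is a minimal Python sketch that computes the two simplest ingredients for each partition: the share of the largest subset and the share of the second-largest subset. The function name `top_and_runner_up_share` is just a hypothetical helper for illustration, not a known metric. It suggests that these two ratios alone cannot satisfy all three requirements, since they come out identical for 90,5,5 and 90,5,1,1,1,1,1.

```python
def top_and_runner_up_share(sizes):
    """Return (largest share, second-largest share) for a list of subset sizes.

    Hypothetical helper for illustration only; it is not a named metric.
    """
    total = sum(sizes)
    ordered = sorted(sizes, reverse=True)
    # A partition with a single subset has no runner-up.
    runner_up = ordered[1] if len(ordered) > 1 else 0
    return ordered[0] / total, runner_up / total


examples = [
    [60, 40], [80, 20],
    [70, 30], [70, 5, 5, 5, 5, 5, 5],
    [90, 5, 5], [90, 5, 1, 1, 1, 1, 1],
]

for sizes in examples:
    top, second = top_and_runner_up_share(sizes)
    print(sizes, round(top, 2), round(second, 2))
```

The first pair differs in the largest share and the second pair differs in the runner-up share, but the third pair differs only in how the remaining elements are spread out, so a suitable score would presumably need to look at the whole tail of the partition.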

The closest metric I've found so far is the Gini coefficient, but it measures the overall inequality among the subsets rather than the dominance of the single largest subset.
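
As a rough illustration of that concern, the sketch below computes the Gini coefficient of the subset sizes using the standard sorted-rank formula. The partition 45,45,5,5 is an extra example I made up for this purpose: it has no single dominant subset, yet it gets a higher Gini coefficient (about 0.4) than 60,40 (about 0.1), because the sizes are more unequal overall.

```python
def gini(sizes):
    """Gini coefficient of a list of positive subset sizes.

    Uses the rank-based formula on the ascending-sorted sizes:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    """
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# 45,45,5,5 is a made-up partition with two tied leaders.
for sizes in ([60, 40], [45, 45, 5, 5]):
    print(sizes, round(gini(sizes), 3))
```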

Is there a known metric that captures the idea I've described?