Metric for calculating lopsided distributions


I have a list of ~20 numbers:

1200, 1200, 360, 360, 300, 250, 180, 180, 180, 180, 180, 90, 90, 90, 90, 45, 10, 0, 0

I am looking for a metric that measures the lopsidedness (maybe skewness) of this distribution. For the above example, I would want to be able to highlight that the sum of the first 2 numbers (2400) makes up nearly 50% of the total sum (4985). But there could also be other examples where the top 4 or 5 numbers make up a big percentage (say, greater than 50%) of the total sum.

Should I just calculate skewness or are there other better metrics that fulfill my requirement?


There are 1 best solutions below


It's not entirely clear what you want your 'lopsidedness measure' to reveal. However, I found your question interesting, so I came up with a 'lopsidedness measure', $\mathcal{L}$, that you might find useful. (To be clear: although I say I invented it, I am almost certainly not the first to think of it. I just haven't come across it before.)


How to calculate $\mathcal L$:

First, make sure that your array of numbers is sorted in decreasing order (as yours is):

1200, 1200, 360, 360, 300, 250, 180, 180, 180, 180, 180, 90, 90, 90, 90, 45, 10, 0, 0

Next, calculate the cumulative sum of these numbers:

1200, 2400, 2760, 3120, 3420, 3670, 3850, 4030, 4210, 4390, 4570, 4660, 4750, 4840, 4930, 4975, 4985, 4985, 4985

Now divide by the total sum of your original vector (which happens to be the last element of the cumulative sum, namely 4985):

0.241, 0.481, 0.554, 0.626, 0.686, 0.736, 0.772, 0.808, 0.845, 0.881, 0.917, 0.935, 0.953, 0.971, 0.989, 0.998, 1.000, 1.000, 1.000

Now take the average of these numbers. The result is $$\mathcal L = 0.810$$
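The four steps above can be sketched in a few lines of Python. This is only an illustration: the function names `cumulative_shares` and `lopsidedness` are my own, and the code assumes non-negative numbers with a positive total.

```python
from itertools import accumulate


def cumulative_shares(values):
    """Normalized cumulative sums of the values sorted in decreasing order."""
    ordered = sorted(values, reverse=True)      # step 1: sort, largest first
    total = sum(ordered)
    if total <= 0:
        # Assumption of this sketch: the total must be positive.
        raise ValueError("total must be positive")
    cumulative = accumulate(ordered)            # step 2: cumulative sum
    return [c / total for c in cumulative]      # step 3: divide by the total


def lopsidedness(values):
    """Step 4: average of the normalized cumulative sums."""
    shares = cumulative_shares(values)
    return sum(shares) / len(shares)


data = [1200, 1200, 360, 360, 300, 250, 180, 180, 180, 180, 180,
        90, 90, 90, 90, 45, 10, 0, 0]

print(f"{lopsidedness(data):.3f}")  # prints 0.810
```

The intermediate list is useful in its own right: entry $k$ of `cumulative_shares(data)` is the share of the total held by the top $k$ numbers, so the second entry (about 0.481) is the "first 2 numbers make up nearly 50%" figure from the question.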


Properties of $\mathcal L$:

For non-negative numbers with a positive total, the measure $\mathcal L$ is a real number in the interval $(0.5, 1]$, i.e. $0.5<\mathcal L \leq 1$.

When the data are at their most lopsided, that is, when all the weight is concentrated in a single entry of the sequence, then $\mathcal L = 1$.

On the other hand, the less lopsided the sequence of numbers is, i.e. the closer the weight is to being uniformly distributed, the closer the measure comes to one half. To see this, create a vector of some large length $n$ with all numbers equal; you will find that $\mathcal L \approx 0.5$. If you keep increasing the number of elements in the sequence, i.e. let $n\to \infty$, then $\mathcal L \to 0.5$ from above.
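The uniform case can be worked out exactly. As a short check (the symbols $c$ and $k$ are my own notation): if all $n$ entries equal some $c > 0$, the normalized cumulative sum at position $k$ is $kc/(nc) = k/n$, so

$$\mathcal L = \frac{1}{n}\sum_{k=1}^{n}\frac{k}{n} = \frac{n(n+1)}{2n^2} = \frac{n+1}{2n} = \frac{1}{2} + \frac{1}{2n}.$$

For example, $n = 10$ gives $\mathcal L = 0.55$. The value is always strictly greater than $0.5$ and approaches $0.5$ as $n \to \infty$, which is why the lower end of the interval is open.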


Examples:

In your example, your numbers are moderately 'lopsided': $\mathcal L = 0.810$ is a bit above the midpoint ($0.75$) between $0.5$ and $1$.

Now try with a more lopsided sequence of numbers:

1000, 800, 700, 5, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1

from which we calculate $\mathcal L = 0.946$.

Lastly, we try with a sequence of numbers that is very non-lopsided (i.e. weight close to evenly distributed):

100, 99, 98, 97, 96, 95, 94, 93, 92, 89, 88, 86, 83, 78

from which we calculate $\mathcal L = 0.555$.