How do I compare the relative importance of observations when the number of observations differs between datasets?


Let me first describe what I mean by a dataset and by relative importance:

A dataset is a collection of discrete observations, in which identical values may be recorded more than once.

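Assume dataset A has the values 6 and 4. Within dataset A, the relative importance of observation 6 is 6/(6+4) = 0.6 and the relative importance of observation 4 is 4/(6+4) = 0.4. In general, the relative importance of an observation is its value divided by the sum of all observations in the dataset, so the relative importances within a dataset sum to 1.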

Now that we have defined the relative importance of observations within dataset A, consider dataset B with the observations 5, 1, 1, 1, 1 and 1. The relative importance of each observation in dataset B is calculated as follows:

5 -> 5/(5+1+1+1+1+1) = 0.5
1 -> 1/(5+1+1+1+1+1) = 0.1 (once for each of the five 1s in dataset B)

Here too, the relative importances of all observations in dataset B sum to 0.5 + 5*0.1 = 1.
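The definition used above (each value divided by its dataset's total) can be checked with a short Python sketch. The function name relative_importance is hypothetical, introduced only for illustration.

```python
import math


def relative_importance(observations):
    """Return v / sum(observations) for each observation v (the question's definition)."""
    total = sum(observations)
    if total == 0:
        raise ValueError("relative importance is undefined when the observations sum to 0")
    return [v / total for v in observations]


dataset_a = [6, 4]
dataset_b = [5, 1, 1, 1, 1, 1]

print(relative_importance(dataset_a))  # [0.6, 0.4]
print(relative_importance(dataset_b))  # [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
# Floating-point addition may not give exactly 1.0, so compare with a tolerance.
print(math.isclose(sum(relative_importance(dataset_b)), 1.0))  # True
```

Repeated values are kept as separate entries in the list, so each of the five 1s in dataset B gets its own share of 0.1.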

How do I compare relative importance between different datasets?

My thinking is that even though the relative importance of observation 6 in dataset A (0.6) is greater in absolute terms than the relative importance of observation 5 in dataset B (0.5), the value 5 dominates dataset B much more than the value 6 dominates dataset A. How can I redefine relative importance so that the new values are comparable across datasets with different numbers of observations?

Note that observations are not comparable between datasets: observations of A might lie in the range 100-200, while observations of B might lie in the range 1-10.

Let L be a finite list of numbers, with s = min L, g = max L,
n the number of entries in the list, and
sum L the sum of the entries in the list.

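Subtract s from each entry and then divide each entry by the range g - s.
The entries of the resulting list (L - s)/(g - s) all lie in [0, 1], with the minimum mapped to 0 and the maximum to 1.
Now the list has been standardized (this assumes g > s, i.e. the list is not a single repeated value).

The sum of those entries is S = sum (L - s)/(g - s) = (sum L - ns)/(g - s).
The standardized relative importance (SRI) of an entry v is
((v - s)/(g - s))/S = (v - s)/(sum L - ns).
The factor g - s cancels, and the SRIs of a list sum to 1, just like the original relative importances.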
Multiply by 100 or 1000 to avoid tiny SRIs; scaling every SRI by the same factor does not change how the entries compare.
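A minimal Python sketch of the SRI, assuming the list contains at least two distinct values. It standardizes with the range g - s (min-max scaling), divides by the sum S of the standardized entries, and compares the result with the closed form (v - s)/(sum L - ns). The function names and the list [100, 150, 200] are hypothetical, chosen only for illustration.

```python
import math


def standardized_relative_importance(observations):
    """SRI in two steps: min-max standardize, then divide by the sum S."""
    s = min(observations)
    g = max(observations)
    if g == s:
        # All entries equal: g - s = 0 and sum L - n*s = 0, so the SRI is undefined.
        raise ValueError("SRI is undefined when all entries are equal")
    # Step 1: (v - s)/(g - s) maps the minimum to 0 and the maximum to 1.
    standardized = [(v - s) / (g - s) for v in observations]
    # Step 2: total is S in the text; it is positive because the maximum maps to 1.
    total = sum(standardized)
    return [x / total for x in standardized]


def sri_closed_form(observations):
    """Same quantity via the closed form (v - s)/(sum L - n*s)."""
    n = len(observations)
    s = min(observations)
    denominator = sum(observations) - n * s
    return [(v - s) / denominator for v in observations]


print(standardized_relative_importance([6, 4]))  # [1.0, 0.0]
print(standardized_relative_importance([5, 1, 1, 1, 1, 1]))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Hypothetical list in the 100-200 range mentioned in the question.
example = [100, 150, 200]
two_step = standardized_relative_importance(example)
closed = sri_closed_form(example)
print(all(math.isclose(a, b) for a, b in zip(two_step, closed)))  # True
```

On the question's data both maxima get an SRI of 1.0 and every other entry gets 0, because each dataset has a single copy of its maximum and all remaining entries equal the minimum.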