I would like to rank clusters according to their importance.
From 113 demand time series, I extract the following 6 features for each time series:
"Average Demand", "n95 demand", "max's deviation from n95", "Zero demand counts", "kurtosis", "skewness"
Here n95 is the 95th percentile of demand.
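To make the features concrete, here is a minimal sketch of how they might be computed for one series in plain Python. The function name `extract_features`, the dictionary keys, and the sample series are hypothetical. It also assumes that "max's deviation from n95" means `max - n95`, that kurtosis is excess kurtosis, and that the percentile uses linear interpolation.

```python
import math
import statistics


def extract_features(demand):
    """Return the six features for one demand series (a list of numbers)."""
    n = len(demand)
    mean = statistics.fmean(demand)
    # Population standard deviation; assumes the series is not constant.
    std = statistics.pstdev(demand)

    # 95th percentile via linear interpolation between order statistics.
    ordered = sorted(demand)
    pos = 0.95 * (n - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    n95 = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    # Standardized third and fourth central moments.
    z = [(x - mean) / std for x in demand]
    skewness = sum(v ** 3 for v in z) / n
    kurtosis = sum(v ** 4 for v in z) / n - 3.0  # excess kurtosis (assumption)

    return {
        "average_demand": mean,
        "n95_demand": n95,
        "max_dev_from_n95": max(demand) - n95,  # assumed definition
        "zero_demand_count": sum(1 for x in demand if x == 0),
        "kurtosis": kurtosis,
        "skewness": skewness,
    }


# Made-up series, for illustration only.
example = extract_features([0, 0, 2, 3, 5, 0, 8, 1, 4, 20])
print(example)
```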
I cluster the 113 time series based on these normalized features. Let's say I have the following 3 clusters, shown on 3 parallel coordinates plots.
I would like to rank these 3 clusters in order of importance (roughly, from "high demand" to "low demand"). Basically, high values of average demand, n95 demand, the maximum's deviation from n95, and kurtosis indicate high demand. Similarly, low values of zero counts and skewness indicate high demand. I can use (1 - normalized zero counts) and (1 - normalized skewness) to invert these two features so that they are comparable to the others.
After this inversion, high values of all six features roughly indicate a high-demand cluster.
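The inversion step could look like the sketch below. It assumes min-max normalization to [0, 1]; the post does not say which normalization is used. The helper names and the zero-count values are made up for illustration.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def invert(normalized):
    """Flip a [0, 1] feature so that high values mean high demand."""
    return [1.0 - v for v in normalized]


# Hypothetical zero-demand counts for four series.
zero_counts = [0, 5, 10, 20]
oriented = invert(min_max_normalize(zero_counts))
print(oriented)
```

The series with the fewest zero-demand periods ends up with the highest value, in line with the other features.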
One way I can think of to compare the 3 clusters is to take a representative point from each cluster, compute the magnitude (L2 or L1 norm) of each representative point, and then rank the clusters from high to low. The representative point may be the mean value of each feature in the cluster. Is this a correct approach?
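A minimal sketch of this idea, using the per-feature mean as the representative point. The cluster names and member points are hypothetical and have only two features to keep the example short.

```python
import math


def centroid(points):
    """Per-feature mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def magnitude(vec, order=2):
    """L1 norm if order == 1, otherwise L2 norm."""
    if order == 1:
        return sum(abs(v) for v in vec)
    return math.sqrt(sum(v * v for v in vec))


# Hypothetical clusters of already normalized, already oriented features.
clusters = {
    "A": [[0.6, 0.7], [0.7, 0.8]],
    "B": [[0.3, 0.3], [0.3, 0.3]],
    "C": [[0.4, 0.5], [0.5, 0.5]],
}

scores = {name: magnitude(centroid(pts)) for name, pts in clusters.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```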
I can see that each feature has some standard deviation within a cluster. I thought of weighting/penalizing each feature's mean value by its standard deviation when calculating the magnitude (L2 or L1 norm).
Let's say I only consider the first 3 features as an example, written as mean +/- standard deviation.
e.g.
cluster 1: 0.65 +/- 0.1, 0.75 +/- 0.0, 0.58 +/- 0.08
cluster 2: 0.45 +/- 0.1, 0.5 +/- 0.01, 0.5 +/- 0.0
cluster 3: 0.32 +/- 0.02, 0.3 +/- 0.0, 0.18 +/- 0.0
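The weights for each cluster can be calculated as
$$\text{weight}_{\text{feature}_i} = \frac{1 - \text{std}_{\text{feature}_i}}{\sum_{j=1}^{3} \left(1 - \text{std}_{\text{feature}_j}\right)}$$
I take $(1 - \text{std}_{\text{feature}_i})$ so that large standard deviations lead to smaller weights.
$$\text{magnitude} = \left\| \left( \text{mean}_{\text{feature}_1} \cdot \text{weight}_{\text{feature}_1},\ \text{mean}_{\text{feature}_2} \cdot \text{weight}_{\text{feature}_2},\ \text{mean}_{\text{feature}_3} \cdot \text{weight}_{\text{feature}_3} \right) \right\|_{2}$$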
The magnitudes of the three clusters are:
0.3867450020294234
0.28787905862837737
0.16660816839690243
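For reference, the weighted magnitude described above can be sketched in plain Python as follows. The function name `weighted_magnitude` is hypothetical, and the inputs are the rounded means and standard deviations from the example, so the computed magnitudes need not match the figures quoted above exactly.

```python
import math


def weighted_magnitude(means, stds):
    """L2 norm of the feature means after weighting by (1 - std)."""
    inverse_spread = [1.0 - s for s in stds]
    total = sum(inverse_spread)
    # Weights sum to 1 within a cluster; larger std gives a smaller weight.
    weights = [v / total for v in inverse_spread]
    return math.sqrt(sum((m * w) ** 2 for m, w in zip(means, weights)))


# (means, stds) for the first three features, as listed in the example.
clusters = {
    "cluster 1": ([0.65, 0.75, 0.58], [0.10, 0.00, 0.08]),
    "cluster 2": ([0.45, 0.50, 0.50], [0.10, 0.01, 0.00]),
    "cluster 3": ([0.32, 0.30, 0.18], [0.02, 0.00, 0.00]),
}

magnitudes = {
    name: weighted_magnitude(means, stds)
    for name, (means, stds) in clusters.items()
}
ranking = sorted(magnitudes, key=magnitudes.get, reverse=True)

for name in ranking:
    print(name, magnitudes[name])
```

Since the weights are normalized separately within each cluster, they mainly change the relative contribution of features inside a cluster rather than penalizing a cluster as a whole for being spread out.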
Is this the right approach or is there a better way to rank clusters? Thank you!!


