Identify outliers in a set of elements

I have a set of elements that has been partitioned into clusters based on several criteria, one of which is the length of the elements. To be precise, element $x$ cannot belong to cluster $A$ if $\dfrac{\operatorname{length}(x)}{\operatorname{length}(y)} < 0.65$, where $y$ is the longest element of $A$.
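To make the length rule concrete, here is a minimal sketch in Python. The function name `passes_length_criterion` and the representation of a cluster as a plain list of lengths are assumptions for illustration only. The check covers just the length criterion, not the other clustering criteria.

```python
def passes_length_criterion(length_x, cluster_lengths, threshold=0.65):
    """Return False if x is excluded from the cluster by the length rule.

    cluster_lengths: lengths of the elements already in the cluster
    (hypothetical representation). The rule compares length(x) to the
    length of the longest element y in the cluster.
    """
    longest = max(cluster_lengths)
    return length_x / longest >= threshold


# Made-up lengths: the longest element of the cluster has length 100.
print(passes_length_criterion(70, [100, 80]))  # ratio 0.70, not excluded
print(passes_length_criterion(60, [100, 80]))  # ratio 0.60, excluded
```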

As I examine the clusters, I find many cases where the elements all have similar lengths. Occasionally, though, I see clusters where the longest element is clearly an outlier with respect to length: the remaining elements are all of similar length, quite a bit shorter than the longest and close to the 65% threshold. I worry that in some of these cases the elements are not clustering properly because the longest element is an outlier.

I would like a way to identify such problem cases. My intuition is leading me in two directions.

  • Compute the mean length of all of the elements in the cluster except for the longest, and compare to the length of the longest.
  • Compute the ratio of each element's length to that of the longest element in the cluster.
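The two quantities in the list above could be computed roughly as follows. This is only a sketch: the function names and the sample lengths are made up, and the code computes the raw quantities without deciding what value should count as an outlier.

```python
from statistics import mean


def rest_mean_over_longest(lengths):
    """Mean length of all elements except the longest, divided by the longest."""
    if len(lengths) < 2:
        raise ValueError("need at least two elements in the cluster")
    ordered = sorted(lengths)
    longest = ordered[-1]
    rest = ordered[:-1]  # everything except (one copy of) the longest
    return mean(rest) / longest


def ratios_to_longest(lengths):
    """Ratio of each element's length to the longest length in the cluster."""
    longest = max(lengths)
    return [length / longest for length in lengths]


# Made-up cluster: one long element, the rest near the 65% threshold.
cluster = [100, 68, 66, 67, 70]
print(rest_mean_over_longest(cluster))
print(ratios_to_longest(cluster))
```

In the made-up cluster the first quantity comes out close to the 0.65 threshold, which is the pattern I am trying to detect.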

But I'm unsure where to go from there. What is an effective measure I could use to identify clusters where the longest element is an outlier with respect to length?