In Cluster Analysis, how do we calculate Purity?


In cluster analysis, how do we calculate purity? What is the equation?

I'm not looking for code to do it for me.

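$$\text{purity}(\Omega, \mathbb{C}) = \frac{1}{N}\sum_{k} \max_{j} \lvert \omega_k \cap c_j \rvert$$

Let $\omega_k$ be cluster $k$ and $c_j$ be class $j$, with $\Omega = \{\omega_1, \omega_2, \dots\}$ the set of clusters, $\mathbb{C} = \{c_1, c_2, \dots\}$ the set of classes, and $N$ the total number of samples.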
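As a worked example with made-up counts (an assumption for illustration only): 17 points, three clusters, and three classes x, o, d. Cluster $\omega_1$ holds 5 x and 1 o, $\omega_2$ holds 1 x, 4 o and 1 d, and $\omega_3$ holds 2 x and 3 d. Plugging these into $\text{purity} = \frac{1}{N}\sum_k \max_j |\omega_k \cap c_j|$:

$$\text{purity} = \frac{1}{17}\Big(\max(5,1,0) + \max(1,4,1) + \max(2,0,3)\Big) = \frac{5+4+3}{17} = \frac{12}{17} \approx 0.71$$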

So is purity essentially accuracy? It looks like we're summing the number of correctly classified points in each cluster and dividing by the sample size.
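A minimal Python sketch for checking that intuition numerically, assuming points are given as two parallel label lists (the function names `purity` and `majority_label_accuracy` and the toy data are hypothetical). It computes purity directly from the formula and, separately, the accuracy obtained by labeling every point with its cluster's most frequent class; the two agree because each cluster's majority count is exactly the number of points that labeling gets right.

```python
from collections import Counter


def _members(cluster_labels, class_labels):
    # Group the true class labels by the cluster each point was assigned to.
    groups = {}
    for k, c in zip(cluster_labels, class_labels):
        groups.setdefault(k, []).append(c)
    return groups


def purity(cluster_labels, class_labels):
    """(1/N) * sum over clusters of max_j |omega_k ∩ c_j|."""
    if len(cluster_labels) != len(class_labels) or not class_labels:
        raise ValueError("need two non-empty label lists of equal length")
    groups = _members(cluster_labels, class_labels)
    # Count of the most common class in each cluster, summed over clusters.
    majority_total = sum(Counter(cs).most_common(1)[0][1] for cs in groups.values())
    return majority_total / len(class_labels)


def majority_label_accuracy(cluster_labels, class_labels):
    """Accuracy after predicting each cluster's most frequent class for all its points."""
    groups = _members(cluster_labels, class_labels)
    predicted = {k: Counter(cs).most_common(1)[0][0] for k, cs in groups.items()}
    correct = sum(predicted[k] == c for k, c in zip(cluster_labels, class_labels))
    return correct / len(class_labels)


# Toy data: cluster 1 = 5 x + 1 o, cluster 2 = 1 x + 4 o + 1 d, cluster 3 = 2 x + 3 d.
clusters = [1] * 6 + [2] * 6 + [3] * 5
classes = ["x"] * 5 + ["o"] + ["x"] + ["o"] * 4 + ["d"] + ["x"] * 2 + ["d"] * 3
print(round(purity(clusters, classes), 3))                   # 0.706
print(round(majority_label_accuracy(clusters, classes), 3))  # 0.706
```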

equation source

My question is: what is the relationship between the output and the input?

Given true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), is it $\text{Purity} = \frac{TP_K}{TP+TN+FP+FN}$?
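A sketch relating purity to those counts, assuming the input is a contingency table `counts[k][j]` $= |\omega_k \cap c_j|$ (the function name and example tables are hypothetical). The numerator is not a single cluster's count but the sum over all clusters of each cluster's majority count, and the denominator is the total $N$. In a two-class, two-cluster case where the clusters end up labeled with different classes, that sum is $TP + TN$, so purity reduces to $\frac{TP+TN}{TP+TN+FP+FN}$.

```python
def purity_from_counts(counts):
    """counts[k][j] = number of points of class j in cluster k."""
    n = sum(sum(row) for row in counts)
    if n == 0:
        raise ValueError("empty contingency table")
    # Points counted as "correct" in each cluster under majority labeling.
    correct_per_cluster = [max(row) for row in counts]
    return sum(correct_per_cluster) / n


# Three clusters x three classes.
print(round(purity_from_counts([[5, 1, 0], [1, 4, 1], [2, 0, 3]]), 3))  # 0.706

# Binary case: cluster A is labeled positive (TP = 40, FP = 10),
# cluster B is labeled negative (FN = 5, TN = 45), so purity = (40 + 45) / 100.
print(purity_from_counts([[40, 10], [5, 45]]))  # 0.85
```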

Best answer

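Figured it out:

Purity is the accuracy you get by labeling every point with the most frequent class in its cluster. For each cluster, count the occurrences of its most frequent class, add these counts over all clusters, and divide by the total number of points. The result is at most 1, and higher values indicate purer clusters.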
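One caveat on reading "higher is better": purity rewards splitting the data into many small clusters, and it reaches 1 when every point forms its own cluster. A minimal sketch on hypothetical toy labels shows the two extremes:

```python
from collections import Counter, defaultdict


def purity(cluster_labels, class_labels):
    # Collect the true classes in each cluster, then sum each cluster's majority count.
    members = defaultdict(list)
    for k, c in zip(cluster_labels, class_labels):
        members[k].append(c)
    return sum(max(Counter(cs).values()) for cs in members.values()) / len(class_labels)


classes = ["x", "x", "o", "o", "d", "d"]
one_cluster = [0] * len(classes)          # everything lumped into one cluster
singletons = list(range(len(classes)))    # every point in its own cluster
print(round(purity(one_cluster, classes), 3))  # 0.333
print(purity(singletons, classes))             # 1.0
```

So purity is usually compared across clusterings with the same number of clusters, or reported alongside a measure that penalizes over-splitting.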