Area under the precision-recall curve and area under the ROC curve for different numbers of observations


I am doing research in which I compare several algorithms for binary classification. It is worth mentioning that the data set is highly imbalanced: the minority class makes up only 0.2% of the observations.

Notation:

Area under the precision-recall curve (AUPRC)

Area under the ROC curve (AUROC)

Here is what I would like to have explained. I have two classifiers. For the first classifier, call it A, the AUROC and AUPRC are calculated on 142000 observations, 223 of which are truly positive. For this classifier I obtain an AUPRC of 0.10 and an AUROC of 0.95. For the second classifier, call it B, the metrics are calculated on 1500 observations, 118 of which are truly positive, and I obtain an AUPRC of 0.12 and an AUROC of 0.60.
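To put the two test sets side by side, here is a minimal sketch that computes the positive-class prevalence of each and the ratio of the reported AUPRC to that prevalence. It relies on the usual chance-level reference: a random classifier's AUPRC is roughly the prevalence, while its AUROC is 0.5 regardless of prevalence. The names `test_sets`, `prevalence`, and `summarize` are made up for illustration, and the counts are read as the number of positive-class observations in each test set.

```python
# Figures quoted in the question. "positives" is assumed to mean the number
# of positive-class observations in each test set.
test_sets = {
    "A": {"n": 142_000, "positives": 223, "auprc": 0.10, "auroc": 0.95},
    "B": {"n": 1_500, "positives": 118, "auprc": 0.12, "auroc": 0.60},
}


def prevalence(n, positives):
    """Fraction of observations that belong to the positive class."""
    return positives / n


def summarize(sets):
    """Return prevalence and AUPRC relative to prevalence for each test set."""
    rows = {}
    for name, s in sets.items():
        p = prevalence(s["n"], s["positives"])
        rows[name] = {
            "prevalence": p,
            # Ratio of the reported AUPRC to the approximate chance-level AUPRC.
            "auprc_over_prevalence": s["auprc"] / p,
        }
    return rows


summary = summarize(test_sets)
for name, row in summary.items():
    print(
        f"{name}: prevalence={row['prevalence']:.4f}, "
        f"AUPRC/prevalence={row['auprc_over_prevalence']:.1f}"
    )
```

The two test sets have very different class balance (about 0.16% positives for A versus about 7.9% for B), so the raw AUPRC values are not measured against the same baseline.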

Since AUPRC is the more important metric for me, classifier B looks better. However, how can the difference in AUROC (0.95 vs. 0.60) be explained? Does it have something to do with the number of observations each classifier was tested on?
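One way to probe this would be a small simulation that keeps the score distributions fixed and changes only the class balance. The sketch below is purely synthetic and is not my actual classifiers or data: the Gaussian scores, the shift of 1.5, the sample sizes, and the function names `auroc`, `average_precision`, and `simulate` are all assumptions chosen for illustration. AUPRC is approximated by average precision.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney) formula.

    Assumes continuous scores, so ties are not given special treatment.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def simulate(n_neg, n_pos, shift, rng):
    """Negatives score ~ N(0, 1); positives score ~ N(shift, 1)."""
    labels = [0] * n_neg + [1] * n_pos
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    scores += [rng.gauss(shift, 1.0) for _ in range(n_pos)]
    return labels, scores


rng = random.Random(0)  # fixed seed so the run is repeatable
rare = simulate(50_000, 100, 1.5, rng)   # about 0.2% positives
common = simulate(5_000, 500, 1.5, rng)  # about 9% positives

auroc_rare, ap_rare = auroc(*rare), average_precision(*rare)
auroc_common, ap_common = auroc(*common), average_precision(*common)

print(f"rare:   AUROC={auroc_rare:.3f}  AP={ap_rare:.3f}")
print(f"common: AUROC={auroc_common:.3f}  AP={ap_common:.3f}")
```

Under these assumptions AUROC depends only on how much the two score distributions overlap, so it should come out about the same for both sets, while average precision should be much lower on the set with fewer positives. That would suggest it is the class balance of the test set, rather than the number of observations as such, that moves AUPRC.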

I would appreciate it if someone could explain this.

Thank you.