Why do you take 1 minus the area under the ROC curve when the area is less than 0.5?


I'm taking a course in which the ROC curve is specified by plotting points on an XY plane, where x is the false positive rate and y is the true positive rate at a given threshold in binary classification. These points are then joined into rectangles and the areas under them are calculated. However, we're told that if the area is less than 0.5, you take 1 minus the area, and we're supposed to be maximizing this area. I don't see why getting under 0.5 is infeasible, and if it were infeasible, why would you take 1 minus the area?
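The procedure described in the question can be sketched in Python as below. This is a minimal sketch, assuming that higher scores mean "more likely positive" and that a sample is predicted positive when its score is at or above the threshold; the names `roc_points` and `auc_from_points` are hypothetical, not from the course. The area is summed with trapezoids; with no tied scores every step is purely vertical or purely horizontal, so this is the same as summing the rectangles the course describes.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points obtained by sweeping the threshold
    over every distinct score, from highest to lowest.
    A sample is predicted positive when its score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_from_points(points):
    """Sum the area under the piecewise-linear ROC curve.
    Without tied scores every step is either vertical (zero width) or
    horizontal (constant height), so each term is a plain rectangle."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
pts = roc_points(scores, labels)
print(pts)
print(auc_from_points(pts))  # about 0.889 (= 8/9) for this toy data
```

Running the same functions on the negated scores `[-s for s in scores]` gives an area of about 0.111 (= 1/9), i.e. 1 minus 0.889: the curve is mirrored below the diagonal, which is exactly the situation the question asks about.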


It is a mistake to do this thoughtlessly. Getting $AUC<1/2$ means that your model is performing very poorly, as $0.5$ is the performance of a reasonable "must-beat" baseline model, such as one that assigns the same score to every case. However, a calibration step that reverses the ranking of the raw model outputs would give the entire two-stage pipeline (model, then calibration) a ROC curve with $AUC>1/2$. You could think of it as $AUC^{\prime} = 1-AUC$, where $AUC^{\prime}$ is the ROC AUC of the calibrated model and $AUC$ is the ROC AUC of the uncalibrated model: reversing the order of the scores turns every correctly ranked positive/negative pair into an incorrectly ranked one, and vice versa. I recently wrote about this on Data Science Stack Exchange.
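To see where $AUC^{\prime} = 1-AUC$ comes from, here is a minimal Python sketch. It assumes the post-processing step simply reverses the score order (here by negating each score); `pairwise_auc` is a hypothetical helper that computes AUC as the fraction of positive/negative pairs in which the positive is scored higher, with ties counting one half, which equals the area under the ROC curve. Reversing the order turns every win into a loss and vice versa, so the two AUCs sum to 1.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as 1/2).
    This equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
raw = [0.2, 0.7, 0.8, 0.4, 0.6, 0.9]  # a toy model that mostly ranks backwards
reversed_scores = [-s for s in raw]    # assumed post-processing: reverse the order

auc = pairwise_auc(raw, labels)                    # 1/9 for this toy data
auc_prime = pairwise_auc(reversed_scores, labels)  # 8/9 for this toy data
print(auc, auc_prime, auc + auc_prime)
```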

I find it problematic to do this thoughtlessly because it is useful to know that such a calibration step is required. If you automatically or programmatically set up your code to take $1-AUC$ whenever $AUC<1/2$, you still need to know that the calibration step is required, lest you use the terrible raw predictions thinking they are good.
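One way to automate the $1-AUC$ step without hiding it is to return a flag alongside the number. This is a minimal sketch, assuming the same pairwise definition of AUC; `oriented_auc` and its `(auc, needs_flip)` return value are hypothetical names chosen for illustration.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs where the positive
    is scored higher; ties count as half a pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def oriented_auc(scores, labels):
    """Return (auc, needs_flip).

    Instead of silently reporting max(AUC, 1 - AUC), flag when the larger
    value is only reached after reversing the score order."""
    auc = pairwise_auc(scores, labels)
    if auc < 0.5:
        # The raw scores rank positives below negatives; 1 - auc describes
        # the reversed scores, not the raw ones.
        return 1.0 - auc, True
    return auc, False


labels = [1, 1, 0, 1, 0, 0]
for name, scores in [("good", [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]),
                     ("backwards", [0.2, 0.7, 0.8, 0.4, 0.6, 0.9])]:
    auc, needs_flip = oriented_auc(scores, labels)
    if needs_flip:
        print(f"{name}: AUC {auc:.3f} only after reversing the scores; do not use the raw scores")
    else:
        print(f"{name}: AUC {auc:.3f} on the raw scores")
```

The design choice is simply that the caller cannot get the flattering number without also receiving the information that the raw predictions are backwards.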